A single-file C allocator with explicit heaps and tuning knobs

(github.com)

36 points | by enduku 2 days ago

4 comments

enduku 2 days ago
I wrote this because I wanted more explicit control over heaps when building different subsystems in C. Standard options like jemalloc and mimalloc are incredibly fast, but they act as black boxes. You can't easily cap a parser's memory at 256MB or wipe it all out in one go without writing a custom pool allocator.
Spaces takes a different approach. It uses 64KB-aligned slabs, and the metadata lookup is just a pointer mask (ptr & ~0xFFFF).
The trade-off is that every free() incurs an L1 cache miss to read the slab header, and there is a 64KB virtual memory floor per slab. But in exchange, you get zero-external-metadata regions, instant teardown of massive structures like ASTs, and performance that surprisingly keeps up with jemalloc on cross-thread workloads (I included the mimalloc-bench scripts in the repo).
It's Linux x86-64 only right now. I'm curious if systems folks think this chunk API is a pragmatic middle ground for memory management, or if the cache-miss penalty on free() makes the pointer-masking approach a dead end for general use.
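As a rough illustration of the pointer-mask lookup described above (not the actual Spaces code; `SlabHeader`, `slab_create`, `slab_of`, and `slab_block` are hypothetical names, and the header layout is an assumption), a 64KB-aligned slab lets any interior pointer find its header with a single mask:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SLAB_SIZE ((size_t)64 * 1024)            /* 64KB, as in the post */
#define SLAB_MASK (~(uintptr_t)(SLAB_SIZE - 1))  /* clears the low 16 bits */

/* Hypothetical header layout; the real Spaces header is not shown in the thread. */
typedef struct SlabHeader {
    uint32_t block_size;   /* size of each block served by this slab */
    uint32_t live_blocks;  /* number of blocks handed out so far */
} SlabHeader;

/* Allocate one slab whose base address is a multiple of SLAB_SIZE. */
static SlabHeader *slab_create(uint32_t block_size) {
    SlabHeader *slab = aligned_alloc(SLAB_SIZE, SLAB_SIZE);
    if (!slab) return NULL;
    slab->block_size = block_size;
    slab->live_blocks = 0;
    return slab;
}

/* Recover the header from any pointer inside the slab: ptr & ~0xFFFF. */
static SlabHeader *slab_of(const void *ptr) {
    return (SlabHeader *)((uintptr_t)ptr & SLAB_MASK);
}

/* Hand out the n-th block after the header (no free list; just enough to demo the lookup). */
static void *slab_block(SlabHeader *slab, size_t n) {
    size_t offset = sizeof(SlabHeader) + n * slab->block_size;
    assert(offset + slab->block_size <= SLAB_SIZE);
    slab->live_blocks++;
    return (char *)slab + offset;
}
```

Given `SlabHeader *s = slab_create(64);` and `void *p = slab_block(s, 10);`, `slab_of(p)` returns `s` with no external lookup table. Reading a field such as `slab_of(p)->block_size` is the header access that the post identifies as the likely cache miss on free().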
- throwaway2027 2 hours ago
  When dealing with memory in C, defaulting to malloc or some opaque structure behind it seems bad to me now, unless you just want to allocate and forget in some one-off program that frees its memory on process exit. For any kind of sophisticated system or module, you almost always want to write your own variety of allocator: slab, arena, pool, bump, whatever it may be.
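  For readers who have not written one, a minimal sketch of the kind of allocator mentioned above: a bump arena with a hard byte cap and a one-shot reset. The names `Arena`, `arena_init`, `arena_alloc`, `arena_reset`, and `arena_destroy` are hypothetical and unrelated to the Spaces API.

  ```c
  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Hypothetical bump arena; not the Spaces API. */
  typedef struct Arena {
      unsigned char *base;  /* start of the backing buffer */
      size_t cap;           /* hard byte limit for this subsystem */
      size_t used;          /* current bump offset */
  } Arena;

  /* Reserve cap bytes up front; returns 1 on success, 0 on failure. */
  static int arena_init(Arena *a, size_t cap) {
      a->base = malloc(cap);
      a->cap = a->base ? cap : 0;
      a->used = 0;
      return a->base != NULL;
  }

  /* Bump-allocate size bytes, rounded up to max_align_t alignment.
     Returns NULL once the request would exceed the cap. */
  static void *arena_alloc(Arena *a, size_t size) {
      const size_t align = _Alignof(max_align_t);
      if (size > SIZE_MAX - (align - 1)) return NULL;  /* overflow guard */
      size_t rounded = (size + align - 1) & ~(align - 1);
      if (rounded > a->cap - a->used) return NULL;
      void *p = a->base + a->used;
      a->used += rounded;
      return p;
  }

  /* Drop every allocation at once without touching individual objects. */
  static void arena_reset(Arena *a) { a->used = 0; }

  /* Return the backing buffer to the system. */
  static void arena_destroy(Arena *a) {
      free(a->base);
      a->base = NULL;
      a->cap = a->used = 0;
  }
  ```

  A parser could call `arena_init(&a, cap)` once, allocate nodes with `arena_alloc`, and discard the whole tree with a single `arena_reset(&a)`; the cap is enforced because `arena_alloc` returns NULL instead of growing.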
- bitbasher 2 hours ago
  There's a single commit in the whole repository. Was this AI generated?
  - sebazzz 2 hours ago
    There are still things like squashing commits in git, etc.
    - bitbasher 2 hours ago
      That doesn't make any sense. There are 10,000+ lines of code. There shouldn't be a single commit named "Initial commit". I'm fine with squashing some commits and creating a clean history, but this isn't a clean history; it's obfuscated.
      - tosti 1 hour ago
        I also do this. Lots of weird commit messages because fuck that, I'm busy. Commits that are just there to put some stuff aside, things like that. I don't owe it to anyone to show how messy my kitchen is.
        Bootvis 1 hour ago
        On the other hand, others don’t have to adopt, use, or like your stuff, and those would be the reasons to publish it.
        One big commit definitely doesn’t help with creating confidence in this project.
        bitbasher 1 hour ago
        > I don't owe it to anyone to show how messy my kitchen is.
        There was once a time when sharing code had a social obligation.
        This attitude you have isn't in the same spirit. GitHub (or any forge) was never meant to be a garbage dumping ground for whatever idea you cooked up at 3AM.
        tosti 1 hour ago
        It requires self-discipline to stay organized. A VCS is just a tool. I'm never organized; my brain just works that way. Whatever the tool, I'll create a mess with it. So as long as the project structure and its code are all good, I can't care about anything else.
        greenavocado 55 minutes ago
        Explain why you think making a single commit is related to any source code sharing obligation? You completely failed to establish why making a single commit is indicative of it being garbage. Your statements are a series of non-sequiturs so far and thus I can't take you seriously.
        bitbasher 31 minutes ago
        > Explain why you think making a single commit is related to any source code sharing obligation?
        When you share code it's presumably for people to use. It is often useful to have commit history to establish a few things (trust in the author, see their thought process, debug issues, figure out how to use things, etc).
        > You completely failed to establish why making a single commit is indicative of it being garbage.
        A single commit doesn't mean it's garbage. It erodes trust in the author and the project. It makes it hard for me to use the code, which is presumably why you share code.
        My garbage code response was about the growing trend of coding up some idea (usually with AI), slapping an initial commit on it, and throwing it on GitHub (like using a napkin and tossing it in the rubbish bin).
      - drob518 1 hour ago
        It may have been released with a new repo created, losing all the previously-private history.
        bitbasher 53 minutes ago
        Yes and no.
        Have you looked at the code? It was clearly generated in one form or another (see the other comments).
        The author created a new GitHub account and this is their first repository. It looks to be generated from another code base as a sort of amalgamation (through code generation, AI, or some other means).
        We're supposed to implicitly trust this person (new GitHub account, first repository, no commit history, 10k+ lines of complicated code).
        Jia Tan worked way too hard, all they had to do was upload a few files and share on HN :)
        throwaway27448 28 minutes ago
        > We're supposed to implicitly trust this person
        That would be rather foolish even with a fully viewable history.
        I don't understand why you're so worked up about this—nobody is forcing you to use the code.
ntoslinux 1 hour ago
What is the reason for the weird `{ code };` blocks everywhere and is the below code machine generated?
```c
((PageSize) (chunk->pageSize - ((PageSize) ((PageSize) ((PageSize) (sizeof(Page) + (sizeof(struct _Block))) + (PageSize) ((sizeof(double)) - 1u)) & ((PageSize) (~((PageSize) ((sizeof(double)) - 1u)))))) - ((PageSize) ((PageSize) ((PageSize) ((sizeof(FreeBlock) + sizeof(PageSize))) + (PageSize) (((((sizeof(double)) > (4)) ? (sizeof(double)) : (4))) -
```
- mzajc 16 minutes ago
  Worse yet, there are several places with empty code blocks, e.g. [0] and [1]. Even without that, the formatting contains so much unnecessary whitespace, newlines, casts, etc. I'm not sure why, given the already massive source file. How do you even fit [2] on a screen?
  [0]: https://github.com/xtellect/spaces/blob/422dbba85b5a7e9a209a...
  [1]: https://github.com/xtellect/spaces/blob/422dbba85b5a7e9a209a...
  [2]: https://github.com/xtellect/spaces/blob/422dbba85b5a7e9a209a...
- bitbasher 1 hour ago
  There's a lot of code in the file that is questionable to say the least. There are unnecessary blocks ( { ... }; ) of code with unnecessary semicolons that don't serve any logical purpose.
  My hunch is that it may be the result of C macro expansion (cc -E ...). So it's likely there's a larger code base with multiple files, and they expanded it into one large C file (sometimes called an amalgamation build) and called it a day.
  By they, I mean the OP, a script or an AI (or all three).
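  To make the macro-expansion hunch above concrete, here is a small sketch of how ordinary macros can leave `{ ... };` blocks and heavily parenthesized expressions behind once preprocessed. `CLAMP_MIN`, `MAX_OF`, and `clamp_demo` are hypothetical and not taken from the Spaces source; this only shows the general pattern, not how that file was actually produced.

  ```c
  #include <assert.h>

  /* Hypothetical statement-like macro whose body is a bare brace block. */
  #define CLAMP_MIN(x, lo) { if ((x) < (lo)) (x) = (lo); }

  /* Hypothetical max macro with the usual defensive parentheses. */
  #define MAX_OF(a, b) (((a) > (b)) ? (a) : (b))

  static int clamp_demo(int v) {
      /* The call site adds its own semicolon, so `cc -E` emits roughly:
         { if ((v) < (4)) (v) = (4); };
         which is a block followed by an empty statement. */
      CLAMP_MIN(v, 4);
      return v;
  }

  /* MAX_OF(sizeof(double), 4) expands to
     (((sizeof(double)) > (4)) ? (sizeof(double)) : (4)),
     the same shape as the fragment quoted earlier in the thread. */
  static_assert(MAX_OF(sizeof(double), 4) >= 4, "MAX_OF picks the larger operand");
  ```

  Running `cc -E` on a file like this and reading the result shows the stray `;` after each block and the nested parentheses, both of which are harmless to the compiler but look odd to a human reader.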
  - motbus3 57 minutes ago
    Exactly my thought... This looks like a clean room implementation situation
HexDecOctBin 2 hours ago
The classic Doug Lea memory allocator (dlmalloc)[1] has explicit heaps under the name mspaces. OP, were you aware of that? If so, what does your solution do better or differently than dlmalloc's mspaces?
[1] https://gee.cs.oswego.edu/pub/misc/?C=N;O=D