If anyone is curious about how signals are used for garbage collection (specific...

hayley-patton · on Oct 17, 2023

SBCL now uses the "write-bitmap"/"card marking" scheme on some platforms, which is slightly faster (1-2% according to sbcl-devel). More interesting is that doing the touch detection in software allows for finer-grained precision (e.g. #+mark-region-gc uses 128-byte cards) than hardware does (e.g. 4 KiB pages on x86-64, 16 KiB on M1), which can drastically affect scavenging time [0]. The precision is also really nice for non-moving generational schemes: if old and new objects exist on the same card, writes to the new objects (which are more common, too!) will cause the old objects to be needlessly scanned by the GC. Demers et al. [1] call this "card pollution", so reducing the card size reduces the likelihood of it happening.

[0] https://tschatzl.github.io/2022/02/15/card-table-card-size.h...

[1] https://dl.acm.org/doi/pdf/10.1145/96709.96735

ravi-delia · on Oct 18, 2023

Oh neat, I thought it was still unoptimized compared to gencgc (which afaik does still use write protection). I'll have to check it out and see if I have a machine it'll run faster on.

hayley-patton · on Oct 19, 2023

gencgc uses software protection (card marking) on some but not all architectures; I recall x86-64 and MIPS, but not ARM. On x86-64 with SBCL 2.3.8, for example:

    * (disassemble #'(setf svref))
    ; disassembly for (SETF SVREF)
    ; Size: 39 bytes. Origin: #x5349B00D                          ; (SETF SVREF)
    ; 0D:       483B77F9         CMP RSI, [RDI-7]
    ; 11:       731D             JAE L0
    ; 13:       488D44B701       LEA RAX, [RDI+RSI*4+1]
                                 ↓ card marking here
    ; 18:       48C1E80A         SHR RAX, 10
    ; 1C:       25FFFF0F00       AND EAX, 1048575
    ; 21:       41C6040400       MOV BYTE PTR [R12+RAX], 0
                                 ↓ pointer store here
    ; 26:       488954B701       MOV [RDI+RSI*4+1], RDX
    ; 2B:       C9               LEAVE
    ; 2C:       F8               CLC
    ; 2D:       C3               RET
    ; 2E:       CC10             INT3 16                          ; Invalid argument count trap
    ; 30: L0:   CC24             INT3 36                          ; INVALID-VECTOR-INDEX-ERROR
    ; 32:       1C               BYTE #X1C                        ; RDI(d)
    ; 33:       19               BYTE #X19                        ; RSI(a)