Phosphor-decay simulation isn't new - xscreensaver has had it for ages. What is new, and slightly mind-boggling, is that they're going below "treat the display as pixels and post-process them" and generating an actual PAL signal.
I've not checked to see whether they're keeping that in frequency space or actually emitting samples at some rate above 4MHz. But it's an interesting approach.
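To make that concrete, the time-domain route would look roughly like this - a minimal sketch of PAL quadrature encoding with illustrative names, not anything taken from the project:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Minimal sketch of time-domain PAL composite encoding; every name here
    // is illustrative rather than taken from the project. The PAL colour
    // subcarrier sits at ~4.43361875MHz, so samples would have to be emitted
    // at some comfortable multiple of that.
    constexpr double pi = 3.141592653589793;
    constexpr double subcarrier_hz = 4433618.75;
    constexpr double sample_rate_hz = 4.0 * subcarrier_hz;  // Four samples per colour clock.

    // Encodes one line of Y/U/V samples into composite samples. `phase` is
    // the subcarrier phase at the start of the line; `v_switch` is +1 or -1
    // and alternates per line - the "phase alternation" PAL is named for.
    std::vector<double> encode_line(const std::vector<double> &y,
                                    const std::vector<double> &u,
                                    const std::vector<double> &v,
                                    double phase, double v_switch) {
        std::vector<double> composite(y.size());
        const double phase_per_sample = 2.0 * pi * subcarrier_hz / sample_rate_hz;
        for (std::size_t i = 0; i < y.size(); ++i) {
            const double p = phase + phase_per_sample * static_cast<double>(i);
            // Composite = luminance plus quadrature-modulated chrominance.
            composite[i] = y[i] + u[i] * std::sin(p) + v_switch * v[i] * std::cos(p);
        }
        return composite;
    }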
Ah! Cool!! If they're generating an actual PAL signal I guess that explains why the picture in the emulated Macintosh Plus started to jump around like a bad VHS tape when I set the emulation speed too high for the host to keep up.
Edit: I confirm that the code is really pleasant to work on and understand! Found this project by pure accident this morning and was able to hack meaningfully on it right away. Kudos to the developer.
Yes - it looks like the system advances a certain number of scanline elements per CPU clock, so if you run the CPU clock faster than the video pixel clock you start missing elements.
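Something like this toy model, if I'm reading it right - illustrative only, not the project's actual structures:

    #include <cstdint>

    // Toy illustration (my guess at the mechanism, not the project's code):
    // each CPU cycle advances the video stream by a fixed number of scan
    // elements, which land in a bounded buffer that the host drains in real
    // time. Run the emulated clock fast enough and the buffer fills, so
    // elements are rejected and the picture breaks up.
    struct VideoStream {
        static constexpr int capacity = 65536;
        uint8_t elements[capacity];
        int write = 0, read = 0;

        bool push(uint8_t element) {
            const int next = (write + 1) % capacity;
            if (next == read) return false;   // Host hasn't caught up: element lost.
            elements[write] = element;
            write = next;
            return true;
        }
    };

    void run_cpu(VideoStream &video, int cycles, int elements_per_cycle) {
        for (int c = 0; c < cycles; ++c)
            for (int e = 0; e < elements_per_cycle; ++e)
                (void)video.push(0 /* next scan element */);
    }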
Hi, it's the author here. I'm new to YCombinator, so please forgive any etiquette transgressions.
I've been through a few implementations of this; originally each machine provided data in any format it liked, plus the GLSL necessary to decode that into a time-domain composite signal. Then Apple deprecated OpenGL, so I retrenched to supporting a few fixed pixel formats, which are in the InputDataType enum in ScanTarget.hpp. Based on what I'd found useful up to that point, it's a mix of direct sampling and frequency-space stuff.
Luminance in one or eight bits, and the four RGB options in different bit sizes, are standard PCM, but Luminance8Phase8 and PhaseLinkedLuminance8 are both frequency space. The former is pretty straightforward; with the latter you supply four luminance samples per output sample, and the one that's active at any given moment is a function of phase. It sounds a bit contrived, but it means that for some machines the amount of data I supply doesn't need to be a colour-clock-related multiple of what it would otherwise be.
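To illustrate what I mean by phase-linked - this is a sketch of the description above rather than the actual decode, with made-up names:

    #include <cstdint>

    // Sketch of the idea behind a phase-linked luminance format; names are
    // made up for illustration, not taken from ScanTarget.hpp. Each output
    // sample carries four luminance values, and the colour-subcarrier phase
    // at that point selects which one is live.
    struct PhaseLinkedSample {
        uint8_t luminance[4];
    };

    // `phase` in [0, 1) is the current fraction of the colour clock.
    uint8_t decode(const PhaseLinkedSample &sample, double phase) {
        const int slot = static_cast<int>(phase * 4.0) & 3;  // Quadrant of the colour clock.
        return sample.luminance[slot];
    }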
Earlier implementations of the decode were a lot smarter than the current one: I used an intermediate composite buffer whose rate was the lowest integer multiple of the input data rate that gives at least four samples per colour clock. To that I applied a 15-point FIR lowpass filter to separate luminance from chrominance, and then continued from there. I actually think this is the correct solution, and I want to return to it soon.
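In outline that separation stage looks like this - illustrative coefficients and names, not the real shader:

    #include <array>
    #include <cstddef>
    #include <vector>

    // Sketch of luma/chroma separation with a 15-tap FIR lowpass; the taps
    // here are a placeholder, where a real filter would be designed to cut
    // off below the colour subcarrier. Lowpassing the composite signal keeps
    // luminance; subtracting that from the original leaves the modulated
    // chrominance.
    std::vector<double> separate_luma(const std::vector<double> &composite,
                                      const std::array<double, 15> &taps) {
        std::vector<double> luma(composite.size(), 0.0);
        for (std::size_t i = 7; i + 7 < composite.size(); ++i) {
            double acc = 0.0;
            for (std::size_t t = 0; t < 15; ++t)
                acc += taps[t] * composite[i + t - 7];
            luma[i] = acc;
        }
        return luma;  // chroma[i] = composite[i] - luma[i].
    }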
Unfortunately I'm at the extreme shallow end of the pool in terms of GPU power, as I use a 1.1GHz Core M with its integrated graphics to power a 4K display, so 15 samples/pixel proved somewhat problematic. I switched to taking four evenly-spaced samples per colour clock, irrespective of the input rate, and just averaging those to try to knock out exactly the colour subcarrier. Or, I guess, that's like the average of two comb filters. At the time I thought it looked fine, it ran faster, and it's still a genuine approach to decoding composite video even if it's a complete digital fiction, so I went with it.
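The reason the averaging works: four samples spaced a quarter of a colour clock apart see the subcarrier at phases p, p+90°, p+180° and p+270°, and both sin and cos sum to zero over those, so only the luminance term survives. As a sketch:

    // Averaging four quarter-clock-spaced composite samples: the chrominance
    // contribution is sin/cos evaluated at p, p+π/2, p+π and p+3π/2, which
    // sums to zero for any p, leaving (an approximation of) the luminance.
    double luma_from_quarter_clock_samples(const double s[4]) {
        return (s[0] + s[1] + s[2] + s[3]) / 4.0;
    }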
With hindsight I didn't really do enough due diligence, I think partly because I spend so much more time working with PAL than NTSC.
The most prominent machine for which that approach doesn't work is the NTSC Master System; that produces pixels in-phase with the colour clock, and each pixel occupies two-thirds of a colour clock. So they alias like heck, and because it's in-phase I don't even get temporal dithering to mask the problem. I haven't yet implemented an in-phase PAL machine, so the aliasing tends to be much less prominent.
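Concretely - my arithmetic here, not anything from the project - at four samples per colour clock, a pixel lasting two-thirds of a colour clock spans 8/3 samples, so the per-pixel sample counts settle into a fixed 3, 3, 2 beat:

    #include <cmath>
    #include <cstdio>

    // My arithmetic, not the project's: at four samples per colour clock, a
    // pixel lasting two-thirds of a colour clock spans 8/3 samples. The
    // per-pixel sample counts come out as a fixed 3, 3, 2 pattern - and
    // because the pixels are phase-locked to the subcarrier, that beat never
    // moves, so nothing dithers it away.
    int main() {
        const double samples_per_pixel = 4.0 * 2.0 / 3.0;  // = 8/3 ≈ 2.67.
        for (int pixel = 0; pixel < 6; ++pixel) {
            const int first = static_cast<int>(std::ceil(pixel * samples_per_pixel));
            const int last = static_cast<int>(std::ceil((pixel + 1) * samples_per_pixel));
            std::printf("pixel %d gets %d samples\n", pixel, last - first);
        }
        return 0;
    }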
Anyway, right now I'm getting towards the end of a Qt port, so that Linux users finally get the full UI experience if they want it. After wrapping that up, and with an eye on Apple's announcements this week, I'm going to have to admit that I'm really at the end of the road for treating OpenGL as a lingua franca, and get started on a Metal back-end for the Mac target. I think I'll probably also switch back to the 15-point FIR filter for composite decoding while I'm at it, for all targets; I have a long-stale branch for reintroducing that under OpenGL which I'll seek to revive.
Also, there are a couple of bugs in the current implementation that I'm almost certain are race conditions and that could do with reinvestigation. The OpenGL ScanTarget is supposed to be a lock-free queue that rejects new data when full; I don't know whether I've messed up with a false memory-order assumption or made an even more obvious error, but hopefully it'll come out in the wash. Especially if I'm accidentally relying on x86 coherency guarantees.
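For anyone curious about the bug class: the textbook single-producer/single-consumer discipline needs acquire/release pairs in the right places, and getting one wrong is invisible on x86. A generic sketch, not the ScanTarget code:

    #include <atomic>
    #include <cstddef>

    // Generic single-producer/single-consumer lock-free queue that rejects
    // new data when full; textbook version, not the ScanTarget code. The
    // subtle failure mode: relaxed orderings in place of the acquire/release
    // pairs below happen to work on x86's strong hardware ordering, then
    // fail on weaker architectures such as ARM.
    template <typename T, std::size_t N>
    class SPSCQueue {
        T storage_[N];
        std::atomic<std::size_t> head_{0}, tail_{0};

    public:
        bool push(const T &value) {                        // Producer thread only.
            const std::size_t tail = tail_.load(std::memory_order_relaxed);
            const std::size_t next = (tail + 1) % N;
            if (next == head_.load(std::memory_order_acquire))
                return false;                              // Full: reject new data.
            storage_[tail] = value;
            tail_.store(next, std::memory_order_release);  // Publish the write.
            return true;
        }

        bool pop(T &value) {                               // Consumer thread only.
            const std::size_t head = head_.load(std::memory_order_relaxed);
            if (head == tail_.load(std::memory_order_acquire))
                return false;                              // Empty.
            value = storage_[head];
            head_.store((head + 1) % N, std::memory_order_release);
            return true;
        }
    };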
So, yeah, summary version: lots of room for improvement, some improvements hopefully coming soon.
https://github.com/TomHarte/CLK/blob/master/Outputs/CRT/CRT.... seems to be well-commented.