Yeah, I was going to mention that if the long-running/CPU-bound work was put in ...

gavinhoward · on Oct 16, 2023

Why do I deserve that for having a CPU-bound program?

And why is longjmp() the wrong thing to do?

boffinAudio · on Oct 16, 2023

Because signals are delivered on context switch, and you're not allowing any of that to happen. It's not a well-behaved process in that case .. not that you shouldn't be able to just burn as much CPU as you want, just that you would need to open up some space to communicate with the OS in the meantime, and having a signal-handler-thread or a cpu-working-thread is how to do it properly. (Do you also handle SIGHUP? SIGSTOP?)

longjmp() is just asking for hell 6 months later when you come back and have to work out why you did such a hokey thing in the first place.

EDIT: it should be noted that the "side-effect" of using longjmp is .. a context switch .. so you gave yourself some space to behave properly with the OS. It's just that the more future-proof way to do it is to have a thread for work (you have this already with main()) and a thread for signals (you'd just add this to service signals only and give main() a way to cleanly exit) ... Not too complex, and a tad bit easier to read 6 months later ..
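A minimal sketch of the work-thread plus signal-thread arrangement described above, assuming POSIX threads. The names `stop_requested`, `start_signal_thread`, and `work_until_stopped` are hypothetical and are not taken from bc. SIGINT is blocked in every thread, and a dedicated thread collects it synchronously with `sigwait()`, so no asynchronous handler runs at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared flag: written by the signal thread, read by the worker. */
static atomic_bool stop_requested;
static sigset_t waited_signals;

/* Signal thread: sleeps in sigwait() until SIGINT arrives, then requests a stop. */
static void *signal_thread(void *arg)
{
    int sig = 0;
    (void)arg;
    if (sigwait(&waited_signals, &sig) == 0 && sig == SIGINT)
        atomic_store(&stop_requested, true);
    return NULL;
}

/* Blocks SIGINT in the caller (new threads inherit the mask) and starts
 * the signal thread. Returns 0 on success or an error number. */
int start_signal_thread(pthread_t *tid)
{
    int rc;
    assert(tid != NULL);
    sigemptyset(&waited_signals);
    sigaddset(&waited_signals, SIGINT);
    rc = pthread_sigmask(SIG_BLOCK, &waited_signals, NULL);
    if (rc != 0)
        return rc;
    return pthread_create(tid, NULL, signal_thread, NULL);
}

/* Stand-in for CPU-bound work: counts until done or until a stop is requested.
 * Returns the number of iterations completed. */
unsigned long work_until_stopped(unsigned long max_iterations)
{
    unsigned long done = 0;
    while (done < max_iterations && !atomic_load(&stop_requested))
        done++;
    return done;
}
```

Note that the worker still has to poll `stop_requested` inside its loop, so how quickly it reacts depends on how often the loop reaches that check.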

gavinhoward · on Oct 16, 2023

Unless you include OS preemption in "context switch," you are wrong; signals can come at any time.

If you do include it, then yes, signals only happen on context switch. But in that case, my CPU-bound code is not preventing context switches; the OS preempts it. This is how all major OSes are designed.

If I didn't longjmp(), by the way, it could take an infinite amount of time between when the user sent SIGINT to when the program stopped by the very nature of what that utility does. By using longjmp(), a SIGINT becomes a true interrupt, which is what users want.
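A minimal sketch of the jump-out-of-the-handler technique being described, assuming POSIX `sigsetjmp()`/`siglongjmp()`. This is not bc's actual code; `run_interruptible`, `jump_armed`, and the two demo functions are hypothetical names. Jumping out of a handler is only safe if the interrupted code was not in the middle of a non-async-signal-safe call, which is why the jump is armed only around the computation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf interrupt_point;
/* Hypothetical guard: the handler jumps only while a computation is running. */
static volatile sig_atomic_t jump_armed;

static void on_sigint(int sig)
{
    (void)sig;
    if (jump_armed) {
        jump_armed = 0;
        /* Abandon the computation and resume at the sigsetjmp() call. */
        siglongjmp(interrupt_point, 1);
    }
}

/* Runs compute(). Returns 0 if it finished, 1 if SIGINT cut it short,
 * -1 if the handler could not be installed. */
int run_interruptible(void (*compute)(void))
{
    struct sigaction sa;
    assert(compute != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;

    /* Second argument 1: save the signal mask so the jump restores it. */
    if (sigsetjmp(interrupt_point, 1) != 0)
        return 1; /* arrived here from on_sigint() */

    jump_armed = 1;
    compute();
    jump_armed = 0;
    return 0;
}

/* Demo workload that completes normally. */
void demo_returns(void) { }

/* Demo workload that interrupts itself; a real one would be a long loop.
 * If the jump works, control never returns from raise(). */
void demo_interrupted(void) { raise(SIGINT); }
```

The computation itself contains no checks; the cost is that any state it was mutating when the jump happened has to be made consistent again afterwards.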

Having a working thread and a signal thread does not change that because the signal thread would need to interrupt the working thread in the same way.

boffinAudio · on Oct 16, 2023

They can come at any time but they are only delivered to your process by the kernel during a context switch. They're queued until then.

> it could take an infinite amount of time between when the user sent SIGINT to when the program stopped by the very nature of what that utility does

A well coordinated signal-handling-thread and a workload-thread won't manifest this issue - poorly managed threads however, will.

>By using longjmp(), a SIGINT becomes a true interrupt, which is what users want.

Hard disagree.

What you've done is turned signals into interrupts, which is .. hokey. And not how signals are intended to be used. It's quite possible to get the behaviour you expect - fast interruption and death of work-code - but you'd have to sort your issues with threads out, first.

EDIT: it's decades-old proven technology: use a semaphore or a mutex to keep your threads in lockstep, and avoid this longjmp() malarkey... signals aren't hard, but maybe they're only a little less hard than threads ..

epcoa · on Oct 16, 2023

> What you've done is turned signals into interrupts, which is .. hokey

No, that is precisely what they are - user space interrupts. That's how they work. That’s why one has to worry about async safety and signal safe calls.

I have no idea where you get off on this “allowing a context switch” nonsense. In most of the systems being discussed, if a signal is sent and the thread is runnable, it will be delivered immediately and asynchronously - there is no queuing going on in that case. If the thread is not runnable/scheduled that’s another story, but this does not square with what you’re saying, because it sounds strongly like you’re saying that signals are delivered synchronously (with context switches), and they are most definitely not, generally.

Also longjmp is an entirely user space concept - the kernel on the most common systems being discussed has no idea of its operation.

oasisaimlessly · on Oct 16, 2023

> They can come at any time but they are only delivered to your process by the kernel during a context switch.

This is backwards and extremely misleading; a signal being queued can trigger an immediate context switch (effected via an inter-processor interrupt). See `kick_process()` in the Linux kernel [1].

> What you've done is turned signals into interrupts, which is .. hokey. And not how signals are intended to be used.

Signals are userspace interrupts. That's exactly what they are, and they're no more hokey than hardware interrupts (so, pretty hokey).

[1]: https://elixir.bootlin.com/linux/latest/source/kernel/sched/...

gavinhoward · on Oct 16, 2023

I used the system you are advocating in my utility at first.

The problem is that you need to constantly check for work. That is expensive on tight loops. Yes, I checked on every loop iteration unless I knew a small, but constant, amount of work had to be done.

That's the thing: if you have a tight loop, do you know how long it's going to take to run through everything? In addition, when a SIGINT comes, can you be sure that your state is correct?
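A minimal sketch of the flag-polling approach described here, assuming a handler that only sets a `volatile sig_atomic_t`. It is not the bc 2.7.0 code; `sigint_seen`, `CHECK_INTERRUPT`, and `sum_with_polling` are hypothetical names. Every loop that might run long needs its own check, and each check has to leave the surrounding state consistent when it bails out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set by the handler, polled by the work loops. */
static volatile sig_atomic_t sigint_seen;

static void note_sigint(int sig)
{
    (void)sig;
    sigint_seen = 1; /* the only thing the handler does */
}

/* Installs the flag-setting handler. Returns 0 on success, -1 on failure. */
int install_sigint_flag(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical check macro; it must appear in every potentially long loop. */
#define CHECK_INTERRUPT() (sigint_seen != 0)

/* Sums 1..n, polling the flag on every iteration.
 * Returns 0 on completion or -1 if interrupted; *out holds the (partial) sum. */
int sum_with_polling(unsigned long n, unsigned long *out)
{
    unsigned long acc = 0;
    unsigned long i;
    assert(out != NULL);
    for (i = 1; i <= n; i++) {
        if (CHECK_INTERRUPT()) {
            *out = acc; /* the caller has to cope with a partial result */
            return -1;
        }
        acc += i;
    }
    *out = acc;
    return 0;
}
```

The check is cheap in isolation, but it sits on the hot path of every loop, and a loop that lacks one cannot be interrupted until it finishes.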

Here's a test: download my utility and run these commands:

    $ git checkout 2.7.0
    $ ./configure.sh -gO3
    $ make
    $ echo "2^2^32" | bin/bc

Then send a SIGINT after a random amount of time.

I did that twice, and both times, I triggered asserts. Some state was wrong.

This was after I had carefully gone through the entire codebase to check signals everywhere.

Now do this:

    $ git checkout 6.7.0
    $ ./configure.sh -gO3 -Sbc.sigint.reset
    $ make
    $ echo "2^2^32" | bin/bc

And send a SIGINT after a random amount of time.

bc exits because its stdin is a pipe, but nothing goes wrong. If you do it like this:

    $ bin/bc
    >>> 2^2^32
    ^C
    interrupt (type "quit" to exit)
    
        0: (main)
    ready for more input
    >>>

And you can merrily continue.

Semaphores and mutexes have the same problem: I would need to keep checking them.

I understand why you think the way you do; I did too. But the real world is more complicated.

aa-jv · on Oct 16, 2023

In the 'real world' compare-and-swap operations (such as one would find in atomic types used for communication between worker and handler) are single-cycle operations, if not a hard CPU flag...

>I understand why you think the way you do; I did too. But the real world is more complicated.

Please consider the complications of the high-end audio world, where such techniques are well established. Not only must bat-out-of-hell threads have all the gumption they can muster, but they have to be able to be controlled - in as close to realtime as possible - by outside handler threads.

I think boffinAudio is trying to help you improve your code quality. It's not bad advice to re-think this.

gavinhoward · on Oct 16, 2023

I've done real-time too.

If I was in that context, yes, I would use outside handler threads, but that's because I could probably put the compare-and-swap in one place (or very few).

As of 2.7.0, my bc had 77 places where signals were checked, and I was probably missing a few.

Real-time is different from what I was doing, so it required different techniques.

helpfulContrib · on Oct 16, 2023

>I've done real-time too.

There is a case for signals in strictly real-time, strictly high-performance, and also strictly realtime+high-performance code.

You have decided to go strictly for high-performance, for your well-argued reasons, and you've abandoned a standard practice for your stated claims, but this isn't just about your code - it's about how people can mis-use signals, and in your case you're mis-using signals by not using them.

It is the advice:

>"Yes be very afraid of signals."

.. which feels not entirely appropriate.

So I took a look at the bc code, and I too am terrified of your use of longjmp.

It appears to me you've gotten somewhat smelly code because you didn't find the appropriate datatype for your case, and decided to roll your own scheduling instead. Ouch.

>As of 2.7.0, my bc had 77 places where signals were checked, and I was probably missing a few.

To refactor this to a simple CAS operation to see if the thread should terminate doesn't seem too unrealistic to me. Only 77 places to drop a macro that does the op - checking only if the signal handler has told your high-performance thread to stop, hup, die, etc.

Signals are awesome, and work great - obviously - for many, many high-performance applications, and your high-performance, CPU-bound application might feel like the only way you could do it - but you certainly can attain the same performance and still handle signals like a well-behaved application that doesn't have to take big leaps just to stay ahead of the scheduler ..

gavinhoward · on Oct 16, 2023

My 77 uses were macros that did atomically check a flag.

My point was that that does not scale.

Also, I didn't have a scheduler. The point of interrupts is that you don't need a scheduler.

helpfulContrib · on Oct 16, 2023

>My point was that that does not scale.

Thanks for the clarification.

>Also, I didn't have a scheduler. The point of interrupts is that you don't need a scheduler.

I think where we diverge in position is that I do not think you have a good justification for the statement "be scared of signals" because, after all, you are clearly not scared of them and have decided to bend them to your own thoughts on how best to optimize your application, so it's sort of disingenuous to hold the position having completely wiped "the standard way to do high-performance signal-handling" from your slate, to put your own special case forward by example. You're clearly not scared of them.

I'm calling you out on it because signals are absolutely not scary, but maybe talking about them with other experts can be.

Your case is more an example of how unscary signals are - but you've opted for longjmp()'s (which are, imho as a systems programmer, a far more cromulent fear) in your code as a solution to a problem which I don't think is really typical.

Thus, not really scary at all.

Well, it was a fun read of some code, and thanks for bc anyway.

epcoa · on Oct 16, 2023

Sorry but both boffinAudio and you are providing considerable misinformation.

https://stackoverflow.com/questions/5339769/relative-perform...

Bus activity makes CAS and any atomic operation far more costly than a single cycle. If they were really that cheap then every operation would just be atomic.

In general you must trade off bandwidth for improved latency. Audio work by its nature can and must do this. It is not the appropriate trade off in frankly most cases of computing (even if it is arguably more interesting)

actionfromafar · on Oct 16, 2023

I think the implicit idea is that the CPU bound thread only is kept busy in smallish chunks, and you’d inject a stop package in its stream of work.

cesarb · on Oct 16, 2023

You're assuming that the work can be split into "smallish chunks", instead of a single large CPU-bound computation. Sure, "injecting a stop package in its stream of work" is the best design, but only if you do have a stream of work in the first place. Otherwise, it's either signals, or constantly checking a shared "stop" variable in the middle of a very hot CPU-bound loop.
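A minimal sketch of the "stop package in a stream of work" design, assuming the work really can be expressed as a sequence of small items. The types and names (`work_item`, `ITEM_STOP`, `drain_stream`) are hypothetical. The worker only notices a stop request between items, so its worst-case reaction time is the duration of the largest single item.

```c
#include <assert.h>
#include <stddef.h>

enum item_kind { ITEM_WORK, ITEM_STOP };

/* One unit in the stream: either a bounded chunk of work or a stop request. */
struct work_item {
    enum item_kind kind;
    unsigned long n; /* for ITEM_WORK: add up 1..n */
};

/* Processes items in order until the stream ends or a stop item is seen.
 * Returns the number of work items completed; *total accumulates their results. */
size_t drain_stream(const struct work_item *items, size_t count,
                    unsigned long *total)
{
    size_t completed = 0;
    size_t i;
    assert(total != NULL);
    assert(items != NULL || count == 0);
    *total = 0;
    for (i = 0; i < count; i++) {
        unsigned long k;
        if (items[i].kind == ITEM_STOP)
            break; /* the stop is only noticed between chunks */
        for (k = 1; k <= items[i].n; k++)
            *total += k; /* the chunk itself runs uninterrupted */
        completed++;
    }
    return completed;
}
```

If a single item has no useful upper bound on its running time, as with one enormous exponentiation, this design gives no bound on how long a stop request waits.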

gavinhoward · on Oct 16, 2023

Yes, but how do you limit the size of those chunks?

Perhaps only do one math operation?

Calculating 2^4294967296 is one operation, but it can take minutes (hours?) on my bc. Do you want it to take that long, not responding to SIGINT the entire time?
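For a rough sense of scale (a back-of-the-envelope estimate, not a figure from the thread), the number of decimal digits in that result is:

```latex
\left\lfloor 2^{32} \log_{10} 2 \right\rfloor + 1
  \approx 4294967296 \times 0.30103
  \approx 1.29 \times 10^{9}
```

So the single operation produces a number of roughly 1.29 billion decimal digits.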

Probably not.

Smaller chunks? Then you have to check for signal in a loop, which causes its own problems.

I used to do that, and as I told boffinAudio, that caused problems with state being invalid and such.