A little story about the `yes` Unix command (endler.dev)
260 points by derwiki on July 21, 2022 | 118 comments



Once one of our devops engineers was testing a script with `nohup yes > output`.

The /home directory was mounted to an autoexpanding EFS on AWS.

23.4 TB and 2 months later we noticed the bill :)


Dare I ask: how much?


Not too much, like $4k extra a month I think, noticeable but not the end of the world.

I assume a smaller place would just beg AWS for forgiveness and probably get it.


> Not too much, like $4k extra a month I think, noticeable but not the end of the world.

It's nice that it's manageable and also a learning experience, but over here that would be like 2 months' take home salary for a software engineer.

Kind of why development/test environments shouldn't have autoexpanding or scalable anything, in my experience.


I assume larger companies can deal with temporary mistakes on the order of one engineer’s salary.


Oh, they certainly can, some better than others. Though personally I'd most certainly want to avoid such situations.

First, to retain an air of "vaguely knows what they're doing" about me, even though everyone makes mistakes and that should be treated as something that's okay - especially if you can limit the impact of mistakes, like with automated spending limits.

Secondly, because I wouldn't want to risk doing something like that in a personal project, given that my wallet is likely to be much thinner than those of organizations.


As in, paying for that engineer was a mistake all along?


No. It doesn't really impact the company's bottom line if your software engineering org is 100 people making $20k a month and someone accidentally wastes $4k of EBS disk. It's nice if you don't waste it, of course, but "oops, filled up the disk with 'y' output" is better than "yeah actually all of those files are pretty important, I think team X is using them" because you can instantly delete it, rather than doing a multi-month project to see if team X really is using the files.


Everything should be made to be bounded; is there no max that could have been set? Not using expandable storage in test risks deviations from prod (and then you get the prod-only bugs that are difficult to keep fixed). I wonder if there was a failure to do pre-prod monitoring as well - it's super handy having dashboards telling you disk usage.


> Everything should be made to be bounded; is there no max that could have been set?

I'd expect that you'd reach the maximum once your card is rejected. :)

But truthfully many platforms out there will let you set up spending alerts, but not outright set limits because then you get into a bunch of difficult questions - should further data just be redirected and piped to /dev/null? Should you as the service provider instead limit IOPS in some way, or allow slower network connectivity if allowed egress amount of data is exceeded? What about managed databases, slow it down or throw it out altogether?

I've talked about those in detail with some of the people here ages ago and there are actually companies that take the "graceful degradation" approach, like Time4VPS who host almost all of my cloud VPSes for now: https://www.time4vps.com/?affid=5294 (affiliate link, feel free to remove affid if you'd prefer not to have it)

  What happens if I run out of bandwidth?
  We reduce your VPS server’s port speed 10 times until the new month starts. No worries, we won’t charge any extra fees or suspend your services.
Honestly, that's a really cool idea for handling resources. One also has to understand that storage is a bit different, though: if you've built your entire platform around the concept of scalability and dynamically allotting more resources, you might make the choice of having the occasional story of large bills (some of which you'll probably forgive for good PR), as opposed to more frequent stories about things going down because people forgot to pay you, as well as many enraged individuals complaining about their data being deleted, though it's supposedly your fault.

So in a way it's also a business choice to be made, though one can imagine hard spend limits being feasible to implement.

> Not using expandable storage in test risks deviations from prod (and then you get the prod-only bugs that are difficult to keep fixed).

I concede that this is an excellent point; you should also be able to test automatic scaling when necessary, etc.

Though the difference is probably in being able to test it but not leave it without (more conservative) limits when you're not looking at it.


I wonder what the quickest way to rack up the highest bill on AWS is.


Private CA + Dedicated CloudFront IPs are the fastest ways to do it in one line item, but most commonly it's massive DB instances. Why create an index when you can double the number of cores? Wait, cores don't help improve queries? But more memory, so better, right? Elastic MapReduce with oversized instances used to be pretty common, but RDS is a perpetual winner for most companies I've worked with.

But the typical worst-case practices are the small companies who think avoiding vendor lock-in is a thing that matters at their scale. Look, you're never going to change cloud providers - and if you do, that will be a problem you can solve then. You're never going to go multi-cloud. If you do, that's a problem you can solve then. Preemptively DIY'ing everything from database copies to security to encryption is going to break you - you're now engineered on a brittle substrate of hacks with no support, all so that one day you could maybe consider saving 10% on your cloud bill by moving to a different provider. The day that migration will save you more than one engineer per month, consider it. Until then, you're just making your own costs worse.

/rant


And in real life I've seen vendor lock-in cause exactly every worst fear and worse.

Vendor lock-in is only tolerable when the service is so easily swapped out that it's not actually vendor lock-in.

There is no valid argument for not worrying about that before it happens, and bending pretty far to avoid it. No matter how hard you work to stay as portable as possible early and at each daily step along the way, it's 10x or 1000x less than dealing with it later.

If you're just talking about a pluggable service, well then by definition that's not really lock-in.


I believe you, I do, but for every company afraid to use the features of a service they pay for because of vendor lock in, I can show you five, six and seven figure bills attributable to DIY. Nothing is drop in if it's value added - if it's not value added, why are you using a vendor at all?

Portability is a huge myth that eats engineers hours like a snack every single day and rarely pays off.


I believe that you believe me, so I'll rest there, since I don't want to speak to actual products and timescales and business sizes and business types required to make the claim more solid.


> the small companies who think avoiding vendor lock-in is a thing that matters at their scale

This. If AWS ever were to even consider dramatically increasing their prices, there are players whose AWS bills are two or three or more orders of magnitude more expensive than yours who will howl and gnash their teeth and whose potential departure from AWS does far more to protect you than you could ever do to protect yourself.

Similar stupidity includes spending engineering time on issues like how to deal with an S3 outage. If S3 is down, your competitors are down too. Nobody cares.


The people with huge AWS bills are already paying a different rate to you. They don't care what happens to your rate.


One lock-in scenario to consider is if you may ever want to offer an on-premise version of your SaaS product. This is what my company is doing now. It's a huge pain in the butt, but it does bring in a lot of revenue.


Lambda functions on a put event on S3 that also put an object _into_ an S3 bucket are so common it's called out on the AWS documentation page [0].

[0] https://docs.aws.amazon.com/lambda/latest/operatorguide/recu...


The Lambda functions should put at least two objects into the bucket, otherwise you don't get that nice exponential growth.


Ouch!


Leak your AWS keys and a nice support team will take care of it by mining cryptocurrencies with your credit card.


Plus make sure your contact details are NOT up to date so you miss the AWS warnings...


A fork bomb that fires off GPU enabled VMs for every instance of the fork?


That will hit provisioning limits nearly instantly


That's ok ;)


For reference, launching a single u-12tb1.112xlarge and leaving it up for a month would be around $80k.


A DynamoDB table provisioned at 40,000 RCU/WCU ? You have been warned!


And this is why you need grafana dashboards of your storage!


Nobody ignores the billing statement :P


Why didn't you have an alarm set?

If you are going to use ANY cloud provider, learn about alarms, or you will get screwed.


Comparing the 1979 version with the current version deserves its own whitepaper.

Writing portable, robust code is nontrivial. This is a great example.

For those who didn't RTFM:

1979 version:

    main(argc, argv)
    char **argv;
    {
      for (;;)
        printf("%s\n", argc>1? argv[1]: "y");
    }
Current 128-line GNU version:

https://github.com/coreutils/coreutils/blob/master/src/yes.c


I would argue this is pointless optimization for the sake of looking good on an artificial benchmark. Nobody needs a 3GB/s `yes` command! Modernize the syntax to C89 and the 1979 version is perfectly fine for any non-contrived use case.
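For illustration, that modernization really is tiny; a minimal C89 sketch (not from the article, just illustrative):

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        /* Print the first operand (default "y") forever, one per line. */
        for (;;)
            printf("%s\n", argc > 1 ? argv[1] : "y");
    }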

What would be way more interesting is fixing the standard library/OS interfaces to make the slow version fast, because that would likely benefit other simple filter commands as well. Might require using non-POSIX interfaces to do well, of course.

edit: POSIX note


https://www.gnu.org/prep/standards/standards.html#Reading-No...

> Don’t in any circumstances refer to Unix source code for or during your work on GNU! (Or to any other proprietary programs.)

> If you have a vague recollection of the internals of a Unix program, this does not absolutely mean you can’t write an imitation of it, but do try to organize the imitation internally along different lines, because this is likely to make the details of the Unix version irrelevant and dissimilar to your results.

> For example, Unix utilities were generally optimized to minimize memory use; if you go for speed instead, your program will be very different. You could keep the entire input file in memory and scan it there instead of using stdio. Use a smarter algorithm discovered more recently than the Unix program. Eliminate use of temporary files. Do it in one pass instead of two (we did this in the assembler).

> Or, on the contrary, emphasize simplicity instead of speed. For some applications, the speed of today’s computers makes simpler algorithms adequate.

> Or go for generality. For example, Unix programs often have static tables or fixed-size strings, which make for arbitrary limits; use dynamic allocation instead. Make sure your program handles NULs and other funny characters in the input files. Add a programming language for extensibility and write part of the program in that language.

> Or turn some parts of the program into independently usable libraries. Or use a simple garbage collector instead of tracking precisely when to free memory, or use a new GNU facility such as obstacks.


Putting aside locale and the other important things universal code should have, which make up the first 75% of the new version:

What is your argument that it is silly? Have you done a survey of usage, or researched its history? Don't be so quick to criticize code, especially from a huge, old project (that might even be older than you).


The above code doesn't propagate errors


It doesn't do lots of important things!


I always enjoy reading these kinds of write-ups digging into why something is as fast as it is and how it interacts with a wider system. However, I do think that one should not arrive at the belief that this kind of optimisation is warranted everywhere; code simplicity can also be a goal. The classic argument is to compare OpenBSD’s yes [1] to GNU coreutils’ yes [2] and contemplate under which circumstances those additional MB/s of “y” will be critical enough to warrant the maintenance of more than one hundred additional lines.

[1]: https://cvsweb.openbsd.org/src/usr.bin/yes/yes.c?rev=1.9&con...

[2]: https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/yes...

Similarly, there is the NSFW comparison between Plan 9’s cat and GNU coreutils’ cat [3].

[3]: http://9front.org/img/longcat.png

Just to reiterate in the end, though: this is not an argument against optimisation and learning how to make something blazingly fast. But there is such a thing as optimising the wrong thing, and using speed as the only justification for merging a patch is probably not the right thing to do for a bigger project.


I carefully considered this before I optimized GNU yes.

The reason it is useful is because yes can output anything, and so is useful to produce any repeated data for test files etc.

You can see the justification detailed in the original optimization commit: https://github.com/coreutils/coreutils/commit/35217221
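For readers who don't follow the link, the shape of the optimization is roughly this (a simplified sketch, not the actual coreutils code; the buffer size and error handling here are arbitrary):

    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        /* Pre-fill one buffer with as many copies of the output line as
           fit, then issue one write(2) per buffer instead of one per line. */
        static char buf[64 * 1024];
        const char *word = argc > 1 ? argv[1] : "y";
        size_t wlen = strlen(word);
        size_t used = 0;

        while (used + wlen + 1 <= sizeof buf) {
            memcpy(buf + used, word, wlen);
            used += wlen;
            buf[used++] = '\n';
        }
        for (;;)
            if (write(STDOUT_FILENO, buf, used) < 0)
                return 1;
    }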


I have a history question. I've seen this link a few times: https://www.gnu.org/prep/standards/html_node/Reading-Non_002...

Did this advice from GNU inspire you to optimise `yes`, did your optimised `yes` inspire GNU to write this, or is there no historical connection between your optimised `yes` and this advice?


Trying to differentiate GNU implementations had nothing to do with it. This is never a consideration for me. It was worth the slight increase in complexity for the reasons stated in the commit message. Also an unmentioned point is that the coreutils code is very often referenced, so should be as robust and performant as possible, so those properties may percolate elsewhere.


Thanks, that makes a lot of sense.


So you actually wrote a new program. Yes was made for those pesky installers, not for producing large amounts of data. This would be my approach: keep yes simple and create a new program that spews out data well.


And then you have a possibly confusing situation where you have two programs that do essentially the same thing, but one is faster, and the other one is possibly not provided by default. As a user, in cases where it matters, you'd have to know about the issue and bother installing the new program. This is worse. As developers, I think it's our duty to make the lives of users simpler, even if it makes our lives a bit more complicated. I'd argue that's what we are here for.

I guess there's no ideal solution. But I think the "new" program does what the first one did better, and does not do anything worse.

We are talking about a program that is still under 1,000 lines of code and that's not getting new features every month, or at all anyway, so maintainability does not seem to be a big issue?

I see the reasoning, but I don't see any actual practical drawback to having improved the original program directly in this specific case. Nor do I see any advantage to keeping "yes" dead simple. The new version is still pretty much readable, and the extra time it takes to read and modify it without making mistakes seems worth the advantages.


> And then you have a possibly confusing situation where you have two programs that do essentially the same thing

My point is, I'd never thought of using yes for this purpose. So in this case, you could make a command called 'outputsomethingfast' and a command called yes that internally calls 'outputsomethingfast --output=yes' or something like that.

To me this is way more logical, and more in line with the Linux philosophy, right?


I guess. I don't like the name "yes" and I think we would have been better off with a more general name since the command is general, but now it's there, so…

However, this is independent of this optimization; "yes" already had this feature of outputting anything, I think?

But I expect this kind of accident to happen in any working system that has been around long enough. This seems unavoidable. So we'd better put up with this kind of mess, probably.


Touché, and there is already a program for the exact purpose of generating large amounts of data: jot(1)

https://manpage.me/?q=jot


Or if homogeneous data is fine, just cat or dd from a source like /dev/zero.


I think I've seen /dev/random used for generating large amounts of garbage data as well. (Though usually my problem is too much garbage data, not too little.)


And here is the problem. Now we have 2 programs that do essentially the same thing.


I really don't want to have to learn 12342384 programs. It's much less discoverable than having a few programs with a --help (and, more generally, a tree-based organization of functionality on your computer).

Also, if there's a new program, say, "fastrepeat", wouldn't that be a duplication of functionality between "yes", which just outputs "y", and "fastrepeat 'y'"? That's, like, even more bloat, since now you need both.


I would much rather have a very easy to remember command that does one thing and one thing only (namely, what it says on the tin) than to have to remember or dig through a whole slew of command line options in order to get 'yes' to become the equivalent to 'no' or 'cat'.


> I would much rather have a very easy to remember command that does one thing and one thing only

well, I definitely don't. I don't want to encumber my mind with a name for every single one of the 25000 "one thing" things I have to do.


Really, you want one program that outputs data, and yes is an alias for outputdatafast --data="y"


  > However, I do think that one should not arrive at the belief that this kind of
  > optimisation is warranted everywhere and that code simplicity can also be a goal
Definitely not warranted everywhere, especially if premature and at such a level of detail, but core utils like `cat` and `yes` are IMO prime examples of where such optimizations are warranted:

- They're in use daily on a huge number of setups; even small benefits add up much more than in some niche tool.

- They've got a clear and small feature set that won't change anytime soon, so there won't be much code churn, and thus maintenance effort will be relatively low.

  > Similarly, there is the NSFW comparison between Plan 9’s cat and GNU coreutils’
  > cat [3].
  >
  > [3]: http://9front.org/img/longcat.png
IMO that isn't an exactly fair comparison though, as the difference is not only in optimizations but largely in boilerplate license/copyright comments and in features like line numbering, modes for showing (non-printable) special characters or whitespace, and option parsing for said features.

Strip all that out and you get (eyeballed) about a third of that, and 200 lines for an extremely fast core util is really not much, nor hard to maintain, as it won't get any new features soon anyway.


Right. Faster generally means less CPU used for a particular use. For example I made ls(1) a bit faster recently. Even though not really noticeable per run, given how often it is run I estimated this to save 9 tonnes of carbon emissions per year. https://twitter.com/pixelbeat_/status/1511030095011581953


This comment rubs me all the wrong ways possible.

Is the posted link supposed to be a reference? That is your own tweet, a.k.a. a self-reference, boasting about your own contribution.

In the tweet itself you say "estimate"! How do you arrive at such a grandiose estimate?

How do you attribute a saved carbon footprint to an optimization in a command line tool? You cannot even approximate that. I would argue that such tiny optimizations make 0, nil difference in overall energy consumption on my local machine, all my laptops and all the servers in this building.

I'm not saying that we shouldn't run optimized code, but anyone can throw around random numbers.


> and in features like line numbering, modes for showing (non-printable) special characters or whitespace and option parsing for said features.

I remember a page somewhere saying the plan9 people were very much against making cat a generic tool for all those use cases; it should do what it says in the name: concatenate files.


I agree with your premise, that additional complexity is not always worthwhile. But I don't think this "classic argument" is very strong. Hardly even an argument at all.

You compare two things and see one is longer and more complex than the other. How much cost is that, really? 10x more code sounds bad, but 100 more lines might put it in perspective. And how complex is the code, really? And what is the benefit? A common complaint seen on OpenBSD lists is that performance is behind competitors, so you could take a bunch of those complaints and make an equally sound argument the other way.

I will say that a lot of the tools and libraries I have seen the hell optimized out of (and functionality added to) allow solutions to be put together which would be infeasible or impossible with simple/naive implementations. More layers, custom code, or complexity can be avoided. Say a database layer could be avoided if filesystem operations are fast enough. Or a shell script + command line tools can be used instead of writing a new program, if fork+exec+exit+context switching+pipe IO and these kinds of tools (yes and cat) are fast. If malloc+free are fast, then you don't need to write your own caching allocator in front of them. Etc., etc. So you might end up with an end-to-end solution that meets your requirements and actually has less code, or at least less bespoke complexity and more that is long maintained and used by many.


I think the main reason that GNU tools are overly optimized is copyright. [0] They just try to avoid any copyright claims from the old proprietary UNIX tools, like the copyright on an empty file. [1]

[0]: https://www.gnu.org/prep/standards/html_node/Reading-Non_002...

[1]: http://trillian.mit.edu/~jc/humor/ATT_Copyright_true.html


Even if those additional MB/s are not critical for any particular application, tools like these are run a humongous number of times every day throughout the world, so while I have no data, I suppose the total CPU usage could be high enough that global energy/carbon savings from this kind of optimization would be relevant.


I never heard of anyone using yes to generate large data files/streams. Does anyone actually use yes in a way where its daily CPU usage will be more than a rounding error?

Also, the very simple FreeBSD implementation [1] is not too slow on my 10-year-old notebook:

  > time yes test_string | dd of=/dev/null bs=1M count=65536
  ...
  2646329200 bytes transferred in 1.709196 secs (1548288918 bytes/sec)
  0.023u 0.850s 0:01.71 50.8% 5+166k 0+0io 0pf+0w
Firefox has probably used more CPU time while I was composing this comment - thanks to JS (in other tabs; HN is a rare example of a site which doesn't abuse my CPU). FF is almost always on the first line in top.

If you check top/powertop on a typical desktop or server, you'll likely find better targets than 'yes' for reducing energy use.

[1] https://github.com/freebsd/freebsd-src/blob/main/usr.bin/yes...


I don't think it works that way.

While in general efficient programs can save energy, the "saved" MB/s do not necessarily correlate with saved joules. There is no direct cost for an instruction; there is a severe overhead from the machine plainly being switched on. And it's not like you will always be able to "use" those saved MB/s for something else.

And you entirely neglect the "cost" of optimizing. The time spent looking at inefficient code alone probably costs more energy than all the energy actually saved by a single change.

Consider the time someone could have spent on something else, with significantly more impact.


I'm not an expert on hardware, but I do know that CPU consumption depends on CPU load. A laptop battery drains faster and fans run faster when the CPU is working than when it is idle. More MB/s means fewer seconds and therefore more idle time. So how could running a program that has more instructions and takes longer to do the same work not have a cost?

About the second part of your comment: it's true that optimizing has a cost, and if I were creating a "yes" or "cat" program for personal use it would obviously be pointless to optimize. But if it's a program run often by millions of people (probably more true of "cat" than "yes"), it's not that obvious to me that millions of little savings cannot offset the time of one person optimizing.


So much agreed. The computing operations where saving instructions equals saving energy are really few and far between. Or non-existent.


Once the good-enough version has already been written and has been sitting around for a few decades, you need an excuse not to get around to optimizing it sooner or later.

The maintenance argument isn't a good enough one. It's a factor, just not a strong one.

There is no reason for every tiny bit of something as foundational as the OS not to just get better and better forever.

The only reason we write things down in the first place is so that we can do the work once and then refer to it many times without having to re-create it each time. So there is very little argument for keeping a program small and simple like the first version.

Some, but just not much. Because black-boxifying that complexity is what writing (be it a legal document or a program) is for in the first place. Making a more sophisticated, better-performing version 2 of something is simply using the tools of writing, which ultimately exist for no other purpose.

It does go the other way too. Version 3 could be to invest yet more brainpower into figuring out how to get the same performance in fewer operations. And on that day someone will wonder if it's worth optimizing something that already works fine when compute resources are infinite. The answer then, as now, will be the same: "Yes. Of course."


> Similarly, there is the NSFW comparison between Plan 9’s cat and GNU coreutils’ cat [3].

You know, if there were a SFW version, it would be nicely illuminating about the complexity of getting similar stuff done, handling edge cases, and whatnot.

That said, I think the Plan 9 version could use a few comments to decrease the cognitive load, since individual bits of code felt more approachable to me in the GNU version.

Though with the code itself being shorter, one-liners or even just a few lines at the top of the function definition could be sufficient.


When to optimise [1] and “premature optimisation” have been ongoing concerns in computer science ever since performance constraints and limits were identified (i.e. since forever).

[1]: https://en.wikipedia.org/wiki/Program_optimization


Re [3]: yeah, so you mean a program which is faster, better documented, and has more functionality has longer source code? Hm, yeah?

> using speed as the only justification for merging a patch is probably not the right thing to do for a bigger project

I don't get this. Speed is important. Energy efficiency is important. Have we gotten so used to the bloat that performance and energy savings must be disregarded?


I think if you dig through GNU coreutils or *BSD world (probably FreeBSD for the “newest and fastest”), you’d be surprised at how complex almost any of the binaries are. That’s just the benefit of 40+ years of constant updates and optimizations on feature-complete software.


PSA: Do not use `yes` on macOS CI. In those environments you only get one CPU, which `yes` will use at 100% and never release, and your entire workflow will hang. There is such a thing as too much performance, ironically.


Windows 3.1 had the same issue, where there was no preemptive scheduling


Curious: have you tried nicing 'yes' in this situation?


This sounds like a scheduler bug?


Yep most definitely a kernel bug.


It always amazed me how fast “yes” is. Whenever I need to generate a really big file for whatever reason (e.g. testing multipart file uploads), I will run something like “yes > myfile.txt” for a few seconds.

Similarly, I tried writing a naive version in Node.js a few years ago and it was several orders of magnitude slower than coreutils, which is usually the case whenever I try to clone one of the “simple” utilities from coreutils.


I just tried "yes > myfile.txt" for a few seconds (less than five) and then checked the size. 1.4 billion lines! I knew yes was fast but that's pretty mind-blowing.


The GNU coreutils never fail to impress me with how insanely fast they go. One of my favorite blog posts of all time [1] really opened my eyes to how insanely optimized Unix really is.

[1] https://adamdrake.com/command-line-tools-can-be-235x-faster-...


`cat /dev/urandom > /tmp/myfile` is faster on macOS though


`yes` is far faster on my Linux box (2.2GB/s) vs /dev/urandom (0.38GB/s) -- not surprising, since the `man` page for urandom says:

"[urandom] is designed for security, not speed, and is poorly suited to generating large amounts of random data"

`cat /dev/zero` is pretty much as fast as `yes` on my system however (2.1GB/s)


I suspect that just doing a `truncate` command is probably even faster. I just like the `yes` version because it requires basically zero thought on my end.


Always love reading about performance enhancements to code. This is a great example of how much you can do with such a simple thing.

I can't see that any of this optimisation is actually useful in the real world. Does yes actually need such a high throughput? Would anything suffer if it didn't? Probably not. Still fun though.


Maybe some day you want to automate installing a program that needs about three billion confirmations a second regarding various things...


Let's hope not, but good to know yes has me covered :)


It's cheaper than a lawyer, that's for sure.


Compare https://codegolf.stackexchange.com/questions/215216/high-thr...

> By far the best score so far is by @ais523 - generating FizzBuzz at a throughput that seems to average somewhere around 54-56GiB/s.

The best 'yes' in the linked article gets about 3GiB/s. To be fair: the optimized fizzbuzz is insane.


There’s a fastest yes competition on CGCC as well: https://codegolf.stackexchange.com/questions/199528/fastest-...

My entry outputs 28 TB/s. (I…may have done some creative interpretation of the rules.)


A solution that requires the rules to be amended is obviously very creative!

Impressive out of the box thinking.


Why would you want a buffer in the yes command, though? The point is to only provide output if stdout is being read (i.e. stdout is available for writing). That way you only provide as many lines of output as are needed.

Isn’t filling a buffer going to be a huge waste of CPU for the common case where the command only needs to provide one or two “yes”es? (Common case as in, its intended use of working around scripts that interactively prompt you to continue, etc…)


1. Performance of yes is not a concern for that "common case" (well, is that case still common? Who writes interactive-only programs that require yes/no answers from the terminal?).

2. Filling a buffer has a much lower CPU (power) cost, as it doesn't involve context switching the way I/O does.


On my Asahi Linux system, the Rust version runs at about 15GB/s, but the GNU version runs at "just" 10GB/s.

However, the author had the opposite results. I wonder if the architecture makes a difference.


Could you try running it on macOS on the same machine? I wonder if that would be any different.


It’s also a great way to synthetically max out a core. Want to check thermals? Got 8 cores? Spin up 8 yes > /dev/null

And let them rip.


For amusement launch it with `yes | head -n 8 | xargs -n 1 -P 0 yes`

This is how I thermally stress CPUs but I usually do `yes speed | head -n 8 | xargs -n 1 -P 0 openssl` instead.


Convenient for light stress testing, but if you want maximum thermal generation there is no substitute for FIRESTARTER:

https://github.com/tud-zih-energy/FIRESTARTER

With my system hooked up to a Watts Up meter (but including the monitor and a couple of other things): yes > /dev/null on each core gets a bit above 42w, openssl speed on each core occasionally gets above 45w, and running FIRESTARTER for a bit gets above 57w (on an i5-6260U (NUC) with hyperthreading disabled, 15W TDP, with some attempted power restraint in the BIOS; Firefox on decent websites like HN tends to use about 31w total, which I think is something like 10-15w (12w, I believe, but it has been a while) measuring the computer only).

FIRESTARTER has some evolutionary algorithms as well, but at least on my CPU, after hours they were still doing much worse than the default. I was wondering why there wasn't much discussion of BIOS power settings that I could find, and after some testing found out that they are not effective at restraining max power use (I forget if they had any effect on typical power use either, but I don't think it was much if they did). Also, the integrated GPU can use more power than the CPU and can't be limited. For that, GpuTest is handy (but not open source):

https://www.geeks3d.com/gputest/

For me on the internal GPU, PixMark Piano and FurMark use the most power; either can get to 60-61w, and adding FIRESTARTER in the background only adds another watt or two.

Similarly, checking temps with turbostat (PkgTmp): 2x yes seems to max out around 70, testing one of the higher-power openssl tests on each core reaches 75, FIRESTARTER alone quickly reaches 80 and slowly ramps up to 90, and adding gputest got up to 96. Interestingly, the max temperature is reached when the CPU dethrottles too quickly after the gputest is done. It takes a few gputests alone in a row to get into the 80s (I got bored after 2x each of alternating FurMark and Piano, which hit 83). Similarly, looking at power usage with turbostat, the max PkgWatt with yes is 8.26, with openssl speed ecdsa 9.88, with FIRESTARTER alone 17.62, + gputest 19.97 (or gputest alone; seems unreliable).

Anyway, this is a long diversion to say that even fancy yes or openssl speed tests are not that great as CPU stress tests :).


This 'yes' implementation in Python also gets to the GB/s range on Linux, approximately on par with gnu yes: https://gist.github.com/jepler/5e46f4e46542367d75cde9d77d586...

It takes the useful insights from this article and packages them back into a Python program. But there's so little actual Python code running that the efficiency relative to a compiled language is swamped by all the data copying to the kernel.

vmsplice()ing is probably the next step in speed; however, vmsplice apparently doesn't fit well into the semantics of Python (or probably Rust) programs. Trying to reduce the number of Python bytecodes by calling `os.writev(1, many_bufs)` actually harmed performance; not sure why.
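For the curious, here is a minimal C sketch of the vmsplice() route (my illustration, not the gist above; Linux-only, assumes stdout is a pipe, e.g. `./fastyes | pv > /dev/null` with a hypothetical binary name, and falls back to plain write(2) otherwise):

    #define _GNU_SOURCE
    #include <fcntl.h>     /* vmsplice() */
    #include <string.h>
    #include <sys/uio.h>   /* struct iovec */
    #include <unistd.h>

    #define CHUNK (1 << 17) /* 128 KiB, a multiple of the 2-byte "y\n" line */

    static char buf[CHUNK];

    int main(void)
    {
        size_t i;
        for (i = 0; i < CHUNK; i += 2)
            memcpy(buf + i, "y\n", 2);

        struct iovec iov = { buf, CHUNK };
        for (;;) {
            /* vmsplice() hands our pages to the pipe without the copy that
               write(2) does; it only works on pipes, hence the fallback.
               The buffer is never modified after filling, so reusing its
               pages is safe. */
            if (vmsplice(STDOUT_FILENO, &iov, 1, 0) < 0 &&
                write(STDOUT_FILENO, buf, CHUNK) < 0)
                return 1;
        }
    }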


Nice writeup but I have a question: Why does `yes` need to be fast?


When all you have is a hammer, everything looks like a nail.


Back in the 90s I worked in a shared Linux office environment. Sometimes people would leave themselves logged in, so of course it was your duty to mess with them if you found such an egregious security violation.

Once we found one logged in and decided to add 'yes' as the last line of his bash login.

That day we learned, via an angry phone call, that you can't Ctrl-C out of 'yes' via a remote shell. Today, after reading this, I'm assuming it's because of the way it floods the buffer.

Poor guy had to call the one dude with root access to fix it for him.


I recall in college we'd `xhost +` and then start running things remotely on their desktop, playing sounds, etc, but it's been a very long time. Switching the keyboard layout to Dvorak is another fun one.


Related:

How is GNU `yes` so fast?

https://news.ycombinator.com/item?id=14542938


If you ever had to repair a disk using fsck when it had hundreds of errors, you would understand why yes(1) was created.


Wouldn't it be better to use dd with something like /dev/zero instead?


The idea is that you pipe `yes | fsck` to answer yes to all questions.


Recent HN discussion (2 months ago): https://news.ycombinator.com/item?id=31619076


Great article. I tried the command for a brief second with output to a text file. I cannot believe that on my M1, after 1-2 seconds, it had 2,684,272,500 lines amounting to 5GB. Yikes


I was setting up centralized logging on a bare metal Kubernetes cluster (500 GB HDD IIRC). I ran a deployment that ran the yes command to generate logs at the end of the day.

When I returned to work in the morning, 2 of the nodes were unhealthy because their hard drives were full.

A couple years later I was load testing Datadog logging from a cluster and also used yes (no logs on disk this time).

I used the daily log quota for the entire Datadog account in 30 minutes.


Is there a way to remove this from Debian? It's caused more problems than it's solved, and I've never used it for what it was designed for.


I suppose you could add an alias that does nothing instead?


6.4 GB/s

    #include <string.h>  /* memcpy */
    #include <unistd.h>  /* write */

    #define BUFSIZE (512*1024)

    static char buf[BUFSIZE];

    int main(void) {
        /* Fill the buffer with repeated "y\n" lines, then write it forever. */
        size_t i;
        for (i = 0; i < BUFSIZE; i += 2)
            memcpy(buf + i, "y\n", 2);
        while (1) {
            write(1, buf, BUFSIZE);
        }
        return 0;
    }


The simplest Unix command is arguably "true"


… which the article also points out just two sentences in!



Too lazy to figure it out, but since Python's print is already buffered: is there anything we can do to beef up Python's print() throughput?


I’m really surprised that rewriting in Rust didn’t result in optimal performance immediately.


Any use of `println` in Rust is going to have its performance dominated by repeatedly locking stdout and flushing the buffer every time. It's a bit of an obscure fact (since most programs aren't bottlenecked on printing), but `println` in Rust is kind of just supposed to be a convenience for hello world and print-debugging. Rust provides `writeln` for serious use, which is more verbose but allows you more fine-grained control over locking, buffering, and error-handling. See also https://nnethercote.github.io/perf-book/io.html


yes!


Well that was a bunch of nonsense, cute I guess.



