A little story about the `yes` Unix command (endler.dev)
260 points by derwiki on July 21, 2022 | 118 comments



Once one of our devops engineers was testing a script with `nohup yes > output`.

The /home directory was mounted to an autoexpanding EFS on AWS.

23.4 TB and 2 months later we noticed the bill :)


Dare I ask: how much?


Not too much, like $4k extra a month I think, noticeable but not the end of the world.

I assume a smaller place would just beg AWS for forgiveness and probably get it.


> Not too much, like $4k extra a month I think, noticeable but not the end of the world.

It's nice that it's manageable and also a learning experience, but over here that would be like 2 months' take home salary for a software engineer.

Kind of why development/test environments shouldn't have autoexpanding or scalable anything, in my experience.


I assume larger companies can deal with temporary mistakes on the order of one engineer’s salary.


Oh, they certainly can, some better than others. Though personally I'd most certainly want to avoid such situations.

First, to retain an air of "vaguely knows what they're doing" about me, even though everyone makes mistakes and that should be treated as something that's okay - especially if you can limit the impact of mistakes, like with automated spending limits.

Secondly, because I wouldn't want to risk doing something like that in a personal project, given that my wallet is likely to be much thinner than those of organizations.


As in, paying for that engineer was a mistake all along?


No. It doesn't really impact the company's bottom line if your software engineering org is 100 people making $20k a month and someone accidentally wastes $4k of EBS disk. It's nice if you don't waste it, of course, but "oops, filled up the disk with 'y' output" is better than "yeah actually all of those files are pretty important, I think team X is using them" because you can instantly delete it, rather than doing a multi-month project to see if team X really is using the files.


Everything should be made to be bounded; is there no max that could have been set? Not using expandable storage in test risks deviations from prod (and then you get the prod-only bugs that are difficult to keep fixed). I wonder if there was a failure to do pre-prod monitoring as well - it's super handy having dashboards telling you disk usage.


> Everything should be made to be bounded; is there no max that could have been set?

I'd expect that you'd reach the maximum once your card is rejected. :)

But truthfully many platforms out there will let you set up spending alerts, but not outright set limits because then you get into a bunch of difficult questions - should further data just be redirected and piped to /dev/null? Should you as the service provider instead limit IOPS in some way, or allow slower network connectivity if allowed egress amount of data is exceeded? What about managed databases, slow it down or throw it out altogether?

I've talked about those in detail with some of the people here ages ago and there are actually companies that take the "graceful degradation" approach, like Time4VPS who host almost all of my cloud VPSes for now: https://www.time4vps.com/?affid=5294 (affiliate link, feel free to remove affid if you'd prefer not to have it)

  What happens if I run out of bandwidth?
  We reduce your VPS server’s port speed 10 times until the new month starts. No worries, we won’t charge any extra fees or suspend your services.
Honestly, that's a really cool idea for handling resources. One also has to understand that storage is a bit different, though: if you've built your entire platform around the concept of scalability and dynamically allotting more resources, you might make the choice of having the occasional story of large bills (some of which you'll probably forgive for good PR), as opposed to more frequent stories about things going down because people forgot to pay you, as well as many enraged individuals complaining about their data being deleted, though it's supposedly your fault.

So in a way it's also a business choice to be made, though one can imagine hard spend limits being feasible to implement.

> Not using expandable storage in test risks deviations from prod (and then you get the prod-only bugs that are difficult to keep fixed).

I concede that this is an excellent point; you should also be able to test automatic scaling when necessary, etc.

Though the difference is probably in being able to test it but not leave it without (more conservative) limits when you're not looking at it.


I wonder what the quickest way to rack up the highest bill on AWS is.


Private CA + Dedicated CloudFront IPs are the fastest ways to do it in one line item, but most commonly it's massive DB instances. Why create an index when you can double the number of cores? Wait, cores don't help improve queries? But more memory, so better, right? Elastic MapReduce with oversized instances used to be pretty common, but RDS is a perpetual winner for most companies I've worked with.

But the typical worst-case practices are the small companies who think avoiding vendor lock-in is a thing that matters at their scale. Look, you're never going to change cloud providers - and if you do, that will be a problem you can solve then. You're never going to go multi-cloud. If you do, that's a problem you can solve then. Preemptively DIY'ing everything from database copies to security to encryption is going to break you - you're now engineered on a brittle substrate of hacks with no support, all so that one day you could maybe consider saving 10% on your cloud bill by moving to a different provider. The day that migration will save you more than one engineer per month, consider it. Until then, you're just making your own costs worse.

/rant


And in real life I've seen vendor lock-in cause exactly every worst fear and worse.

Vendor lock-in is only tolerable when the service is so easily swapped out that it's not actually vendor lock-in.

There is no valid argument for not worrying about that before it happens, and bending pretty far to avoid it. No matter how hard you work to stay as portable as possible early and at each daily step along the way, it's 10x or 1000x less than dealing with it later.

If you're just talking about a pluggable service, well then by definition that's not really lock-in.


I believe you, I do, but for every company afraid to use the features of a service they pay for because of vendor lock in, I can show you five, six and seven figure bills attributable to DIY. Nothing is drop in if it's value added - if it's not value added, why are you using a vendor at all?

Portability is a huge myth that eats engineers hours like a snack every single day and rarely pays off.


I believe that you believe me, so I'll rest there, since I don't want to speak to actual products and timescales and business sizes and business types required to make the claim more solid.


> the small companies who think avoiding vendor lock-in is a thing that matters at their scale

This. If AWS ever were to even consider dramatically increasing their prices, there are players whose AWS bills are two or three or more orders of magnitude more expensive than yours who will howl and gnash their teeth and whose potential departure from AWS does far more to protect you than you could ever do to protect yourself.

Similar stupidity includes spending engineering time on issues like how to deal with an S3 outage. If S3 is down, your competitors are down too. Nobody cares.


The people with huge AWS bills are already paying a different rate to you. They don't care what happens to your rate.


One lock-in scenario to consider is if you may ever want to offer an on-premise version of your SaaS product. This is what my company is doing now. It's a huge pain in the butt, but it does bring in a lot of revenue.


Lambda functions on a put event on S3 that also put an object _into_ an S3 bucket are so common it's called out on the AWS documentation page [0].

[0] https://docs.aws.amazon.com/lambda/latest/operatorguide/recu...


The Lambda functions should put at least two objects into the bucket, otherwise you don't get that nice exponential growth.


Ouch!


Leak your AWS keys and a nice support team will take care of it by mining cryptocurrencies with your credit card.


Plus make sure your contact details are NOT up to date so you miss the AWS warnings...


A fork bomb that fires off GPU enabled VMs for every instance of the fork?


That will hit provisioning limits nearly instantly


That's ok ;)


For reference, launching a single u-12tb1.112xlarge and leaving it up for a month would be around $80k.


A DynamoDB table provisioned at 40,000 RCU/WCU ? You have been warned!


And this is why you need grafana dashboards of your storage!


Nobody ignores the billing statement :P


Why didn't you have an alarm set?

If you are going to use ANY cloud provider, learn about alarms, or you will get screwed.


Comparing the 1979 version with the current version deserves its own whitepaper.

Writing portable, robust code is nontrivial. This is a great example.

For those who didn't RTFM:

1979 version:

    main(argc, argv)
    char **argv;
    {
      for (;;)
        printf("%s\n", argc>1? argv[1]: "y");
    }
Current 128-line GNU version:

https://github.com/coreutils/coreutils/blob/master/src/yes.c


I would argue this is pointless optimization for the sake of looking good on an artificial benchmark. Nobody needs a 3GB/s `yes` command! Modernize the syntax to C89 and the 1979 version is perfectly fine for any non-contrived use case.
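For illustration, that modernization really is tiny; a minimal C89 sketch (not from the article, just illustrative):

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        /* Print the first operand (default "y") forever, one per line. */
        for (;;)
            printf("%s\n", argc > 1 ? argv[1] : "y");
    }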

What would be way more interesting is fixing the standard library/OS interfaces to make the slow version fast, because that would likely benefit other simple filter commands as well. Might require using non-POSIX interfaces to do well, of course.

edit: POSIX note


https://www.gnu.org/prep/standards/standards.html#Reading-No...

> Don’t in any circumstances refer to Unix source code for or during your work on GNU! (Or to any other proprietary programs.)

> If you have a vague recollection of the internals of a Unix program, this does not absolutely mean you can’t write an imitation of it, but do try to organize the imitation internally along different lines, because this is likely to make the details of the Unix version irrelevant and dissimilar to your results.

> For example, Unix utilities were generally optimized to minimize memory use; if you go for speed instead, your program will be very different. You could keep the entire input file in memory and scan it there instead of using stdio. Use a smarter algorithm discovered more recently than the Unix program. Eliminate use of temporary files. Do it in one pass instead of two (we did this in the assembler).

> Or, on the contrary, emphasize simplicity instead of speed. For some applications, the speed of today’s computers makes simpler algorithms adequate.

> Or go for generality. For example, Unix programs often have static tables or fixed-size strings, which make for arbitrary limits; use dynamic allocation instead. Make sure your program handles NULs and other funny characters in the input files. Add a programming language for extensibility and write part of the program in that language.

> Or turn some parts of the program into independently usable libraries. Or use a simple garbage collector instead of tracking precisely when to free memory, or use a new GNU facility such as obstacks.


Putting aside locale and the other important things universal code should have, which make up the first 75% of the new version:

What is your argument that it is silly? Have you done a survey of usage, or researched its history? Don't be so quick to criticize code, especially from a huge, old project (that might even be older than you).


The above code doesn't propagate errors


It doesn't do lots of important things!


I always enjoy reading these kinds of write-ups digging into why something is as fast as it is and how it interacts with a wider system. However, I do think that one should not arrive at the belief that this kind of optimisation is warranted everywhere; code simplicity can also be a goal. The classic argument is to compare OpenBSD’s yes [1] to GNU coreutils’ yes [2] and contemplate under which circumstances those additional MB/s of “y” will be critical enough to warrant the maintenance of more than one hundred additional lines.

[1]: https://cvsweb.openbsd.org/src/usr.bin/yes/yes.c?rev=1.9&con...

[2]: https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/yes...

Similarly, there is the NSFW comparison between Plan 9’s cat and GNU coreutils’ cat [3].

[3]: http://9front.org/img/longcat.png

Just to reiterate in the end, though: this is not an argument against optimisation and learning how to make something blazingly fast. But there is such a thing as optimising the wrong thing, and using speed as the only justification for merging a patch is probably not the right thing to do for a bigger project.


I carefully considered this before I optimized GNU yes.

The reason it is useful is because yes can output anything, and so is useful to produce any repeated data for test files etc.

You can see the justification detailed in the original optimization commit: https://github.com/coreutils/coreutils/commit/35217221
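For readers who don't follow the link, the shape of the optimization is roughly this (a simplified sketch, not the actual coreutils code; the buffer size and error handling here are arbitrary):

    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        /* Pre-fill one buffer with as many copies of the output line as
           fit, then issue one write(2) per buffer instead of one per line. */
        static char buf[64 * 1024];
        const char *word = argc > 1 ? argv[1] : "y";
        size_t wlen = strlen(word);
        size_t used = 0;

        while (used + wlen + 1 <= sizeof buf) {
            memcpy(buf + used, word, wlen);
            used += wlen;
            buf[used++] = '\n';
        }
        for (;;)
            if (write(STDOUT_FILENO, buf, used) < 0)
                return 1;
    }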


I have a history question. I've seen this link a few times: https://www.gnu.org/prep/standards/html_node/Reading-Non_002...

Did this advice from GNU inspire you to optimise `yes`, did your optimised `yes` inspire GNU to write this, or is there no historical connection between your optimised `yes` and this advice?


Trying to differentiate GNU implementations had nothing to do with it. This is never a consideration for me. It was worth the slight increase in complexity for the reasons stated in the commit message. Also an unmentioned point is that the coreutils code is very often referenced, so should be as robust and performant as possible, so those properties may percolate elsewhere.


Thanks, that makes a lot of sense.


So you actually wrote a new program. Yes was made for those pesky installers, not for producing large amounts of data. This would be my approach: keep yes simple and create a new program that spews out data well.


And then you have a possibly confusing situation where you have two programs that do essentially the same thing, but one is faster, and the other one is possibly not provided by default. As a user, in cases where it matters, you'd have to know about the issue and bother installing the new program. This is worse. As developers, I think it's our duty to make the lives of users simpler, even if it makes our lives a bit more complicated. I'd argue that's what we are here for.

I guess there's no ideal solution. But I think the "new" program does what the first one did better, and does not do anything worse.

We are talking about a program that is still under 1,000 lines of code and that's not getting new features every month, or at all anyway, so maintainability does not seem to be a big issue?

I see the reasoning, but I don't see any actual practical drawback to having improved the original program directly in this specific case. Nor do I see any advantage to keeping "yes" dead simple. The new version is still pretty much readable, and the extra time it takes to read and modify it without making mistakes seems worth the advantages.


> And then you have a possibly confusing situation where you have two programs that do essentially the same thing

My point is, I'd never thought of using yes for this purpose. So in this case, you could make a command called 'outputsomethingfast' and a command called yes that internally calls 'outputsomethingfast --output=yes' or something like that.

To me this is way more logical, and more in line with the Linux philosophy, right?


I guess. I don't like the name "yes" and I think we would have been better off with a more general name since the command is general, but now it's there, so…

However, this is independent of this optimization; "yes" already had this feature of outputting anything, I think?

But I expect this kind of accident to happen in any working system that has been around long enough. This seems unavoidable. So we'd better put up with this kind of mess, probably.


Touché, and there is already a program for the exact purpose of generating large amounts of data: jot(1)

https://manpage.me/?q=jot


Or if homogeneous data is fine, just cat or dd from a source like /dev/zero.


I think I've seen /dev/random used for generating large amounts of garbage data as well. (Though usually my problem is too much garbage data, not too little.)


And here is the problem. Now we have 2 programs that do essentially the same thing.


I really don't want to have to learn 12342384 programs. It's much less discoverable than having a few programs with a --help (and, more generally, a tree-based organization of functionality on your computer).

Also, if there's a new program, say, "fastrepeat", wouldn't that be a duplication of functionality between "yes", which just outputs "y", and "fastrepeat 'y'"? That's, like, even more bloat, since now you need both.


I would much rather have a very easy to remember command that does one thing and one thing only (namely, what it says on the tin) than to have to remember or dig through a whole slew of command line options in order to get 'yes' to become the equivalent to 'no' or 'cat'.


> I would much rather have a very easy to remember command that does one thing and one thing only

well, I definitely don't. I don't want to encumber my mind with a name for every single one of the 25000 "one thing" things I have to do.


Really, you want one program that outputs data, and yes is an alias for outputdatafast --data="y"


  > However, I do think that one should not arrive at the belief that this kind of
  > optimisation is warranted everywhere and that code simplicity can also be a goal
Definitely not warranted everywhere, especially if premature and at such a level of detail, but core utils like `cat` and `yes` are IMO prime examples of where such optimizations are warranted:

- They're in use daily on a huge number of setups; even small benefits add up much more than in some niche tool.

- They've got a clear and small feature set that won't change anytime soon, so there won't be much code churn, and thus maintenance effort will be relatively low.

  > Similarly, there is the NSFW comparison between Plan 9’s cat and GNU coreutils’
  > cat [3].
  >
  > [3]: http://9front.org/img/longcat.png
IMO that isn't an exactly fair comparison though, as the difference is not only in optimizations but largely in boilerplate license/copyright comments and in features like line numbering, modes for showing (non-printable) special characters or whitespace, and option parsing for said features.

Strip all that out and you get (eyeballed) about a third of that, and 200 lines for an extremely fast core util is really not much, nor hard to maintain, as it won't get any new features soon anyway.


Right. Faster generally means less CPU used for a particular use. For example I made ls(1) a bit faster recently. Even though not really noticeable per run, given how often it is run I estimated this to save 9 tonnes of carbon emissions per year. https://twitter.com/pixelbeat_/status/1511030095011581953


This comment rubs me all the wrong ways possible.

Is the posted link supposed to be a reference? That is your own tweet, a.k.a. a self-reference, boasting about your own contribution.

In the tweet itself you say "estimate"! How do you arrive at such a grandiose estimate?

How do you attribute a saved carbon footprint to an optimization in a command line tool? You cannot even approximate that. I would argue that such tiny optimizations make 0, nil difference in overall energy consumption on my local machine, all my laptops and all the servers in this building.

I'm not saying that we shouldn't run optimized code, but anyone can throw around random numbers.


> and in features like line numbering, modes for showing (non-printable) special characters or whitespace and option parsing for said features.

I remember a page somewhere saying the plan9 people were very much against making cat a generic tool for all those use cases; it should do what it says in the name: concatenate files.


I agree with your premise, that additional complexity is not always worthwhile. But I don't think this "classic argument" is very strong. Hardly even an argument at all.

You compare two things and see one is longer and more complex than the other. How much cost is that, really? 10x more code sounds bad, but 100 more lines might put it in perspective. And how complex is the code, really? And what is the benefit? A common complaint seen on OpenBSD lists is that performance is behind competitors, so you could take a bunch of those complaints and make an equally sound argument the other way.

I will say that a lot of the tools and libraries I have seen the hell optimized out of (and functionality added to) allow solutions to be put together which would be infeasible or impossible with simple/naive implementations. More layers, custom code, or complexity can be avoided. Say a database layer could be avoided if filesystem operations are fast enough. Or a shell script + command line tools can be used instead of writing a new program, if fork+exec+exit+context switching+pipe IO and these kinds of tools (yes and cat) are fast. If malloc+free are fast, then you don't need to write your own caching allocator in front of them. Etc., etc. So you might end up with an end-to-end solution that meets your requirements and actually has less code, or at least less bespoke complexity and more that is long maintained and used by many.


I think the main reason that GNU tools are overly optimized is copyright. [0] They just try to avoid any copyright claims from the old proprietary UNIX tools, like the copyright on an empty file. [1]

[0]: https://www.gnu.org/prep/standards/html_node/Reading-Non_002...

[1]: http://trillian.mit.edu/~jc/humor/ATT_Copyright_true.html


Even if those additional MB/s are not critical for any particular application, tools like these are run a humongous number of times every day throughout the world, so while I have no data, I suppose the total CPU usage could be high enough that global energy/carbon savings from this kind of optimization would be relevant.


I never heard of anyone using yes to generate large data files/streams. Does anyone actually use yes in a way where its daily CPU usage will be more than a rounding error?

Also, the very simple FreeBSD implementation [1] is not too slow on my 10-year-old notebook:

  > time yes test_string | dd of=/dev/null bs=1M count=65536
  ...
  2646329200 bytes transferred in 1.709196 secs (1548288918 bytes/sec)
  0.023u 0.850s 0:01.71 50.8% 5+166k 0+0io 0pf+0w
Firefox has probably used more CPU time while I was composing this comment - thanks to JS (in other tabs; HN is a rare example of a site which doesn't abuse my CPU). FF is almost always on the first line in top.

If you check top/powertop on a typical desktop or server, you'll likely find better targets than 'yes' for reducing energy use.

[1] https://github.com/freebsd/freebsd-src/blob/main/usr.bin/yes...


I don't think it works that way.

While in general efficient programs can save energy, the "saved" MB/s do not necessarily correlate with saved joules. There is no direct cost for an instruction; there is a severe overhead from the machine plainly being switched on. And it's not like you will always be able to "use" those saved MB/s for something else.

And you entirely neglect the "cost" of optimizing. The time spent looking at inefficient code alone probably costs more energy than all the energy actually saved by a single change.

Consider the time someone could have spent on something else, with significantly more impact.


I'm not an expert on hardware, but I do know that CPU consumption depends on CPU load. A laptop battery drains faster and fans run faster when the CPU is working than when it is idle. More MB/s means fewer seconds and therefore more idle time. So how could running a program that has more instructions and takes longer to do the same work not have a cost?

About the second part of your comment: it's true that optimizing has a cost, and if I were creating a "yes" or "cat" program for personal use it would obviously be pointless to optimize. But if it's a program run often by millions of people (probably more true of "cat" than "yes"), it's not that obvious to me that millions of little savings cannot offset the time of one person optimizing.


So much agreed. The computing operations where saving instructions equals saving energy are really few and far between. Or non-existent.


Once the good-enough version has already been written and has been sitting around for a few decades, you need an excuse not to get around to optimizing it sooner or later.

The maintenance argument isn't a good enough one. It's a factor, just not a strong one.

There is no reason for every tiny bit of something as foundational as the OS not to just get better and better forever.

The only reason we write things down in the first place is so that we can do the work once and then refer to it many times without having to re-create it each time. So there is very little argument for keeping a program small and simple like the first version.

Some, but just not much. Because black-boxifying that complexity is what writing (be it a legal document or a program) is for in the first place. Making a more sophisticated, better-performing version 2 of something is simply using the tools of writing, which ultimately exist for no other purpose.

It does go the other way too. Version 3 could be to invest yet more brainpower into figuring out how to get the same performance in fewer operations. And on that day someone will wonder if it's worth optimizing something that already works fine when compute resources are infinite. The answer then, as now, will be the same: "Yes. Of course."


> Similarly, there is the NSFW comparison between Plan 9’s cat and GNU coreutils’ cat [3].

You know, if there were a SFW version, it would be nicely illuminating about the complexity of getting similar stuff done, handling edge cases, and whatnot.

That said, I think the Plan 9 version could use a few comments to decrease the cognitive load, since individual bits of code felt more approachable to me in the GNU version.

Though with the code itself being shorter, one-liners or even just a few lines at the top of the function definition could be sufficient.


When to optimise [1] and “premature optimisation” have been ongoing concerns in computer science ever since performance constraints and limits were identified (i.e. since forever).

[1]: https://en.wikipedia.org/wiki/Program_optimization


Re [3]: yeah, so you mean a program which is faster, better documented, and has more functionality has longer source code? Hm, yeah?

> using speed as the only justification for merging a patch is probably not the right thing to do for a bigger project

I don't get this. Speed is important. Energy efficiency is important. Have we gotten so used to the bloat that performance and energy savings must be disregarded?


I think if you dig through GNU coreutils or *BSD world (probably FreeBSD for the “newest and fastest”), you’d be surprised at how complex almost any of the binaries are. That’s just the benefit of 40+ years of constant updates and optimizations on feature-complete software.


PSA: Do not use `yes` on macOS CI. In those environments you only get one CPU, which `yes` will use at 100% and never release, and your entire workflow will hang. There is such a thing as too much performance, ironically.


Windows 3.1 had the same issue, where there was no preemptive scheduling


Curious: have you tried nicing 'yes' in this situation?


This sounds like a scheduler bug?


Yep most definitely a kernel bug.


It always amazed me how fast “yes” is. Whenever I need to generate a really big file for whatever reason (e.g. testing multipart file uploads), I will run something like “yes > myfile.txt” for a few seconds.

Similarly, I tried writing a naive version in Node.js a few years ago and it was several orders of magnitude slower than coreutils, which is usually the case whenever I try to clone one of the “simple” utilities from coreutils.


I just tried "yes > myfile.txt" for a few seconds (less than five) and then checked the size. 1.4 billion lines! I knew yes was fast but that's pretty mind-blowing.


The GNU coreutils never fail to impress me with how insanely fast they go. One of my favorite blog posts of all time [1] really opened my eyes to how insanely optimized Unix really is.

[1] https://adamdrake.com/command-line-tools-can-be-235x-faster-...


`cat /dev/urandom > /tmp/myfile` is faster on macOS though


`yes` is far faster on my Linux box (2.2GB/s) vs /dev/urandom (0.38GB/s) -- not surprising, since the `man` page for urandom says:

"[urandom] is designed for security, not speed, and is poorly suited to generating large amounts of random data"

`cat /dev/zero` is pretty much as fast as `yes` on my system however (2.1GB/s)


I suspect that just doing a `truncate` command is probably even faster. I just like the `yes` version because it requires basically zero thought on my end.


Always love reading about performance enhancements to code. This is a great example of how much you can do with such a simple thing.

I can't see that any of this optimisation is actually useful in the real world. Does yes actually need such a high throughput? Would anything suffer if it didn't? Probably not. Still fun though.


Maybe some day you want to automate installing a program that needs about three billion confirmations a second regarding various things...


Let's hope not, but good to know yes has me covered :)


It's cheaper than a lawyer, that's for sure.


Compare https://codegolf.stackexchange.com/questions/215216/high-thr...

> By far the best score so far is by @ais523 - generating FizzBuzz at a throughput that seems to average somewhere around 54-56GiB/s.

The best 'yes' in the linked article gets about 3GiB/s. To be fair: the optimized fizzbuzz is insane.


There’s a fastest yes competition on CGCC as well: https://codegolf.stackexchange.com/questions/199528/fastest-...

My entry outputs 28 TB/s. (I…may have done some creative interpretation of the rules.)


A solution that requires the rules to be amended is obviously very creative!

Impressive out of the box thinking.


Why would you want a buffer in the yes command, though? The point is to only provide output if stdout is being read (i.e. stdout is available for writing). That way you only provide as many lines of output as are needed.

Isn’t filling a buffer going to be a huge waste of CPU for the common case where the command only needs to provide one or two “yes”es? (Common case as in, its intended use of working around scripts that interactively prompt you to continue, etc…)


1. Performance of yes is not a concern for that "common case" (well, is that case still common? Who writes interactive-only programs that require yes/no answers from the terminal?).

2. Filling a buffer has a much lower CPU (power) cost, as it doesn't involve context switching the way I/O does.


On my Asahi Linux system, the Rust version runs at about 15GB/s, but the GNU version runs at "just" 10GB/s.

However, the author had the opposite results. I wonder if the architecture makes a difference.


Could you try running it on macOS on the same machine? I wonder if that would be any different.


It’s also a great way to synthetically max out a core. Want to check thermals? Got 8 cores? Spin up 8 yes > /dev/null

And let them rip.


For amusement launch it with `yes | head -n 8 | xargs -n 1 -P 0 yes`

This is how I thermally stress CPUs but I usually do `yes speed | head -n 8 | xargs -n 1 -P 0 openssl` instead.


Convenient for light stress testing, but if you want maximum thermal generation there is no substitute for FIRESTARTER:

https://github.com/tud-zih-energy/FIRESTARTER

With my system hooked up to a Watts Up meter (but including the monitor and a couple of other things): yes > /dev/null on each core gets a bit above 42w, openssl speed on each core occasionally gets above 45w, and running FIRESTARTER for a bit gets above 57w (on an i5-6260U (NUC) with hyperthreading disabled, 15W TDP, with some attempted power restraint in the BIOS; Firefox on decent websites like HN tends to use about 31w total, which I think is something like 10-15w (12w, I believe, but it has been a while) measuring the computer only).

FIRESTARTER has some evolutionary algorithms as well, but at least on my CPU, after hours they were still doing much worse than the default. I was wondering why there wasn't much discussion of BIOS power settings that I could find, and after some testing found out that they are not effective at restraining max power use (I forget if they had any effect on typical power use either, but I don't think it was much if they did). Also, the integrated GPU can use more power than the CPU and can't be limited. For that, GpuTest is handy (but not open source):

https://www.geeks3d.com/gputest/

For me on the internal GPU, PixMark Piano and FurMark use the most power; either can get to 60-61w, and adding FIRESTARTER in the background only adds another watt or two.

Similarly, checking temps with turbostat (PkgTmp): 2x yes seems to max out around 70, testing one of the higher-power openssl tests on each core reaches 75, FIRESTARTER alone quickly reaches 80 and slowly ramps up to 90, and adding gputest got up to 96. Interestingly, the max temperature is reached when the CPU dethrottles too quickly after the gputest is done. It takes a few gputests alone in a row to get into the 80s (I got bored after 2x each of alternating FurMark and Piano, which hit 83). Similarly, looking at power usage with turbostat, the max PkgWatt with yes is 8.26, with openssl speed ecdsa 9.88, with FIRESTARTER alone 17.62, + gputest 19.97 (or gputest alone; seems unreliable).

Anyway, this is a long diversion to say that even fancy yes or openssl speed tests are not that great as CPU stress tests :).


This 'yes' implementation in Python also gets to the GB/s range on Linux, approximately on par with gnu yes: https://gist.github.com/jepler/5e46f4e46542367d75cde9d77d586...

It takes the useful insights from this article and packages them back into a Python program. But there's so little actual Python code running that the efficiency relative to a compiled language is swamped by all the data copying to the kernel.

vmsplice()ing is probably the next step in speed; however, vmsplice apparently doesn't fit well into the semantics of Python (or probably Rust) programs. Trying to reduce the number of Python bytecodes by calling `os.writev(1, many_bufs)` actually harmed performance; not sure why.
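For the curious, here is a minimal C sketch of the vmsplice() route (my illustration, not the gist above; Linux-only, assumes stdout is a pipe, e.g. `./fastyes | pv > /dev/null` with a hypothetical binary name, and falls back to plain write(2) otherwise):

    #define _GNU_SOURCE
    #include <fcntl.h>     /* vmsplice() */
    #include <string.h>
    #include <sys/uio.h>   /* struct iovec */
    #include <unistd.h>

    #define CHUNK (1 << 17) /* 128 KiB, a multiple of the 2-byte "y\n" line */

    static char buf[CHUNK];

    int main(void)
    {
        size_t i;
        for (i = 0; i < CHUNK; i += 2)
            memcpy(buf + i, "y\n", 2);

        struct iovec iov = { buf, CHUNK };
        for (;;) {
            /* vmsplice() hands our pages to the pipe without the copy that
               write(2) does; it only works on pipes, hence the fallback.
               The buffer is never modified after filling, so reusing its
               pages is safe. */
            if (vmsplice(STDOUT_FILENO, &iov, 1, 0) < 0 &&
                write(STDOUT_FILENO, buf, CHUNK) < 0)
                return 1;
        }
    }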


Nice writeup but I have a question: Why does `yes` need to be fast?


When all you have is a hammer, everything looks like a nail.


Back in the 90s I worked in a shared Linux office environment. Sometimes people would leave themselves logged in, so of course it was your duty to mess with them if you found such an egregious security violation.

Once we found one logged in and decided to add 'yes' as the last line of his bash login.

That day we learned, via an angry phone call, that you can't Ctrl-C out of 'yes' via a remote shell. Today, after reading this, I'm assuming it's because of the way it floods the buffer.

Poor guy had to call the one dude with root access to fix it for him.


I recall in college we'd `xhost +` and then start running things remotely on their desktop, playing sounds, etc, but it's been a very long time. Switching the keyboard layout to Dvorak is another fun one.


Related:

How is GNU `yes` so fast?

https://news.ycombinator.com/item?id=14542938


If you ever had to repair a disk using fsck when it had hundreds of errors, you would understand why yes(1) was created.


Wouldn't it be better to use dd with something like /dev/zero instead?


The idea is that you pipe `yes | fsck` to answer yes to all questions.


Recent HN discussion (2 months ago): https://news.ycombinator.com/item?id=31619076


Great article. I tried the command for a brief second with output to a text file. I cannot believe that on my M1, after 1-2 seconds, it had 2,684,272,500 lines amounting to 5GB. Yikes


I was setting up centralized logging on a bare metal Kubernetes cluster (500 GB HDD IIRC). I ran a deployment that ran the yes command to generate logs at the end of the day.

When I returned to work in the morning, 2 of the nodes were unhealthy because their hard drives were full.

A couple years later I was load testing Datadog logging from a cluster and also used yes (no logs on disk this time).

I used the daily log quota for the entire Datadog account in 30 minutes.


Is there a way to remove this from Debian? It's caused more problems than it's solved, and I've never used it for what it was designed for.


I suppose you could add an alias that does nothing instead?


6.4 GB/s

    #include <string.h>  /* memcpy */
    #include <unistd.h>  /* write */

    #define BUFSIZE (512*1024)

    static char buf[BUFSIZE];

    int main(void) {
        /* Fill the buffer with repeated "y\n" lines, then write it forever. */
        size_t i;
        for (i = 0; i < BUFSIZE; i += 2)
            memcpy(buf + i, "y\n", 2);
        while (1) {
            write(1, buf, BUFSIZE);
        }
        return 0;
    }


The simplest Unix command is arguably "true"


… which the article also points out just two sentences in!



Too lazy to figure it out, but since Python's print is already buffered: is there anything we can do to beef up Python's print() throughput?


I’m really surprised that rewriting in Rust didn’t result in optimal performance immediately.


Any use of `println` in Rust is going to have its performance dominated by repeatedly locking stdout and flushing the buffer every time. It's a bit of an obscure fact (since most programs aren't bottlenecked on printing), but `println` in Rust is kind of just supposed to be a convenience for hello world and print-debugging. Rust provides `writeln` for serious use, which is more verbose but allows you more fine-grained control over locking, buffering, and error-handling. See also https://nnethercote.github.io/perf-book/io.html


yes!


Well that was a bunch of nonsense, cute I guess.



