This entire article seems to amount to "I am used to C, thus I don't see the problem with C." which is a fine position to have, but it's a perspective unique to the writer and others like him. It doesn't apply to people learning new languages that aren't C.
Also, there are several statements about how C must be used because somehow it's closer to the real world/hardware than other languages. That's easily shown to be false, given that hardware designers have had to bend over backwards and contort themselves to emulate the hardware environment that C was originally written for. This great article is a nice rebuttal of that: C Is Not a Low-level Language: Your computer is not a fast PDP-11. https://queue.acm.org/detail.cfm?id=3212479
These types of arguments feel like they come from people who don't realize how much the compiler reworks your code to make it act like it does what you told it to do.
> That's easily shown to be false, given that hardware designers have had to bend over backwards and contort themselves to emulate the hardware environment that C was originally written for.
It may be a valid point, but value-less, since this basically implies that most hardware you are going to be able to buy today (ARM, x86, whatever) is going to be a fast PDP-11 (or at least it's going to present itself as one).
Not quite. Modern hardware isn't actually a fast PDP-11; it's just capable of emulating one more efficiently than it would bother to if people didn't insist on writing C.
So it may be that your "clever" C algorithm, which in your head translates into just six CPU operations, on a real modern CPU becomes six hefty macro-ops that take dozens of cycles to execute and repeatedly go to sleep waiting for main memory. Meanwhile the algorithm in a modern language that looked ludicrous to your C programmer eyes compiles to sixteen tiny ops the CPU can consume two at a time with no waits, and it's done in eight cycles while the C code is still waiting on a main memory read.
> Modern hardware isn't actually a fast PDP-11; it's just capable of emulating one more efficiently than it would bother to if people didn't insist on writing C.
I'm not disagreeing with that. It's a valid remark. What I'm arguing is that it's a useless remark in practice.
> So it may be that your "clever" C algorithm, which in your head translates into just six CPU operations, on a real modern CPU becomes six hefty macro-ops that take dozens of cycles to execute
Note that this was already true even for the original PDP-11, and more so for the original 8086, both being ridiculously CISC and microprogrammed. C was born in this context.
> Meanwhile the algorithm in a modern language that looked ludicrous to your C programmer eyes compiles to sixteen tiny ops the CPU
Even if true, it's irrelevant, since even a "modern language" will be forced to target that machine with its PDP-11-emulating ISA, and is therefore subject to the same limitations as "modern C" is. The very point the article is making leads to the conclusion that most machines today actually present themselves as PDP-11-likes, and therefore any language which does not target a PDP-11-like is doomed to a niche.
The C programmers do distort what makes sense, but they can't overcome practical realities.
For example, have you noticed your CPU is multi-core? The PDP-11 wasn't multi-core, and C doesn't really do well for writing concurrent software. Practically though, despite Amdahl's law it makes sense to provide a multi-core CPU. Most C software won't be able to take advantage, but some of your software is written by people who either will put in the extra effort despite C or use a different language where it isn't so hard.
Take another example: the 0-terminated string. This terrible C design is more or less omnipresent. And sure enough, CPUs have features to enable this terrible data structure, because it's so omnipresent. But those features don't make a better string worse; they just pull the C string closer to parity than it would otherwise be. Rust's str::len() is still much faster than C's strlen(), even if your CPU vendor focuses entirely on the C market.
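To make that concrete, here's a minimal sketch in C (the struct and names are just illustrative, not any particular library's API): a length-prefixed string answers "how long am I?" by reading a stored field, while strlen() has to walk the whole buffer looking for the terminator.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical length-prefixed string, roughly the shape of Rust's &str. */
    struct counted_str {
        const char *data;
        size_t      len;
    };

    /* O(1): the length is stored, so we just read it back. */
    static size_t counted_len(const struct counted_str *s) {
        return s->len;
    }

    /* O(n): strlen() must scan every byte until it hits the 0 terminator. */
    static size_t c_str_len(const char *s) {
        return strlen(s);
    }

The CPU features mentioned above only speed up that scan; they can't turn it into a single field read.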
The one example I like is shared memory multi-processing (what you mean when you say "multi-core", because multi-core systems are older than C), because I do see it as a hack done to preserve compatibility with a more traditional programming style (or dare I say, with a more traditional "programmer"). As I said, I agree with this, but the argument is still value-less. If you don't do multiprocessing this way you'll find yourself handicapped because _all_ current machines are heavily optimized for it. There's very little point to a language that doesn't do it this way, because it would not be able to run as well on current machines!
The other two examples are highly questionable. 0-terminated strings are not an artifact of the computer architecture, but a conscious trade-off, and in fact other languages already used Pascal strings (even on the PDP-11). The same goes for whether you want your lists to have a constant-time size function or not. Sure, your size() is now "much faster", but something else now becomes slower. In fact, you could argue x86 bends to support _both_ string implementations, since the rep prefix does support (and update) a count.
And the final example, "isdigit", is arguably not even a language issue, and definitely not a legacy of the PDP-11 nor the x86 architecture. I'm rather sure no one in his right mind would even have considered wasting 0.25KiB just for such a function, not until well into the 90s at least. These types of code-bloat optimizations are more of a 2000s and even 2010s "performance" craze. You yourself even say on the other thread that glibc is doing this to facilitate locales, and not for performance reasons.
What are you imagining alternatives would look like that are prohibited because of what you're calling a traditional "programmer"?
There are a bunch of other things computers could do that would in principle go really fast but that humans can't successfully reason about. It seems like a stretch to blame C. I reckon that even the most precocious humans have experience with causality a long time before they write any C, and it's this experience which makes it hard to handle a world that cannot be understood in terms of cause and effect, for example.
NULL-terminated strings aren't bad because strlen() is slow. The performance tradeoffs are fine enough. They're bad because they have bad security properties.
> So it may be that your "clever" C algorithm, which in your head translates into just six CPU operations, on a real modern CPU becomes six hefty macro-ops that take dozens of cycles to execute and repeatedly go to sleep waiting for main memory. Meanwhile the algorithm in a modern language that looked ludicrous to your C programmer eyes compiles to sixteen tiny ops the CPU can consume two at a time with no waits, and it's done in eight cycles while the C code is still waiting on a main memory read.
Rust's u8::is_ascii_hexdigit is a predicate which decides whether an 8-bit unsigned integer is the ASCII code for a hexadecimal digit, that is, 0 through 9, A through F, or a through f.
Inside, the implementation is exactly what a modern programmer would expect: a pattern match. It does a bunch of arithmetic operations on the 8-bit unsigned integer and gets a true or false result. Your modern CPU can do that really fast, because it's all just arithmetic on registers.
If you go look at a C library's isxdigit() function, sometimes (not so often these days) you will find it has a mask and a LUT; it's assuming that "just" looking up the answer in memory will be faster. Your modern CPU can't do that quickly. It must calculate the index into the LUT and then read the appropriate memory. Hopefully that's in L1 cache or something; if it needs to go to main memory, that takes an eternity, and either way it means something else can't be in the cache because this LUT is taking up space.
GNU libc is an example of a relatively well known isxdigit() which uses a LUT even today.
Its excuse is that this simplifies locale handling, you know, for when your locale has hexadecimal digits in a different place from everybody else's, presumably because you're from some alternate universe where Z is a hex digit.
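For a concrete picture (neither of these is glibc's actual code, just a sketch of the two approaches): the arithmetic version is a few compares on a register value, while the table version is a memory load into a 256-byte array that has to be sitting in cache to be fast.

    #include <stdbool.h>

    /* Arithmetic version: all register math, roughly what a pattern match
       like Rust's u8::is_ascii_hexdigit compiles down to. */
    static bool isxdigit_arith(unsigned char c) {
        return (c >= '0' && c <= '9') ||
               (c >= 'A' && c <= 'F') ||
               (c >= 'a' && c <= 'f');
    }

    /* LUT version: one load from a 256-byte table. Fast if the table is
       already in L1, an eternity if it has to come from main memory, and
       either way it evicts something else from the cache. */
    static const unsigned char hex_table[256] = {
        ['0'] = 1, ['1'] = 1, ['2'] = 1, ['3'] = 1, ['4'] = 1,
        ['5'] = 1, ['6'] = 1, ['7'] = 1, ['8'] = 1, ['9'] = 1,
        ['A'] = 1, ['B'] = 1, ['C'] = 1, ['D'] = 1, ['E'] = 1, ['F'] = 1,
        ['a'] = 1, ['b'] = 1, ['c'] = 1, ['d'] = 1, ['e'] = 1, ['f'] = 1,
    };

    static bool isxdigit_lut(unsigned char c) {
        return hex_table[c] != 0;
    }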
But isn't that just a bad implementation in libc, rather than a problem with C itself? It should be trivial to rewrite isxdigit() copying Rust's algorithm in C.
If you wanted a language that's close to the actual hardware you have, you'd have a different language for different hardware. (That's what assembly is.)
Meanwhile, C is simple and easy to write compilers for. Therefore, it gained popularity because of its portability.
If you know how a CPU works, especially the bits outside decode and the ALUs, like cache coherency, I don't think you could call assembly "close to the hardware" either.
Our computers would be a lot slower if they were just fast PDP-11s. The way computers work changed because the physical reality of transistors doesn't like to work in that manner.
No, the problem is not that modern CPUs want to be a PDP-11, it's that they want to be an 8086 for compatibility reasons. From the software's perspective, the ISA *is* the hardware. It doesn't matter if the CPU is internally recompiling everything to a RISC-like architecture with an arbitrary number of registers, because there's nothing I can do to access that.
If there's ever a radical new ISA which supplants x86 where C's constructs prove to be a bottleneck, and that bottleneck only applies to C, you will finally see it fall by the wayside.
>These types of arguments feel like they come from people who don't realize how much the compiler reworks your code to make it act like it does what you told it to do.
I guess it's hard to argue against a vague, undefined term like "rework", but I can assure you that the output of gcc largely resembles what I expect it to. If the compiler does something bizarre I will know about it, because I actually do check the assembly.
> From the software's perspective, the ISA is the hardware
Precisely! And because the weirdness happens below the ISA, all the mismatches Dave Chisnall describes apply to any other language just as much, including actual machine language. So raw machine language is not a low-level language? <head scratch>.
Considering we're still trapped by the Gentle Tyranny of Call/Return, we'd probably choose the same interfaces...never mind the huge path-dependence on optimising all the parts around those interfaces.
I have a sneaking suspicion that a more dataflow-oriented interface might be useful. The CPUs do dataflow analysis, the compiler does dataflow analysis, and higher levels are also often dataflow, but all communicate using a non-dataflow instruction stream. On the other hand, the only commercial dataflow CPU I am aware of, the NEC 7281 (http://www.merlintec.com/download/nec7281.pdf) was less than successful.
> It'd be interesting to see what interface we'd choose today with multiple decades of hardware and software development
A worse one? Just a couple of days ago there were a couple of stories about Itanium here on HN (a designed-from-scratch ISA that turned out to be practically worse than the much more ad-hoc x86_64). Even e.g. RISC-V (which is not entirely free of legacy) is practically 1:1 with ancient ISAs like ARM.
ARM is an evolving ISA. AArch64 is quite different from ARMv7, and has nicer security features than the other leading brand, who can barely manage to ship SGX.
It's not like x86 or RISC-V are not "evolving ISAs" either. And personally I would classify SGX and other "security" features as misfeatures, but that's for another day...
Which in its very first paragraphs points out that it is not about having a hidden set of registers to work with - it's about exploiting parallelism and the tension between a sequential programming language and a massively parallel hardware architecture.
The ISA is not the hardware; it's a 40-year-old abstraction that the hardware bends itself into to make your code run.
The primary benefit to C is that it is simple. And that is IMO the reason why it has such sticking power. The entire language & toolchain is understandable at a fairly core level without too much effort.
Please don’t start a C flame war either, HN. I know I’m nerd-sniping you all on this one.
Write a multithreaded program guaranteed to have no data races in C and tell me that C is simple.
C _looks_ simple, deceptively so. As soon as you need to deal with a large project in a corporate environment on a code base aged in double-digit years, you don't think it's simple anymore. C is incredibly complex, unmanageably so.
I would be interested to see someone write a program in any language (other than ones specifically designed to combat this, rust…) using threading and guarantee it to be data/race safe.
But, I’m not arguing with you. I honestly think that any corporate codebase after double digit years (and seven figure LoC) turns into a completely unmanageable mess. If it isn’t, it’s because of the team culture and pure rigor. Not the language
> I would be interested to see someone write a program in any language (other than ones specifically designed to combat this, rust…) using threading and guarantee it to be data/race safe.
I don't limit my statement to just C. The topic was C however so I mentioned C.
> I don't limit my statement to just C. The topic was C however so I mentioned C.
Your comment reads like a red herring though. You're not actually trying to argue that C makes things easier or harder. Your red herring only means that hard problems affecting all languages aren't magically turned into non-issues in C. You can still write a C program that uses threading, make it as safe as in any other language, and end up with a far simpler project, but that means nothing because the subject is still hard. And what would that actually say about C, let alone refute?
I am limiting it to just C though, because I’m responding to an article about the sticking power of C.
I’ve worked in a lot of massive and unmaintainable codebases, and I’ve always had the most fun untangling C ones. I don’t know exactly why, but I suspect it’s because at their core they are always plain ol’ simple C.
>As soon as you need to deal with a large project in a corporate environment on a code base aged in double digit years you don't think it's simple anymore.
I've done this before, never felt like C wasn't simple.
How big was the project and how many people were maintaining it? And were the people maintaining it the same people who started it? Perhaps I should have included a statement about coming in to work on a project that was already double digits old when you started working on it.
I have a C threaded producer that lives in a target process and exports data to a C# consumer in another process. (Locks are shared between processes according to data in a predetermined memory segment; also, add a custom high-bandwidth IPC protocol to that spec.) ...It works beautifully. I did port the C to a mix of unsafe C# + atomic Windows intrinsics, but it's surprisingly robust.
I think your main argument here is that large C code bases can easily become unwieldy, and I agree with you on that. However, simple multithreading in C is simple. It's not a terribly difficult problem to solve.
Nobody mentioned programming. And hard/easy is a relative term. My easy != your easy.
A sharpened rock is a simple tool. Versatile too, but try using it to make a sculpture and it's going to be hard, very hard. Especially if you've never used it extensively.
C is a simple language. No doubt about it. But doing multi-threading and memory safety in C is like trying to build a 100m-tall structure using nothing but a sharpened rock. Possible, but why would you?
Java is a complex language. Lots of complexity. Classes, annotations, async, System.out and so on. But writing multi-threaded and memory-safe (memory leaks allowed, of course) programs is as simple as just not using Java's array of unsafe tooling.
> Write a multithreaded program guaranteed to have no data races in C and tell me that C is simple.
I could port most of the multiprocessing-based Python scripts I have lying around to C without a problem. Multithreading can be easy when you have a few rules in place about which threads can read/modify data at a specific point in time.
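As a minimal sketch of the kind of rule I mean (plain POSIX threads, purely illustrative): one mutex guards one piece of shared data, and no thread touches the data without holding it.

    /* build with: cc -pthread example.c */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;   /* rule: only touched while `lock` is held */

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        printf("%ld\n", counter);   /* always 400000: no data race on counter */
        return 0;
    }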
I would argue it's very easy to think one understands it. However I would wager most people who say they understand C don't really know many of the undefined behaviour cases and their implications.
The number of people I have interviewed who have used C and C++ for years but can't tell me what undefined behavior is is too damned high. I've had one or two, and only one of them could give me a coherent answer about what the compiler is allowed to do if it runs across it.
In my experience it's only the language lawyers who actually know what to look out for, and there are damned few of them.
Which is stupid since undefined and unspecified behaviors are hugely important to how C and C++ are used in the field.
> Which is stupid since undefined and unspecified behaviors are hugely important to how C and C++ are used in the field.
Is it really that fundamental if you can set up your project to throw warnings/errors, and even use static code analysis tools to flag those?
I'd argue that it's far worse to work on a project that's not set up to detect those errors than to expect that sort of issue to be handled at the recruiting level.
We’ve been really spoiled by modern compilers and operating systems which let you get away with large swaths of undefined behavior. You typically don’t know what is hidden from you.
I would bet that the vast majority of undefined behavior encountered is when interacting with the standard library though. Which isn’t really the language’s fault.
(But I also totally concede that a language is inseparable from its std lib in reality.)
> Undefined behaviour being weird and having bizarre implications is not a feature of C.
How so? It is behavior left undefined in the C standard.
> It is a feature introduced by compiler writers who prize esoteric optimisations more than simplicity.
It's not the compiler writers who left the behaviour undefined, but the standard writers. If it were so clear what should be done (such that all compiler writers would do it the same way), then why did they not define it?
> And it is not forbidden by the standard, though discouraged.
What do you mean, it's not forbidden by the standard? It's undefined in the standard, hence the name. Because of this you really should not rely on it, because the behavior could change.
Use-after-free or double-free is undefined behavior.
Therefore a C implementation that provides malloc(3) should provide an implementation of free(3) that is a no-op and provide a garbage collector that actually frees the memory.
It's not an argument, it's an observation and statement of fact.
> Use-after-free or double-free is undefined behavior.
Yes.
> Therefore
Why therefore?
> a C implementation that provides malloc(3) should provide an implementation of free(3) that is a no-op and provide a garbage collector that actually frees the memory.
If you think so. It certainly doesn't follow from what I wrote. My point is that if you use a pointer after you freed the memory, you get whatever is in memory that the pointer points at on the machine the code is running on.
That is not a "weird and bizarre" implication, it is a plain and straightforward implication given how C is implemented on real machines. It is (obviously, I hope!) outside of the scope of the C standard to say what will be at that memory location at that point, and therefore it is UB.
And it being UB, "what you get" may also, for example, be a segfault because the library/OS have decided to unmap that memory. And again, that is also not "weird and bizarre", but perfectly straightforward.
> Use-after-free or double-free is undefined behavior.
It's an error even if it's undetectable by the compiler. I understand that nomenclature from a few decades ago makes a distinction, but nowadays we just call such a thing an error.
I love C. It's what I learned after learning MASM style Intel assembly.
I have a project still at the planning stage. I have many Rust crates lined up. So far I really like the bits of Rust I have learned. C plus generics? Sign me up! But damn, after coming from really awesome IDEs like Visual Studio, it just seems like it's taking me forever to make progress.
Right now I'm asking myself, does Rust save me time in the long run from chasing down bugs? I'm not really sure, because I think I'm already decent at avoiding C foot guns.
In general, C is a better option if you find you are escaping to macro-defined assembly or other memory ops (GLib's back-ported data structures for C are highly recommended). Rust likes to keep unsafe operations organized, but is a persistent pain if a problem scope requires many "unsafe" operations.
I'd wager Rust will end up like Boost... really cool.. but too chaotic to trust past 10 months. =)
I'm not sure if I agree with any statement of the form "The use-case for Rust is not the same as foo." for any foo where Rust is replacing a language that replaced foo.
Rust is replacing C++ in several well documented places, however C++ itself was a replacement for C in many places. So by definition the use-case for Rust overlaps the use-case for C.
Arguments like this sound like old arguments that were re-told to me by my father where people argued that C couldn't replace the use case for Algol at the University of Michigan MTS system. (Assuming I'm remembering the second-hand argument right. I'm a second generation software engineer.)
Also, there's nothing of note in the last 10 months about Rust that's different from the 10 months before that.
> Arguments like this sound like old arguments that were re-told to me by my father where people argued that C couldn't replace the use case for Algol at the University of Michigan MTS system.
The problem with those comments is that they are intellectually lazy and use age as a proxy for "better". Language X can be designed with specific goals in mind and offer interesting features that Language Y fails to offer, but that is not ensured by its release date, nor might those features be an acceptable tradeoff when considering other factors.
Actually, Cpp originally was a C template pre-compiler if I recall... just like Rust was. Note, people argue for years about various language merits.
"there's nothing of note in the last 10 months"
Indeed, still a pain to build, port, and package on some systems due to the poor design of the project's required dependencies. There are papers pointing out the common fallacy of increased memory safety with the current Rust builds as well.
Of course, Rust may work fine for whatever you are building. =)
I hated learning/using C. Once I was introduced to C++, it was like a breath of fresh air. Sure, a lot of it is stylistic in nature. But those frameworks matter. They really do.
That was my path too. Back in 1991-92, coming from Turbo Pascal, C felt like a step backwards. Luckily the teacher who provided us with Turbo C 2.0 for MS-DOS also had a copy of Turbo C++ 1.0 around, and thus I became the only kid in class who would rather use C++ when given the option.
Pointers and pointer math were a 10 billion dollar mistake.
Unchecked array access is a several billion dollar mistake.
Goto has its place: consolidated resource error unwind cleanup. That's basically its only valid use except for mechanically-generated finite state machines. Beyond that, don't bother. Other programming languages use reference counting and lifetimes to manage resources.
    #include <stdlib.h>

    int
    foo(void) {
        void *b0 = malloc(1000);
        if (!b0) goto err0;          /* nothing allocated yet */
        /* do something */
        void *b1 = malloc(1000);
        if (!b1) goto err1;          /* free b0 */
        /* do something else */
        void *b2 = malloc(1000);
        if (!b2) goto err2;          /* free b1, then b0 */
        /* keep going; a failure here would goto err3 */
        /* ... */
        free(b2);
        free(b1);
        free(b0);
        return 0;
    err3:
        free(b2);
    err2:
        free(b1);
    err1:
        free(b0);
    err0:
        return 1;
    }
I don't agree. Null is just a tombstone value. If you're dealing with specifying memory addresses, you either specify the address or use a value that tells you the variable is not pointing to an address.
If anything, conflating arrays with pointers was a bad idea. The language would be far better equipped to deal with memory safety issues if an array data type included info like total size, used size, and perhaps also stride length. Relying on tombstone values to delimit strings or arrays was just prone to blow up in everyone's face.
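A rough sketch of what such an array type could look like in C (purely illustrative, not a proposal from any standard or library):

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical fat array descriptor: the pointer travels with its
       capacity and current length, so accesses can be bounds-checked. */
    struct array {
        unsigned char *data;
        size_t         cap;   /* total size in bytes */
        size_t         len;   /* used size in bytes  */
    };

    /* Bounds-checked append: fails instead of writing past the end. */
    static int array_append(struct array *a, const void *src, size_t n) {
        if (a->cap - a->len < n)
            return -1;            /* would overflow the buffer: reject */
        memcpy(a->data + a->len, src, n);
        a->len += n;
        return 0;
    }

With the size carried alongside the pointer, the tombstone isn't needed to know where the data ends.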
Only when there isn't a C++ compiler around. Even if we restrict ourselves to the common subset, C++ has stronger type safety, while constexpr + templates are way better than macros.
I lol every time someone quotes their favorite JIT language that is essentially a meta-circular compiler for C library bindings.
“It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.” ( Albert Einstein )
The use cases where C is the best option are probably limited to exotic platforms that don't have compilers for better languages. There aren't many targets where you can't at least use C++.
Well I guess it's a good thing no one ever had a double free in C :)
> They mean you can never tell what `delete` does.
Can you elaborate on this? A destructor is a function that's built into an object. It's really not more complicated than that. If malloc/free are functions that should be used in pairs, then the constructor/destructor pair tries to provide a convenient and structured way to know where malloc/free go. You would call new in the constructor and delete in the destructor.
Cases can arise where ownership of objects is not clear (which is a separate issue), but when they do occur, you can have a custom destructor that frees a lot of other objects, and then these other objects may in fact be "owned" elsewhere.
Somewhere down the line these other objects are freed again, causing double free.
I don't know, but when I use C, I like to statically allocate everything if at all possible; I don't free() anything, and I let the OS clean everything up at exit(). A lot of the small Unix utilities don't necessarily need dynamic memory allocation IMO.
First off, new/delete are only used for objects that are allocated on the heap. Even then, modern C++ has RAII wrappers such as std::unique_ptr. You should only need new/delete in rare cases.
This. Over the last few years of writing C++ on a daily basis, I can count the number of times I used `delete` on both hands (maybe one hand actually). `new` was harder to get rid of because of some flaw in the framework we used, but that has been fixed now. Unless you are in the business of writing smart pointers yourself, you should never have to use those keywords anymore.
Of course it calls the destructor. What else should it do? The point is that you don't get any double frees, because you never have to call delete manually. Or maybe I misunderstood your complaint?
C still fills a niche nothing else really does in terms of being a lingua franca everyone can read that any language can talk to. The language is stable enough that code written today will probably have the same semantics far into the future, something C++ has been a lot more shaky about historically.
Had it not been for the UNIX freebie, it would have been a footnote like so many others.
Outside UNIX clones and the few surviving embedded workloads without alternative, there is hardly a reason to reach for it instead of C++, which is also a UNIX child.