I think this makes it sound a lot more difficult than it has to be, with the for...

jrop · 2024-11-22T14:13:03 1732284783

I love writing parsers like this. Add in Pratt Parsing for operator precedence and writing parsers can be really easy.

detourdog · 2024-11-22T15:04:39 1732287879

I got the impression the author was trying to add higher level reasoning to the chosen term for string to AST parsing.

I felt that they were pointing out how the cognitive load of understanding is effected by word choice.

joz1-k · 2024-11-22T21:15:02 1732310102

I had exactly the same feeling as you after reading the article. And interestingly, all production parsers for all major languages are hand-written recursive descent parsers. On the other hand, if you inspect the actual code for these production parsers (even for newer languages like Swift, Scala, Kotlin, or Rust), the complexity and amount of code is still quite staggering.

taeric · 2024-11-22T22:09:51 1732313391

Isn't a large portion of the code to get friendlier messages for the user?

joz1-k · 2024-11-23T07:41:05 1732347665

Yes, that explains a lot of the complexity. Another reason is type checking/inferring and making the AST detailed enough for code analysis tools, such as code hinting in IDEs.

antononcube · 2024-11-22T15:06:14 1732287974

It seems you are describing how functional parsers (aka parser combinators) work.

(BTW, there is a "Parser combinators" section in the featured post/article.)

ckok · 2024-11-23T06:30:30 1732343430

I believe the proper term for what i am describing is a recursive descent parser. With which it is also quite doable to generate proper error handling and even recovery. Some form of this is used in almost every production language I think.

It has been years since I've written a proper parser but before that every time I had to write one I tried the latest and greatest first. ANTLR, coco/r, combinators. All the generated ones seemed to have a fatal flaw that hand writing didnt have. For example good error handling seemed almost impossible, very slow due to Infinite look ahead or they were almost impossible to debug to find an error in the input schema.

In the end hand crafting seems to be faster and simpler. Ymmv.

My point about the article was mostly that all the formal theory is nice but all it does is scare away people, while parsing is probably the simplest thing about writing a compiler.

marcosdumay · 2024-11-22T18:09:36 1732298976

The bad news about those is that it's easy to mindlessly create a parser that runs on exponential time.

The good news is that this happens in the grammar definition. So once you define your language well, you don't have to watch for it anymore.

antononcube · 2024-11-22T19:36:01 1732304161

Insightful!

Do you know of any "large scale" research on this? I.e. analysis of multiple related projects and/or of "real life stories."

(I agree regardless.)

marcosdumay · 2024-11-22T22:02:09 1732312929

I don't know about any real-world study. But there are people complaining about it from time to time, and it's quite obvious from the theory.

DemocracyFTW2 · 2024-11-22T15:48:06 1732290486

IMHO it gets even better when you can use regular expressions and write a 'modal' parser where each mode is responsible for a certain sub-grammar, like string literals. JavaScript added the sticky flag (y) to make this even simpler.

rurban · 2024-11-24T14:47:45 1732459665

It gets much worse. It's a huge anti-pattern to use regex within parsers.

DemocracyFTW2 · 2024-11-25T07:41:42 1732520502

why so?

rurban · 2024-11-25T08:36:21 1732523781

There are many explanations. The most famous one Rob Pike in his lexer talk 2011 https://youtu.be/HxaD_trXwRE?si=Q1B4mZ4Vo1Z2gRZq at 10.48

Or http://www.golangdevops.com/2019/03/07/halfpike-a-framework-...

Or several articles why you should not parse with regex, like https://stackoverflow.com/questions/1732348/regex-match-open...

DemocracyFTW2 · 2024-11-25T09:13:55 1732526035

I couldn't locate the part where Pike addresses regexes in his 50-minute talk.

The second piece seems to be about someone complaining about a dysfunctional and untidy software situation where incompetence led to the incorrect application of greedy regexes, producing wrong results.

The third one is the most famous rant against attempts to parse a language with symmetric bracing (start tags that must match end tags) with a single regex from a language that doesn't provide regexes with symmetric bracing support, that is of course doomed to fail.

None of these provide any argument against lexing with sticky regexes. For one thing, the rant against regexes being unable to match bracing elements is only valid for regex engines that don't provide extensions for brace matching, but many languages and extensions do (e.g. https://stackoverflow.com/a/15303160/7568091).

However this point is typically irrelevant because this is not about parsing, it's about lexing, but I realize this my fault because in the above I wrote write a 'modal' parser where I should've written write a 'modal' lexer.

In lexing you typically do not match braces, you just realize you've found a brace and emit an appropriate token. It's up to the downstream processing to see whether barce tokens are matching.