I was rather curious to see how this handled garden-path sentences[0]. For "The old man the boat.", Stanza interprets "man" as a noun rather than a verb. Similarly, for "The complex houses married and single soldiers and their families.", "houses" is also interpreted as a noun rather than a verb. These sentences are mostly corner cases, but it was an interesting little experiment nonetheless.
Most humans struggle when reading garden-path sentences, so I would be quite impressed if an NLP toolkit handled them easily out-of-the-box.
EDIT: On a related note, when I was an undergrad there was a group on campus doing research on how humans repair garden-path sentences when their first reading is incorrect. They were measuring ERPs to see if something akin to a backtracking algorithm was used, along with eye-tracking to see which word or words triggered the repair. I graduated before the work was complete, but I might go digging for it to see if it was ever published.
I think that's too high a bar. I didn't parse either of those sentences correctly the first time I read them. It would be unreasonable to expect even a "human-level" AI to get these right on the first pass. That said, a parser could recover by backtracking to see whether alternative analyses produce a complete parse.
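To make the backtracking idea concrete, here is a minimal sketch. It is not how Stanza works; the lexicon, the tag names, and the one-rule grammar are all made up for illustration. It tries the most likely tag for each word first and only falls back to alternatives when the result is not a complete sentence.

```python
import re

# Hypothetical toy lexicon: candidate tags per word, most likely first.
LEXICON = {
    "the": ["DET"],
    "old": ["ADJ", "NOUN"],
    "man": ["NOUN", "VERB"],
    "boat": ["NOUN"],
}

# Toy grammar (an assumption): a sentence is NP VERB NP, with NP = DET ADJ* NOUN.
NP = r"DET (?:ADJ )*NOUN"
SENTENCE = re.compile(rf"{NP} VERB {NP}")


def is_complete_parse(tags):
    """True if the whole tag sequence matches the toy sentence pattern."""
    return SENTENCE.fullmatch(" ".join(tags)) is not None


def tag_with_backtracking(words, chosen=()):
    """Depth-first search over tag choices; returns the first complete parse."""
    if len(chosen) == len(words):
        return list(chosen) if is_complete_parse(chosen) else None
    for tag in LEXICON[words[len(chosen)]]:
        result = tag_with_backtracking(words, chosen + (tag,))
        if result is not None:
            return result
    return None  # nothing worked here, so the caller tries its next tag


words = "the old man the boat".split()
greedy = [LEXICON[w][0] for w in words]   # first-choice tag for every word
repaired = tag_with_backtracking(words)   # falls back until the parse completes
print(greedy, is_complete_parse(greedy))
print(repaired)
```

The greedy reading tags "man" as a noun and leaves the sentence without a verb; the search then revisits "old" and "man" until the sequence parses. A real parser would need a real grammar and some way to rank competing complete parses.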
But still, this kind of explicit analysis (part-of-speech tagging, dependency parsing, etc.) has largely come to be seen as unnecessary with neural transformer models.
Solving these low-level problems seems to be unimportant for high-level tasks, not to mention that errors propagate: an error in part-of-speech tagging carries into the dependency parsing that uses that information, and eventually affects NER, entity/relationship extraction, and similar tasks.
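As a rough back-of-the-envelope illustration of that compounding, here is a small sketch. The per-stage accuracies are invented numbers, not measurements of any toolkit, and it assumes stage errors are independent and that any upstream error breaks everything downstream, which is a simplification.

```python
from math import prod

# Hypothetical per-stage accuracies, chosen only for illustration.
stage_accuracy = {
    "pos_tagging": 0.97,
    "dependency_parsing": 0.92,
    "ner": 0.90,
}


def pipeline_accuracy(accuracies):
    """Chance that every stage is right, under the independence assumption."""
    return prod(accuracies)


end_to_end = pipeline_accuracy(stage_accuracy.values())
print(f"end-to-end accuracy: {end_to_end:.3f}")
```

Under these assumptions the pipeline as a whole is less accurate than its weakest stage, which is the worry with stacking tagger, parser, and extractor on top of each other.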
[0] https://en.wikipedia.org/wiki/Garden-path_sentence