Did you read the article? Dziri and Peng are not the “skeptical AI community”; they are in fact die-hard AI researchers. This is like calling people who run benchmarks to find performance problems in code skeptics or haters.
I read the article, and it does not look like very good research. It's easy to find flaws in LLMs' reasoning/compositional capabilities by looking at problems at the limit of what they can currently do, picking problems that are far from their computational model, or submitting riddles. But there is no good analysis of the limitations, nor any inspection of how, and by how much, recent LLMs have improved at exactly this kind of problem. The article is also full of uninformative and obvious material showing how LLMs fail at stupid tasks such as multiplying large numbers.
But the most absurd thing is that the paper analyzes computational complexity in terms of direct function composition, and there is no reason an LLM should be limited to that model when emitting many tokens. Note that even when CoT is not explicit, the output an LLM emits as it starts to shape its thinking process already gives it technically unbounded depth. With CoT this is even more obvious.
Basically there is no bridge between their restricted model and an LLM.
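To make the objection concrete (my sketch, not the paper's formalism): a single forward pass is fixed-depth, but autoregressive decoding feeds every emitted token back in as input, so effective composition depth grows with output length.

    \[
      y_t = f_\theta(x,\, y_{<t}), \qquad t = 1, \dots, T
    \]
    % One forward pass of an L-layer model is a fixed-depth computation, O(L).
    % But each emitted token y_t is fed back as input, so the end-to-end map
    % x -> y_T composes f_\theta up to T times: effective depth O(L * T),
    % which grows with the number of emitted tokens. Explicit CoT just makes
    % this loop visible; a fixed-composition model doesn't capture it.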
Just pointing out that what you're paying for is actually 3x these resources. By default you get a primary server and two replicas with whatever specification you choose. This is primarily for data durability, but you can also send queries to your replicas.
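If your provider exposes the replicas as separate endpoints, using them is straightforward. A minimal sketch with psycopg2; the hostnames are made up:

    import psycopg2

    # Hypothetical endpoints -- the real ones depend on the provider.
    # Writes go to the primary; reads that tolerate a little replication
    # lag can be fanned out to the replicas you're already paying for.
    primary = psycopg2.connect("host=db-primary.internal dbname=app")
    replica = psycopg2.connect("host=db-replica-1.internal dbname=app")

    with primary.cursor() as cur:
        cur.execute("INSERT INTO events (kind) VALUES (%s)", ("signup",))
    primary.commit()

    with replica.cursor() as cur:
        cur.execute("SELECT count(*) FROM events")  # may slightly lag the primary
        print(cur.fetchone()[0])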
For sure. I personally use pgvector myself, but I also don't have millions and millions of rows. I haven't messed with anything other than Pinecone so I can't speak to the other services, but there's a big difference between a vector DB for your own personal use and a chat app/search over a DB with millions of users' convos and docs. I'm not sure how well these managed vector DB platforms scale, but you probably need the DB guy anyway when you're using vectors at scale. At least I would.
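For context, "personal use" scale with pgvector really is this simple (table and column names made up):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # connection details made up
    cur = conn.cursor()

    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("CREATE TABLE IF NOT EXISTS docs "
                "(id bigserial PRIMARY KEY, embedding vector(3))")
    # IVFFlat trades recall for speed, and `lists` wants retuning as the
    # table grows -- the kind of knob that stops being fire-and-forget
    # somewhere around hundreds of millions of rows.
    cur.execute("CREATE INDEX IF NOT EXISTS docs_embedding_idx ON docs "
                "USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)")
    cur.execute("SELECT id FROM docs ORDER BY embedding <-> %s::vector LIMIT 10",
                ("[0.1,0.2,0.3]",))
    print(cur.fetchall())
    conn.commit()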
Actual SPANN, or janky "inspired by SPANN" IVF with HNSW in front? Only real SPANN (with SPTAG, and partitioning designed to work with SPTAG) delivers good results. A superficial read of the paper makes it LOOK like you can achieve similar results by throwing off-the-shelf components at it, but that doesn't actually work well.
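For anyone who hasn't read the paper, the query path is roughly the following (my paraphrase, with a brute-force scan standing in for SPTAG just to keep the sketch runnable):

    import numpy as np

    def spann_query(q, centroids, posting_lists, nprobe=8, eps=0.1, k=10):
        # 1. Search an in-memory index over centroids (SPTAG in real SPANN;
        #    brute force here only to keep the sketch self-contained).
        dists = np.linalg.norm(centroids - q, axis=1)
        order = np.argsort(dists)
        # 2. Query-aware pruning: only visit postings whose centroid is
        #    within (1 + eps) of the best one. This, plus the balanced,
        #    boundary-replicated partitioning, is where the quality comes
        #    from -- generic IVF doesn't give you it for free.
        best = dists[order[0]]
        probes = [c for c in order[:nprobe] if dists[c] <= best * (1 + eps)]
        # 3. Scan the chosen posting lists (one sequential disk read each
        #    in the real system) and rank the candidates.
        candidates = []
        for c in probes:
            for vid, vec in posting_lists[c]:
                candidates.append((np.linalg.norm(vec - q), vid))
        return [vid for _, vid in sorted(candidates)[:k]]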
As I explain in the post, if you have multiple connections and consistent write load, the timeout will penalize older queries and noticeably harm your long-tail latency.
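Concretely, assuming we're talking about SQLite's busy timeout (Python's sqlite3 here, same mechanism):

    import sqlite3

    writer = sqlite3.connect("app.db", isolation_level=None)  # autocommit; explicit BEGIN below
    writer.execute("CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY)")
    writer.execute("BEGIN IMMEDIATE")  # take and hold the write lock
    writer.execute("INSERT INTO jobs DEFAULT VALUES")

    # A second connection retries for up to 5s, then gives up. Under steady
    # write load the lock keeps changing hands, so the connection that has
    # already waited longest is the one that hits its timeout first: the
    # timeout turns your oldest queued queries into tail-latency errors.
    blocked = sqlite3.connect("app.db", timeout=5.0, isolation_level=None)
    try:
        blocked.execute("BEGIN IMMEDIATE")
    except sqlite3.OperationalError as e:
        print(e)  # "database is locked"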
I thought it was a pretty good list of common Rails-application-specific and sqlite3-specific knobs to turn, for newcomers to performance tuning. (Really just a guided tour though -- turn this knob to enable this particular tool for dealing with concurrency problems...)
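The headline knobs, in plain sqlite3 terms (the PRAGMA names are real; the values are just common starting points, not the post's recommendations):

    import sqlite3

    conn = sqlite3.connect("db/production.sqlite3", timeout=5.0)

    # WAL lets readers proceed concurrently with the single writer.
    conn.execute("PRAGMA journal_mode=WAL")
    # NORMAL is the usual durability/throughput tradeoff under WAL.
    conn.execute("PRAGMA synchronous=NORMAL")
    # How long a connection retries before raising "database is locked".
    conn.execute("PRAGMA busy_timeout=5000")  # milliseconds
    # Negative value = cache size in KiB; bigger cache, fewer page reads.
    conn.execute("PRAGMA cache_size=-64000")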
Yeah calling the EC2 API is definitely more complex than leasing datacenter space, purchasing racks of hardware, deploying a fault tolerant and secure network, capturing and managing offsite backups, dealing with hardware component failures, etc.
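(To be fair to the sarcasm: "calling the EC2 API" really is about this much work. boto3; the AMI ID is a placeholder:)

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # One call stands in for the rack, the network build-out, and the
    # failed-disk pager duty. ImageId below is a placeholder, not a real AMI.
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
    )
    print(resp["Instances"][0]["InstanceId"])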
I was under the impression we were talking about ICs here -- your link shows that an upper-middle IC, the kind you'd expect to be choosing between an early startup and FAANG, will see something around $450K a year, which tracks much closer to what I'd expect.
That's how risk works. The FAANG employee friends have exactly a 0% chance at a 9-figure outcome. They'll easily be in the top decile if they pull a nominal $10M+ post-tax in 20 years.
The fact that New York (a city of 8M people) has received only ~150% more funding than Mountain View (a city of 81K people) tells you all you need to know about the promise here: roughly 100x the population for ~2.5x the money means Mountain View is pulling on the order of 40x more funding per capita.
Mountain View is part of a contiguous suburban stretch running between South San Francisco, San Francisco, and San Jose, which have massive biotech, tech, and manufacturing industries, respectively.