As far as I can tell, Chroma can only store chunks, not the original documents. From your docs: `If the documents are too large to embed using the chosen embedding function, an exception will be raised`.
In addition, it seems that embeddings happen at ingest time. So if, for example, the OpenAI endpoint is down, the insert will fail. That, in turn, means your users need a retry mechanism and a queuing system: all the complexity we describe in our blog post.
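To spell out what that looks like on the application side, here is a minimal sketch (the `add_with_retry` helper is hypothetical, not part of either product):

```python
import time

def add_with_retry(collection, ids, documents, attempts=3, backoff=2.0):
    # If the embedding provider is down, collection.add raises at
    # ingest time, so the caller has to absorb the failure somehow.
    for attempt in range(attempts):
        try:
            collection.add(ids=ids, documents=documents)
            return
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted: hand off to a queue instead
            time.sleep(backoff * 2 ** attempt)
```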
Obviously, I am not an expert in Chroma. So apologies in advance if I got anything wrong. Just trying to get to the heart of the differences between the two systems.
Chroma certainly doesn't have the most advanced API in this area, but you can for sure store chunks or documents; it's up to you. If your document is too large to embed in a single forward pass, then yes, you do need to chunk in that scenario.
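For what it's worth, both shapes are a couple of lines with the Python client (a sketch; the ids and metadata keys are illustrative):

```python
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("docs")

# Store a full document as-is; Chroma persists the document text
# you pass in, not just its embedding.
collection.add(ids=["doc-1"], documents=["...entire document text..."])

# Or store pre-chunked pieces, linked back to the parent document
# via metadata so retrieval can recover the original.
collection.add(
    ids=["doc-1-chunk-0", "doc-1-chunk-1"],
    documents=["...first chunk...", "...second chunk..."],
    metadatas=[{"parent_id": "doc-1"}, {"parent_id": "doc-1"}],
)
```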
Often, though, even if the document does fit, you choose to chunk anyway, or further transform the data with abstractive/extractive summarization techniques, to improve your search dynamics. This is why I'm not sure the complexity noted in the article is relevant in anything beyond a "naive RAG" stack. How it's stored or linked is an issue to some degree, but the bigger, more complex smell is in what happens before you even get to the point of inserting the data.
For more production-grade RAG, blindly inserting wholesale embeddings of full documents is rarely going to get you great results (this varies a lot with document size and domain). As a result, you're almost always going to be doing ahead-of-time chunking (or summarization/NER/etc.), not because you have to due to document size, but because your search performance demands it. Frequently this involves more than one embedding model, to capture different semantics or support different tasks, not to mention reranking after the initial sweep.
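To make "ahead-of-time chunking" concrete, the naive fixed-window version is tiny (the size and overlap values below are illustrative, not recommendations):

```python
def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    # Naive fixed-window splitter with overlap; production pipelines
    # usually split on sentence or section boundaries instead.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```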
That's the complexity that I think is worth tackling in a paid product offering, but the current state of the module described in the article isn't really competitive with the rest of the field in that respect IMHO.
(Post co-author) We absolutely agree that chunking is critical for good RAG. What I think you missed in our post is that the vectorizer allows you to configure a chunking strategy of your choice. So you store the full doc, and the system will chunk and embed it for you. We don't blindly embed the full document.
I didn't miss that detail; I just don't think chunking alone is where the complexity lies, and the pgai feature set isn't really differentiated at all from other offerings in that context. My commentary about full documents was responding directly to your comment here in this thread, more so than to the article (you claimed Chroma can only insert chunks, which isn't accurate, and I expanded from there).
Yes, that is correct, but my position (which has perhaps been poorly articulated) is that in the non-trivial cases, it is a distinction without a difference in the greater context of the RAG stack and related pipelines.
Just allowing a chunking function to be defined and called at insertion time doesn't really alleviate the major pain points inherent to the process. It's a minor convenience, but in fact, as others have pointed out elsewhere in this thread, it's a convenience you can afford yourself in a handful of lines of code that you only ever have to write once.
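Concretely, the once-written version looks something like this (a sketch reusing the naive `chunk` splitter above; nothing here is product-specific):

```python
def insert_document(collection, doc_id: str, text: str) -> None:
    # Chunk at insertion time and link each piece to its parent id;
    # this is the "handful of lines" convenience, written once.
    pieces = chunk(text)
    collection.add(
        ids=[f"{doc_id}-{i}" for i in range(len(pieces))],
        documents=pieces,
        metadatas=[{"parent_id": doc_id}] * len(pieces),
    )
```

Wire that into your ingest path once and you've covered the same ground.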