I agree. However, I think what they're saying is that the embedding should just be like any other index. And yes, it should be, but that isn't the reality today: there are significant latencies and costs involved.
Perhaps in ~10 years embedding/chunking approaches will be so mature that there will be just one way to do it, and it will take no more time than updating a btree, but that certainly isn't the case now.
I think the right abstraction for today would be for OpenAI to manage the vector search. It is kind of weird to send all of the data to a service only to have it compute a vector and hand it back to me. Then I have to figure out how to chunk it, etc. (and I'm sure they would do a better job than I would). Ideally I should just have to deal with text, and someone else can figure out how to return the best results.
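For concreteness, here is roughly what the do-it-yourself flow looks like today: chunk the text, send the chunks off to be embedded, keep the vectors yourself, and run the similarity search on your side. This is a minimal sketch; the model name, chunk size, and file name are placeholders, not recommendations.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk(text: str, size: int = 800) -> list[str]:
    # Naive fixed-size chunking; this is exactly the part that is hard to get right.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize for cosine

chunks = chunk(open("doc.txt").read())
vectors = embed(chunks)  # one network round trip per batch, paid per token

query = embed(["what does the contract say about termination?"])[0]
best = chunks[int(np.argmax(vectors @ query))]  # cosine similarity = dot product of unit vectors
print(best)
```

Every step after the `embeddings.create` call is work the service could in principle do for you, which is the point being made above.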
> I think the right abstraction for today would be for OpenAI to manage the vector search
So I disagree, but they have a very easy-to-use RAG system in beta that does what you want.
In my use cases, fine-grained control over chunking and so on is application-level code. I'm using an LLM to split documents into subdocuments with context (and location), then searching those subdocuments, while pointing the user back to the source.
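Roughly, that application-level piece looks something like this. It's a sketch of the idea only; the prompt, model name, and the `Subdoc` fields are illustrative assumptions, not the exact implementation.

```python
import json
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()

@dataclass
class Subdoc:
    text: str      # rewritten, self-contained passage
    context: str   # one-line description of where it sits in the document
    location: str  # pointer into the source, e.g. "section 4, paragraph 2"

SPLIT_PROMPT = (
    "Split the document below into self-contained passages. Respond with a "
    'JSON object: {"passages": [{"text": ..., "context": ..., "location": ...}]}.'
    "\n\nDocument:\n"
)

def split_document(doc: str) -> list[Subdoc]:
    # One LLM call turns the raw document into passages that carry their own
    # context and a pointer back to the source.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": SPLIT_PROMPT + doc}],
        response_format={"type": "json_object"},
    )
    items = json.loads(resp.choices[0].message.content)["passages"]
    return [Subdoc(**item) for item in items]
```

The passages then get embedded and searched like any other chunks; the `location` field is what lets the UI send the user back to the source document.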