
One thing I’ve recently come to appreciate about Milvus is how well it supports hybrid search and keeps pace with evolving AI use cases.
In our recent work, we’ve been exploring scenarios where both vector similarity and metadata filtering are required together. Milvus handles this combination quite effectively, which makes testing more realistic. For example, instead of just validating “similar results,” we can now validate “relevant results within a specific context,” which is closer to how real users interact with AI systems.
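To give a feel for it, here is roughly what one of our filtered searches looks like with pymilvus (a minimal sketch: the `docs` collection, its `embedding` field, and the `category`/`year` metadata fields are hypothetical stand-ins for our real schema):

```python
from pymilvus import connections, Collection
import numpy as np

connections.connect(host="localhost", port="19530")

# Hypothetical collection: an "embedding" vector field plus
# "category" and "year" scalar metadata fields.
docs = Collection("docs")
docs.load()

query_vec = np.random.rand(768).tolist()  # stand-in for a real query embedding

# Vector similarity and metadata filtering in one call: the boolean
# expression narrows the candidate set, and similarity ranks what's left,
# i.e. "relevant results within a specific context".
results = docs.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"nprobe": 16}},
    limit=10,
    expr='category == "faq" and year >= 2023',
    output_fields=["category", "year"],
)
for hit in results[0]:
    print(hit.id, hit.distance, hit.entity.get("category"))
```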
Another thing I’ve noticed is improved stability when working with larger and more dynamic datasets. As our test data grows and changes frequently, Milvus still maintains consistent performance. This has helped us run more reliable regression tests without worrying about performance drops.
I also like how it fits into modern AI workflows built around retrieval, such as RAG. It gives us a solid foundation to test not just similarity, but also how retrieval quality affects the final AI responses.
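Building on the same sketch, the retrieval step of a RAG-style check might look like this (the `text` field and the prompt template are hypothetical, and the actual LLM call is left as a stub):

```python
# Retrieval step of a RAG check, reusing `docs` and `query_vec` from above.
results = docs.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"nprobe": 16}},
    limit=5,
    output_fields=["text"],  # hypothetical raw-text field
)
context = "\n".join(hit.entity.get("text") for hit in results[0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."

# In the real test we send `prompt` to the model under test and assert the
# answer is grounded in the retrieved passages, so a retrieval regression
# surfaces as an answer-quality failure rather than silent drift.
```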
One subtle but important benefit is how it enables better experimentation. We can quickly try different indexing or query approaches during testing and see how they affect relevance. This makes it easier to fine-tune AI behavior from a QA perspective.
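For example, continuing the sketch above, swapping the index type between test runs is just a rebuild, which keeps A/B-style relevance comparisons cheap:

```python
# Rebuild the same collection with a different index type to compare its
# effect on relevance and latency (release before dropping the old index).
docs.release()
docs.drop_index()
docs.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",      # e.g. vs. IVF_FLAT in the previous run
        "metric_type": "COSINE",
        "params": {"M": 16, "efConstruction": 200},
    },
)
docs.load()
# Note: HNSW searches take {"ef": ...} instead of {"nprobe": ...};
# we re-run the same labeled queries and diff the top-k IDs between runs.
```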
Overall, beyond the core features, Milvus is becoming more useful as we move into more advanced and realistic AI testing scenarios.
There are a few areas where Milvus can improve, especially from a QA and AI testing perspective.
One key improvement would be better guidance around index selection and parameter tuning. Right now, getting the right balance between accuracy and performance often requires trial and error. Having clearer recommendations or built-in suggestions based on use cases would save a lot of time.
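Today that tuning looks like trial and error on our side, roughly along these lines (sketch; assumes an IVF-style index, and the `nprobe` grid is just an example):

```python
import time

# Trial-and-error sweep over nprobe (IVF search width): larger values
# improve recall at the cost of latency, and finding the knee is on us.
for nprobe in (8, 16, 32, 64, 128):
    t0 = time.perf_counter()
    results = docs.search(
        data=[query_vec],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"nprobe": nprobe}},
        limit=10,
    )
    elapsed_ms = (time.perf_counter() - t0) * 1000
    print(f"nprobe={nprobe:4d}  {elapsed_ms:7.1f} ms  top-1={results[0][0].id}")
```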
Observability is another area that could be stronger. When search results are not as expected, it’s not easy to pinpoint whether the issue is with embeddings, indexing, or query behavior. More detailed logs, debugging tools, or visual insights into how results are retrieved would make troubleshooting much easier.
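For now we approximate this isolation ourselves by checking raw embedding similarity outside Milvus (sketch; `expected_vec` and `expected_id` are a labeled query/document pair we maintain, reusing `query_vec` and `results` from the earlier snippets):

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Labeled pair: a query and the document we know should come back.
raw_similarity = cosine(query_vec, expected_vec)
hit_ids = {hit.id for hit in results[0]}

# High raw similarity but missing from the top-k points at indexing or
# search params (e.g. nprobe too low); low raw similarity points at the
# embeddings themselves.
print(f"raw cosine={raw_similarity:.3f}  retrieved={expected_id in hit_ids}")
```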
The UI could also be enhanced. While API-based interaction works well for development, a more interactive interface for managing collections, running queries, and validating results would help a lot in exploratory testing and quicker validation cycles.
From a scaling perspective, simplifying deployment and resource management would be valuable. Running Milvus efficiently in larger environments still requires careful planning, so improvements in auto-scaling or easier configuration would help teams adopt it faster.
Lastly, more practical, real-world examples, especially ones focused on testing, validation, and AI quality use cases, would make onboarding smoother for QA teams.
Overall, Milvus is strong in performance and capability, but improving usability, visibility, and guidance would make it even more effective in real-world workflows.