1. Support semantic search and keyword queries across PDF, DOC, TXT, CSV, and JSON documents in both business-facing (toB) and consumer-facing (toC) scenarios.
2. Heavy traffic at peak usage puts significant pressure on operations and maintenance, and the resulting latency and instability severely degrade the user experience.
3. Integrate embedding models so that documents are vectorized directly at insert and query time, and support index structures such as tries (dictionary trees) and inverted indexes, among others.
4. Per-tenant data isolation is required: with 100,000+ tenants per table, the system must support vector queries scoped to a single tenant.
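
To make the tenant-isolation requirement concrete, here is a minimal Python sketch of a tenant-scoped vector query. It is an illustration only, not how any particular vector database implements isolation: the class `TenantVectorStore`, its `insert` and `search` methods, and the brute-force cosine scan are all hypothetical names and simplifications chosen for this example. A production system would more likely use an approximate nearest-neighbor index with a tenant partition key or filter.

```python
import math
from collections import defaultdict


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantVectorStore:
    """Hypothetical toy store: one in-memory partition of vectors per tenant."""

    def __init__(self):
        # tenant_id -> list of (doc_id, vector)
        self._partitions = defaultdict(list)

    def insert(self, tenant_id, doc_id, vector):
        self._partitions[tenant_id].append((doc_id, list(vector)))

    def search(self, tenant_id, query, top_k=3):
        # Only the caller's partition is scanned, so documents that belong
        # to other tenants can never appear in the results.
        scored = [
            (doc_id, _cosine(query, vec))
            for doc_id, vec in self._partitions.get(tenant_id, [])
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = TenantVectorStore()
store.insert("tenant_a", "doc1", [1.0, 0.0])
store.insert("tenant_a", "doc2", [0.0, 1.0])
store.insert("tenant_b", "doc3", [1.0, 0.0])

results = store.search("tenant_a", [1.0, 0.1], top_k=2)
result_ids = [doc_id for doc_id, _ in results]
print(result_ids)
```

Partitioning by tenant keeps each query's scan proportional to that tenant's data rather than to the whole table, which is the property that matters once a single table holds 100,000+ tenants.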