What is the difference between using a dedicated rerank model and using a general-purpose model for reranking?
Published: May 8, 2026 · Reading time: 2 min · Author: Felix
Using the Google ecosystem as an example, we compare the Vertex AI Ranking API (latest version: 004) with the Gemini 3.1 series (currently Google's strongest general-purpose model).
Below is an in-depth comparison based on the latest 2026 benchmark data:
1. Model performance and core metrics
| Dimension | Vertex AI Ranking API (004) | Gemini 3.1 Pro / Flash | Verdict |
|---|---|---|---|
| Underlying architecture | Cross-encoder (bidirectional cross-attention over the query and document together) | Decoder-only (autoregressive generation) | 004 wins by design |
| Ranking quality (nDCG@10) | Higher (roughly +15% to +20%). Fine-tuned specifically for query-document relevance and highly sensitive to semantic alignment. | Slightly lower. Despite strong comprehension, it is easily thrown off by prompt noise and by document position ("lost in the middle"). | 004 wins |
| Latency (time to first token, TTFT) | ~100 ms (reranking 100 documents) | ~800 ms to 2 s (slowed by long-context prefill) | 004 wins decisively |
| Maximum context | 1,024 tokens (per record) | 1M to 2M tokens (all candidates injected into one prompt) | Gemini wins |
| Logical reasoning | None. It can only judge relevance. | Very strong. It can follow implicit logic (e.g., find documents that support point A while rebutting the evidence for point B). | Gemini wins |
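The ranking-quality row is reported as nDCG@10. To make that number concrete, here is a minimal Python sketch of nDCG@k, assuming graded relevance labels listed in the order the reranker returned the candidates. It uses the common exponential-gain form (2^rel - 1) / log2(rank + 1); some evaluations use linear gain instead. The ideal DCG is computed by re-sorting the same candidate list, which matches a reranking setting where the reranker can only reorder what retrieval already returned. The function names are illustrative, not taken from any particular library.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (ranks start at 1)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k: DCG of the returned order divided by DCG of the ideal order.

    Assumption: the ideal order is the same candidates sorted by relevance.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant candidates: define nDCG as 0
    return dcg_at_k(relevances, k) / ideal


if __name__ == "__main__":
    # The only relevant document (grade 3) was ranked second instead of first.
    print(round(ndcg_at_k([0, 3]), 4))  # 0.6309
```

Because the logarithmic discount grows with rank, pushing a highly relevant document down even a few positions costs more than misordering documents near the bottom of the top 10.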
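The architecture and context rows come down to how each model sees the candidates. A cross-encoder scores every (query, document) pair on its own (pointwise), and the caller sorts by score; a decoder-only LLM instead reads all candidates in one prompt. The Python sketch below shows the pointwise flow only, under explicit assumptions: `toy_relevance` is a hypothetical term-overlap stand-in for a real cross-encoder, `truncate` approximates tokens with whitespace-separated words, and none of these names correspond to the Vertex AI API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Per-record limit cited for Ranking API 004; used here only for illustration.
MAX_TOKENS_PER_RECORD = 1024


@dataclass
class Doc:
    doc_id: str
    text: str


def truncate(text: str, max_tokens: int = MAX_TOKENS_PER_RECORD) -> str:
    """Keep the first max_tokens whitespace-separated words.

    Hypothetical approximation: real rerankers count subword tokens.
    """
    return " ".join(text.split()[:max_tokens])


def toy_relevance(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder score in [0, 1]:
    the fraction of distinct query terms that appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rerank_pointwise(
    query: str,
    docs: List[Doc],
    score_fn: Callable[[str, str], float] = toy_relevance,
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, doc) pair independently, then sort by score.

    Each document is scored alone, so it is truncated to the per-record
    limit first. Python's sort is stable, so ties keep retrieval order.
    """
    scored = [(d.doc_id, score_fn(query, truncate(d.text))) for d in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_n is None else scored[:top_n]


if __name__ == "__main__":
    candidates = [
        Doc("a", "The cat sat on the mat"),
        Doc("b", "Vertex ranking API reference"),
        Doc("c", "Ranking API quotas"),
    ]
    print(rerank_pointwise("ranking API", candidates))
    # [('b', 1.0), ('c', 1.0), ('a', 0.0)]
```

Since each pair is scored independently, the pairs can be batched in parallel and no document's position in a long prompt affects its score; the trade-off is that the scorer cannot reason across documents, which is exactly where the table gives Gemini the edge.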