What is the difference between reranking with a dedicated rerank model and reranking with a general-purpose model?

Taking the Google ecosystem as an example, this article compares the Vertex AI Ranking API (latest version: 004) with the Gemini 3.1 series (currently Google's strongest general-purpose models).

Here’s an in-depth comparison based on the latest 2026 benchmark data:

1. Comparison of model performance and core metrics

Dimension	Vertex Ranking API (004)	Gemini 3.1 Pro / Flash	Conclusion
Underlying architecture	Cross-encoder (bidirectional cross-attention over the query and document)	Decoder-only (autoregressive generation)	004 wins on principle
Ranking quality (nDCG@10)	Higher (roughly 15% to 20%). Fine-tuned specifically for query-document relevance and highly sensitive to semantic alignment.	Slightly lower. Its understanding is strong, but it is easily affected by prompt noise and by document position ("lost in the middle").	004 wins
Latency (time to first token, TTFT)	~100 ms (reranking 100 documents)	~800 ms to 2 s (affected by long-context warm-up)	004 wins decisively
Maximum context	1,024 tokens (per record)	1M to 2M tokens (full-context injection)	Gemini wins
Logical reasoning ability	None. It can only judge relevance.	Extremely strong. It can follow "implicit logic" (e.g. find documents that support point A but dispute the evidence for point B).	Gemini wins
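
Since the table compares the two approaches on nDCG@10, a small sketch of how that metric is computed may help in reading the numbers. This is an illustrative implementation, not the benchmark's own code: the function names are made up for this example, it uses linear gain (some evaluations use 2^rel - 1 instead), and it assumes every relevant document already appears in the ranked list, so the ideal ordering is just the same labels sorted in descending order.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), and so on.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k: DCG of the given order divided by DCG of the ideal order.

    Assumption: the ideal order is built from the same labels, i.e. no
    relevant document is missing from the list being scored.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance labels (3 = best), listed in the order
# each system returned the same four documents.
reranker_order = [3, 2, 0, 1]
general_model_order = [0, 1, 2, 3]

print(f"reranker nDCG@10:      {ndcg_at_k(reranker_order):.3f}")
print(f"general model nDCG@10: {ndcg_at_k(general_model_order):.3f}")
```

Because the discount grows with position, a model that pushes the most relevant documents toward the top scores higher even when both models return the same set of documents.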

