    What is the difference between reranking with a dedicated Rerank model and reranking with a general-purpose model?

    Published: May 8, 2026
    Reading time: 2 min
    Author: Felix

    Taking the Google ecosystem as an example, we compare the Vertex AI Ranking API (latest version: 004) with the Gemini 3.1 series (currently Google's strongest general-purpose model).
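    To make the contrast concrete, here is a minimal Python sketch of the two integration patterns. It makes no real Vertex AI or Gemini calls, and every function name in it is hypothetical. A dedicated reranker behaves like `pointwise_rerank`: it scores each (query, document) pair on its own. A general-purpose model is typically used listwise: all candidates go into one prompt (`build_listwise_prompt`), and the model's reply is parsed back into an order (`parse_listwise_answer`). The toy `overlap_score` only stands in for a real relevance model.

```python
from typing import Callable, Sequence

# A relevance scorer sees ONE (query, document) pair at a time, the way a
# cross-encoder reranker does. No real service is called in this sketch.
ScoreFn = Callable[[str, str], float]


def pointwise_rerank(query: str, docs: Sequence[str], score_fn: ScoreFn,
                     top_n: int = 10) -> list[int]:
    """Score every document independently; return indices, best first."""
    scores = [score_fn(query, doc) for doc in docs]
    # sorted() is stable, so ties keep their original order.
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:top_n]


def build_listwise_prompt(query: str, docs: Sequence[str]) -> str:
    """Pack all candidates into a single prompt for a general-purpose model."""
    lines = [
        f"Query: {query}",
        "Rank the passages below by relevance to the query.",
        "Reply with passage numbers only, best first, separated by commas.",
    ]
    lines += [f"[{i}] {doc}" for i, doc in enumerate(docs)]
    return "\n".join(lines)


def parse_listwise_answer(answer: str, n_docs: int) -> list[int]:
    """Turn a reply such as '2, 0, 1' into indices.

    Out-of-range and repeated numbers are ignored, and any passage the model
    left out is appended at the end in its original order.
    """
    order: list[int] = []
    for token in answer.replace(" ", "").split(","):
        if token.isdecimal():
            idx = int(token)
            if idx < n_docs and idx not in order:
                order.append(idx)
    order += [i for i in range(n_docs) if i not in order]
    return order


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a relevance model: count shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


if __name__ == "__main__":
    docs = ["cats sleep a lot", "dogs bark", "cats and dogs"]
    print(pointwise_rerank("cats dogs", docs, overlap_score))  # [2, 0, 1]
    print(build_listwise_prompt("cats dogs", docs))
    # In practice the prompt would be sent to the model; here its reply is faked.
    print(parse_listwise_answer("2, 0", len(docs)))  # [2, 0, 1]
```

    Because the listwise pattern puts every candidate into one context, it is the one exposed to prompt noise and to position effects; the pointwise pattern never sees two documents at once, which is also why it cannot reason across them.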

    Here is a comparison based on the latest 2026 benchmark data:

    1. Model performance and core metrics

    | Dimension | Vertex AI Ranking API (004) | Gemini 3.1 Pro / Flash | Verdict |
    |---|---|---|---|
    | Underlying architecture | Cross-encoder (query and document are encoded together with bidirectional attention) | Decoder-only (autoregressive generation) | 004 wins in principle |
    | Ranking quality (nDCG@10) | Higher (about +15% to 20%). Fine-tuned specifically for query-document relevance and highly sensitive to semantic alignment. | Lower. Despite strong comprehension, it is easily thrown off by prompt noise and by a document's position in the context ("lost in the middle"). | 004 wins |
    | Latency (time to first token, TTFT) | ~100 ms to rerank 100 documents | ~800 ms to 2 s (slowed by long-context prefill) | 004 wins decisively |
    | Maximum context | 1,024 tokens per record | 1M to 2M tokens (the whole candidate set in one prompt) | Gemini wins |
    | Logical reasoning | None. It can only judge relevance. | Very strong. It can follow implicit logic (e.g., find documents that support point A while rebutting the evidence for point B). | Gemini wins |
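    The ranking-quality comparison is stated in nDCG@10, so here is a minimal Python sketch of that metric. It assumes graded relevance labels (0 = irrelevant, higher = better) listed in the order a reranker returned the results, and the common exponential gain 2^rel - 1 (some benchmarks use the linear gain rel instead). For simplicity the ideal order is built from the same returned list; a full evaluation would build it from every judged document for the query. The function names and the two example rankings are illustrative, not taken from any library or benchmark.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain: the item at 0-based rank r is divided by log2(r + 2)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the returned order divided by DCG of the best possible order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # The same four candidates, ordered by two hypothetical rerankers.
    ranking_a = [3, 2, 0, 1]
    ranking_b = [0, 3, 1, 2]
    a, b = ndcg_at_k(ranking_a), ndcg_at_k(ranking_b)
    print(f"A: {a:.3f}  B: {b:.3f}  relative gain of A over B: {(a - b) / b:.1%}")
```

    The last line reports a relative gain, (a - b) / b; an absolute gap in nDCG points, a - b, is the other common way such differences are quoted, so it is worth checking which one a benchmark uses.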
