
    New opportunities brought by GEO

    Published: December 19, 2025
    Reading Time: 2 min read
    Author: Felix
    Access: Public

    Why are you talking about this all of a sudden?

    I have been paying close attention to different fields and opportunities across the AI market recently, and along the way I have seen and discussed some very interesting ones: AI + traditional industries (medical/finance/), AI + figurine toys, AI Reddit, GEO optimization, and so on.

    There are also many Builders in the group, so I am sharing this to give everyone some inspiration. I actually think the most interesting track is AI + toy figures, but that track is not a good fit for big players; the next most interesting is the GEO track.

    As it happens, GEO is, in my view, a relatively under-the-radar AI track; entering at this stage can still win a decent share of the market.

    First, from a macro market perspective: the size of the entire market depends on the scale of the large-model vendors. GEO's market ceiling depends on a more upstream question: **how many "answers" do large models generate every day?** GPT alone has a user base of hundreds of millions of monthly active users (MAU).

    The number of potential GEO decisions per day ≈ the number of daily generative search/Q&A calls. For example, assume 1 billion generative Q&A calls per day, with each answer citing or depending on 3–10 "candidate information blocks" on average; then billions of implicit "is this content adopted?" decisions take place every day.
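    Under those assumed figures, the back-of-envelope range works out as follows; both inputs are the illustrative assumptions above, not measurements:

```python
# Back-of-envelope estimate of implicit "is this content adopted?" decisions
# per day. Both inputs are the article's illustrative assumptions.
daily_qa_calls = 1_000_000_000           # assumed generative Q&A calls per day
candidates_low, candidates_high = 3, 10  # assumed candidate blocks per answer

low = daily_qa_calls * candidates_low
high = daily_qa_calls * candidates_high
print(f"{low:,} to {high:,} adoption decisions per day")
# → 3,000,000,000 to 10,000,000,000 adoption decisions per day
```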

    But in fact we cannot calculate it that way. What matters more are the dependent groups and scenarios: KA (key accounts), SMB (small and medium businesses), and creators.

    * KA (Key Accounts): what they are really buying is the certainty and risk control of "never being absent from AI answers". They are not buying traffic; it is mainly brand presence. Budgets are large and decisions are slow, but once signed, stickiness is strong and the renewal probability is high.

    * SMB (Small and Medium Businesses): what they buy is the chance to be "recommended/mentioned in long-tail scenarios"; the offering is more tool-like and sold as a monthly subscription. Volume is large but price sensitivity is high, so delivery must be automated.

    * Creators: what they buy is the voice and credibility of being "cited/treated as a source", mostly through lightweight tools or low-priced subscriptions. The audience is large and spreads well, but ARPU is low, making this segment suitable as an ecosystem entry point.

    From my point of view, there are many vacant spots under this large ecological niche. Some examples:

    * GEO Visibility Detection Tool: automatically test whether and how a site/brand is mentioned across different models and different questions.

    * Model Citation Monitoring and Alerting: similar to Search Console, but the objects are the sources and recommendations in LLM answers.

    * GEO Content Structure Analyzer: Analyze the page structure, information density, and presentation method to determine the "probability of being adopted by the model."

    * Long-Tail Question/Prompt Scenario Miner: automatically generate the real decision questions that the model will encounter.

    * Cross-model comparison system: Answer differences and citation preferences for the same question in GPT / Claude / Gemini.

    * GEO A/B Testing Framework: given different versions of the same content, test which version is more likely to be selected by the model.

    * Industry Knowledge Benchmark Set (Benchmark): Construct a "model reference answer source" for a certain vertical field.

    * GEO Automated Suggestion Engine: Convert monitoring results into executable content/structure modification suggestions.

    None of these small tools take long to build; you could probably prototype one in 1–2 days.
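    As one concrete illustration, the visibility-detection idea above boils down to a small loop. The sketch below is a minimal version, assuming you inject an `ask(model, question) -> answer` function wrapping whatever API client you use; nothing here is a real vendor API:

```python
from typing import Callable

def visibility_report(brand: str, questions: list[str], models: list[str],
                      ask: Callable[[str, str], str]) -> dict[str, float]:
    """For each model, the fraction of answers that mention the brand.

    `ask` is injected so any client (OpenAI, Anthropic, a local model)
    can be plugged in; this sketch only does the bookkeeping.
    """
    report = {}
    for model in models:
        hits = sum(1 for q in questions if brand.lower() in ask(model, q).lower())
        report[model] = hits / len(questions)
    return report
```

    A real tool would also record *how* the brand appears (cited, recommended, merely mentioned), which is where the monitoring and alerting ideas above come in.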

    Introduction

    I have been following GEO (Generative Engine Optimization) for a long time (if you are unfamiliar with the concept, ask GPT).

    The earliest trigger came from Cloudflare's documentation: in the materials around Workers AI they introduced files such as llms.txt and prompt.txt, which look like a "sitemap for models". But after some research at the time, I quickly found a problem: mainstream model vendors (OpenAI, Google, etc.) do not follow these protocols. So for a long time my understanding was: the answers a model selects come mainly from pages that rank highly in search engines; GEO might be somewhat useful, but not very, since it mostly reduces to the SEO of the page itself.
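    For context, the llms.txt idea is a small markdown file at the site root that hands models a curated map of the site. A minimal example in the spirit of the community proposal might look roughly like this (the site name, paths, and descriptions are invented for illustration):

```markdown
# Example Docs Site

> One-line summary of what this site covers and who it is for.

## Docs

- [Getting started](https://example.com/docs/start): setup and first steps
- [API reference](https://example.com/docs/api): endpoints and parameters
```

    As noted above, though, the major vendors do not currently commit to reading such files, so this remains speculative groundwork rather than a reliable channel.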

    Also, because this was difficult to verify and attribute, I abandoned further research at the time. But in the past few days, thanks to some opportunities, my understanding of GEO has moved a step further.

    The new understanding is: **GEO has room for engineering intervention that is independent of traditional SEO, and it can increase the probability of content being "used" by generative engines.**

    Note the keyword: used, not seen.

    1. Why do people think GEO is useless?

    Because most people's mental model looks like this:

    Rank high → easier to retrieve → enters the context → participates in the answer → gets cited

    This chain is formally self-consistent.

    Especially in products with real-time search enabled (Web Search / Bing / Google grounding), "high ranking → easier entry into the context" does hold.

    So we will naturally come to a seemingly reasonable conclusion:

    Isn't GEO just an appendage of SEO? If you cannot even get through the entrance, what generation stage is there to speak of?

    **The problem with this reasoning is that it treats a conditional proposition that holds locally as an unconditional universal law.**

    2. The conditions under which the inference holds

    The reason the reasoning above seems valid is that it implicitly assumes a set of strong premises:

    * The user's question is a hot keyword

    * The scale of candidate content is huge (tens of thousands or more)

    * The retrieval phase must rely heavily on "authority ranking" to cut the pool down to size

    In other words, what this set of logic describes is actually the typical problem distribution in the search engine era.

    But here is the problem: **the question distribution in large-model Q&A has clearly deviated from this assumption.**

    3. Changes in the distribution of user questions

    If we review real large-model usage scenarios, we find that a large number of questions are closer to the following categories:

    * Specific scenario decision questions

    > I am currently in situation A and have option B. Is it worth it in the long run?

    * Boundary condition troubleshooting questions

    > Why is X bad under Y conditions? Is it a problem with Z?

    * Incremental Value Assessment Question

    > I already have Y, will there be duplication of construction if I do Z again?

    These questions share some traits: specific, small, and compressed. **Once the question is this specific, the number of candidate contents collapses rapidly.**

    4. Candidate set size and ranking signal changes

    When the candidate content collapses from "tens of thousands" to "tens or even single digits", the dominant signals of the ranking mechanism change qualitatively:

    * Authority is no longer a decisive advantage

    * Domain name weight is no longer a core variable

    * Historical ranking is no longer a prerequisite

    At this point, the real concern of the model becomes:

    **Which piece of content can directly cover the structure of the current question?**

    That is to say:

    * Whether it explicitly hits the user's constraints

    * Whether it uses the same or highly similar question language

    * Whether its answer form can be reused directly

    **At this stage, language itself starts to decide the winner: not "who is bigger", but "who looks more like the answer".**
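    A toy calculation makes the collapse concrete. Suppose each explicit constraint in a question is addressed by only a small fraction of the pages in the pool; the pool size and match rates below are invented purely for illustration:

```python
# Each explicit constraint shrinks the candidate pool multiplicatively.
# Pool size and per-constraint match rates are invented for illustration.
pool_size = 50_000
match_rates = {"situation A": 0.05, "option B": 0.05, "long-run focus": 0.05}

remaining = float(pool_size)
for constraint, rate in match_rates.items():
    remaining *= rate
    print(f"after matching '{constraint}': ~{remaining:.0f} candidates")
# after three constraints, roughly 6 candidates remain out of 50,000
```

    At that size, "who ranks higher" stops being the deciding signal; whether a page covers the exact constraint structure decides instead.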

    5. Different stages of action, different optimization goals

    This is why GEO is not equivalent to an appendage of SEO.

    The two solve problems at different stages:

    * SEO solves:

    > How to be selected by the search system among large-scale candidates

    * GEO solves:

    > How, once among the candidates, to be sampled, spliced, and cited by the model, raising the rate of actual exposure

    **When the candidate set is highly sparse, the importance of the latter quickly exceeds that of the former.**

    Look at it from this perspective:

    * SEO improves the probability of entering the candidate pool

    * GEO improves the probability of actually being used by the generation system

    The two are not substitutes, but work at different levels.

    6. Multi-channel nature of information sources

    Another problem: **Web Search is not the only entry point through which models obtain material.**

    Many people's default mental model is:

    Model → Search → Web → Answer

    But in a real system, material may come from multiple paths:

    * Real-time Web Search

    * Long-tail memory in the training corpus

    * Distilled summaries of high-frequency questions

    * Structured results returned by tool calls

    * Constraints provided implicitly in the user context

    This means:

    **"Not being searched" is not the same as "having no chance to participate in generation".**

    Especially for long-tail, professional, and structured questions, whether the language is reusable is often more critical than whether the page ranks highly.

    7. Conditional probability of content

    Once content enters the context, what the model faces is not:

    * Site weight

    * URL authority

    * Ranking signal

    The model faces:

    A sequence of tokens serving as the condition in P(next_token | context).

    In other words: **in the generation phase there is no action of "consulting a web page", only the action of "walking down a particular probability path".**

    This sentence is very important, because it directly determines what GEO's "intervention objects" are:

    * Not a site

    * Not a link

    * Not authority

    * But whether a language fragment is "effort-saving / stable / low-risk" on the current generation path

    8. How does a generative engine "select materials"?

    We can think of generation as a very simple thing:

    At every step, the model is asking: **what is the next thing I can say that is least likely to be wrong?**

    "Least likely to be wrong" here means stability in the probabilistic sense:

    * Can this continuation be consistent with the previous text?

    * Are the boundaries of this concept clear and will not lead to ambiguity?

    * Is this expression reusable and does not require reorganization?

    * Is this argument "closed" and does not force the model to fill in a lot of unknown details?

    **GEO = increasing the probability that certain expressions are selected on the generation path.**
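    A toy softmax makes this selection pressure concrete. The continuation scores below are invented, but the normalization is the standard way next-token probabilities are formed from model scores:

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented scores for three candidate continuations of the same answer:
# a higher score stands for "fits the context with less fill-in required".
candidates = [
    ("a closed, directly reusable sentence", 3.0),
    ("a vague claim that needs qualification", 1.0),
    ("an ambiguous term opening new paths", 0.5),
]
probs = softmax([s for _, s in candidates])
for (text, _), p in zip(candidates, probs):
    print(f"{p:.2f}  {text}")  # the reusable sentence takes most of the mass
```

    In this framing, GEO work is about raising the score of fragments drawn from our content: clearer boundaries and reusable phrasing translate into higher-probability continuation paths.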

    9. GEO intervention points

    So what can we do?

    A. Separability of concept boundaries

    What the model fears most is fuzzy concepts, because fuzzy concepts make the subsequent paths explode.

    What we want to do is:

    * First, define the object (what this passage discusses, and what it does not)

    * Then, give the boundaries (under which conditions it holds, and under which it does not)

    * Finally, give the minimal decision rule (a judgment that can be repeated)

    In effect, this is reducing conditional entropy.
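    The entropy claim can be made concrete with a toy calculation. The two continuation distributions below are invented: a vague passage leaves many next steps roughly equally likely, while explicit boundaries concentrate the probability mass:

```python
import math

def entropy_bits(p: list[float]) -> float:
    """Shannon entropy of a distribution, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Invented continuation distributions after reading a passage:
vague = [0.125] * 8                  # eight paths, all equally plausible
bounded = [0.85, 0.05, 0.05, 0.05]   # one path clearly dominates

print(f"vague passage:   {entropy_bits(vague):.2f} bits")
print(f"bounded passage: {entropy_bits(bounded):.2f} bits")
```

    Lower conditional entropy means the model has to guess less about what comes next, which is exactly the "effort-saving / stable / low-risk" property described above.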

    B. A structured "shortest closed-loop" explanation

    Models prefer structures that can be closed quickly:

    * Phenomenon → Cause → Verification method → Repair strategy

    * Condition → Conclusion → Exception → Counterexample

    Why? Because this type of structure requires the model to "fill in" less at output time, so the risk is lower.

    C. Provide "directly copyable decision sentences" at the bifurcation points

    The so-called citation rate is, in many cases, not the model "respecting the source", but the model:

    finding a sentence that can be used directly as the conclusion.

    What we provide should be:

    * A sentence that can directly serve as the answer

    * That holds without additional explanation

    * And is not easily refuted

    D. Cover the "uncertain interval", not the "common-sense zone"

    The content most likely to be "swallowed by the model without citation" is common sense.

    What really makes the model depend on us is:

    * Gray areas where models are prone to ambiguity

    * Common but rarely written-down trade-offs

    * Boundary conditions that look simple but easily trip people up

    **So we should write the parts that the model does not know how to express reliably.**

    Summary

    The reason we discuss GEO systematically this time is not that it is a "new term", but that once generative engines became the main answer entrance, the mechanism by which content gets used changed structurally.

    From a market perspective, GEO's value lies not in the number of calls but in how many people rely on large models for real decisions: KA cares about the brand presence and risk control of "never being absent from AI answers", SMB pursues recommendation opportunities in long-tail scenarios, and creators want continuous citations and voice inside generative engines. These three groups make GEO a long-term market with tiered pricing power.

    From an ecosystem perspective, compared with the mature SEO tool stack, GEO is still clearly early: whether visibility detection, citation monitoring, content structure analysis, or cross-model comparison and A/B testing, there are plenty of gaps developers can fill directly, and many tool forms can be prototyped in 1–2 days.

    From a technical perspective, GEO is not competing with SEO on the same problem. SEO solves "how to enter the candidate pool"; GEO solves "how to actually get used in the generation stage after entering the candidate pool".

    When user questions are highly specific and the candidate set collapses rapidly, the model no longer relies on domain name authority or historical rankings, but prefers expressions that have a stable language structure, clear boundaries, and can be directly reused. In the generation phase, the model only faces the conditional probability of the token, not the web page or link itself.

    Therefore GEO's intervention points are also very clear: it is not about "writing for SEO", but about providing a low-risk, closeable, repeatable answer structure in the ranges where the model is most uncertain and most error-prone.

    **GEO is not about being seen; it is about becoming the path the model is more willing to take during generation.**

    This is why I think that at the current stage, GEO is a track that still has a lot of engineering opportunities and is very suitable for Builders to enter.
