Some practices for business file indexing and spatial indexing
- Published: May 8, 2026
- Author: Felix
This article describes the current implementation of Brand Space cloud space search: the system first locates the relevant folders based on user intent, then performs hybrid retrieval at the retrieval-unit level within those folders.
Core Structure
The current structure is:
```Plain
brand_space_folder         -> Intent routing and data partitioning
brand_space_file           -> File-level resource metadata, storage information, processing status
brand_space_retrieval_unit -> The smallest unit of evidence that actually participates in the search
```
In other words, the system does not search files globally. Instead:
```Plain
user intent
-> Find the corresponding first-level folder
-> Search for evidence fragments in this folder and its subfolders
-> Aggregate back to file
```
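The last step of that flow, aggregating evidence fragments back to files, could look roughly like the sketch below. The article does not say how file scores are derived from unit scores, so ranking each file by its best-scoring unit is an assumption, and `UnitHit`, `aggregate_to_files`, and `per_file` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UnitHit:
    """One scored retrieval unit (hypothetical shape, not the real schema)."""
    file_id: str
    unit_index: int
    score: float


def aggregate_to_files(hits: list, per_file: int = 3) -> list:
    """Group unit hits by file and rank files by their best unit score.

    Assumption: max-score aggregation. Returns (file_id, best_score, top_units).
    """
    by_file = defaultdict(list)
    for hit in hits:
        by_file[hit.file_id].append(hit)

    ranked = []
    for file_id, units in by_file.items():
        units.sort(key=lambda u: u.score, reverse=True)
        # Keep only the strongest units as citable evidence for this file.
        ranked.append((file_id, units[0].score, units[:per_file]))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


hits = [
    UnitHit("file-a", 0, 0.62),
    UnitHit("file-b", 2, 0.91),
    UnitHit("file-a", 4, 0.88),
]
files = aggregate_to_files(hits)
```

Keeping the top units alongside each file means the result can still point at the specific fragments that justified the match.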
Folder first
A first-level folder is the top-level semantic boundary of Brand Space, for example:
```Plain
Brand Voice
Product Facts
Visual Identity
Messaging Boundaries
Campaign Assets
```
The question it answers is:
```Plain
Which data area should this intent be searched in?
```
`searchBrandKeyword` first reads the user's Brand Space folders and routes to a folder based on each folder's `name`, `path`, `description`, and `agentInstructions`.
If a first-level folder is hit, the search scope is limited to that folder and its subfolders. This prevents global keyword or semantic recall from pulling in files from irrelevant data areas.
If no folder can be routed explicitly, the search falls back to a global retrieval-unit search to preserve recall.
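As a rough illustration of routing with a global fallback, the sketch below scores first-level folders by token overlap with the intent. This is not how `searchBrandKeyword` is implemented; the article does not describe the matching logic, so the overlap scoring, the `Folder` shape, and the names `route_folder` and `search_scope` are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Folder:
    """Hypothetical folder record; snake_case stands in for agentInstructions."""
    path: str  # e.g. "/Product Facts" or "/Product Facts/Specs"
    name: str
    description: str = ""
    agent_instructions: str = ""


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route_folder(intent: str, folders: list, min_overlap: int = 1) -> Optional[Folder]:
    """Pick the first-level folder whose routing text best overlaps the intent."""
    intent_tokens = tokens(intent)
    best, best_score = None, 0
    for folder in folders:
        if folder.path.strip("/").count("/") != 0:
            continue  # only first-level folders are routing targets
        text = " ".join([folder.name, folder.path,
                         folder.description, folder.agent_instructions])
        score = len(intent_tokens & tokens(text))
        if score > best_score:
            best, best_score = folder, score
    return best if best_score >= min_overlap else None


def search_scope(intent: str, folders: list) -> Optional[list]:
    """Return the folder paths to search, or None to signal the global fallback."""
    hit = route_folder(intent, folders)
    if hit is None:
        return None
    prefix = hit.path.rstrip("/") + "/"
    return [f.path for f in folders
            if f.path == hit.path or f.path.startswith(prefix)]


folders = [
    Folder("/Brand Voice", "Brand Voice", "Tone and writing style"),
    Folder("/Product Facts", "Product Facts", "Specs, pricing, ingredients"),
    Folder("/Product Facts/Specs", "Specs"),
]
scoped = search_scope("what is the product pricing", folders)
unscoped = search_scope("holiday schedule", folders)
```

Returning `None` rather than an empty list keeps "no route found, search everything" distinct from "route found, but the folder is empty".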
File table
`brand_space_file` stores resource-level information:
```Plain
ID
userId
folderId
path
name
fileType
mimeType
storageUrl
size
content
processingStatus
createdAt
updatedAt
```
Its responsibilities are:
```Plain
Manage resources
Show resources
Record processing status
Save original or extracted evidence content
```
It is not the primary retrieval unit and does not hold file-level AI compact metadata.
Search unit
What actually participates in the search is `brand_space_retrieval_unit`:
```Plain
fileId
unitIndex
unitType
titleText
evidenceText
aliases
keywords
keyFacts
restrictions
searchVector
embedding
pageNumber
timeStartMs
timeEndMs
```
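The presence of both `searchVector` and `embedding` suggests that a keyword ranking and a semantic ranking are combined per unit. The article does not say which fusion method is used; the sketch below swaps in reciprocal rank fusion (RRF), a common choice, purely as an illustration. The unit IDs and the constant `k = 60` are assumptions.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of unit IDs into one list of (unit_id, score).

    Each unit scores 1 / (k + rank) per list it appears in (rank starts at 1).
    Ties are broken by unit ID so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, unit_id in enumerate(ranking, start=1):
            scores[unit_id] = scores.get(unit_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a keyword search and an embedding search,
# both already restricted to the routed folder scope.
keyword_ranking = ["u3", "u1", "u7"]
vector_ranking = ["u1", "u9", "u3"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

RRF uses only rank positions, so the keyword and vector scores do not need to share a scale before they are combined.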
`evidenceText` is searchable, rerankable, and citable evidence text from sources including:
```Plain
Original text
PDF/DOC/DOCX parse text
Image OCR and short visual description
Video transcript, visible text and short scene description
```
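Because evidence comes from both paged documents and time-based media, the `pageNumber`, `timeStartMs`, and `timeEndMs` fields are what would let a citation point at a specific location. A minimal sketch of that idea follows; the label format and the names `RetrievalUnit`, `format_ms`, and `citation_label` are hypothetical, and the article does not describe how citations are actually rendered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrievalUnit:
    """Subset of the retrieval unit fields, with snake_case names (assumed types)."""
    file_id: str
    unit_index: int
    evidence_text: str
    page_number: Optional[int] = None
    time_start_ms: Optional[int] = None
    time_end_ms: Optional[int] = None


def format_ms(ms: int) -> str:
    """Render milliseconds as MM:SS, truncating fractions of a second."""
    seconds = ms // 1000
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def citation_label(unit: RetrievalUnit, file_name: str) -> str:
    """Build a human-readable pointer to where the evidence lives in the file."""
    if unit.page_number is not None:
        return f"{file_name}, p. {unit.page_number}"
    if unit.time_start_ms is not None and unit.time_end_ms is not None:
        start = format_ms(unit.time_start_ms)
        end = format_ms(unit.time_end_ms)
        return f"{file_name}, {start}-{end}"
    # Plain text or images: fall back to the unit's position in the file.
    return f"{file_name}, unit {unit.unit_index}"


pdf_unit = RetrievalUnit("f1", 0, "Tone is warm and direct.", page_number=4)
video_unit = RetrievalUnit("f2", 3, "Logo appears on screen.",
                           time_start_ms=65000, time_end_ms=72500)
```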
AI compact extraction is only responsible for extracting structured index fields: