    Some practices for business file indexing and space retrieval

    Published May 8, 2026 · 3 min read · Author: Felix

    This describes the current implementation of Brand Space cloud-space search: first locate the relevant folders based on user intent, then run hybrid retrieval at the retrieval-unit level within those folders.

    Core Structure

    The current structure is:

    ```plain
    brand_space_folder          -> Intent routing and data partitioning
    brand_space_file            -> File-level resource metadata, storage information, processing status
    brand_space_retrieval_unit  -> The smallest unit of evidence that actually participates in search
    ```

    In other words, the system does not search for files globally. Instead, it works in stages:

    ```plain
    user intent
      -> Find the corresponding first-level folder
      -> Search for evidence fragments in that folder and its subfolders
      -> Aggregate the hits back to files
    ```

    Folder first

    First-level folders are the top-level semantic boundaries of Brand Space, for example:

    ```plain
    Brand Voice
    Product Facts
    Visual Identity
    Messaging Boundaries
    Campaign Assets
    ```

    The question this layer answers is:

    ```plain
    Which data area should this intent be searched in?
    ```

    `searchBrandKeyword` first reads the user's Brand Space folders and routes the query to a folder based on each folder's name, path, description, and agentInstructions.
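The article does not say how a folder is scored during routing. As a hypothetical stand-in, the sketch below ranks first-level folders by the fraction of query tokens that appear in a folder's `name`, `path`, `description`, and `agentInstructions`; a production router could just as well use embeddings or an LLM. `BrandSpaceFolder`, `id`, `parentId`, `tokenize`, `scoreFolder`, and `rankFirstLevelFolders` are illustrative names, and the tokenizer is ASCII-only for brevity.

```typescript
// Hypothetical folder record; the routed fields come from the article,
// while `id`, `parentId`, and the types are assumptions.
interface BrandSpaceFolder {
  id: string;
  parentId: string | null; // null for first-level folders
  name: string;
  path: string;
  description: string;
  agentInstructions: string;
}

// Lowercase, split on non-alphanumerics, drop one-character fragments.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1);
}

// Fraction of query tokens that appear anywhere in the folder's routing fields.
function scoreFolder(query: string, folder: BrandSpaceFolder): number {
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return 0;
  const vocab: Record<string, boolean> = Object.create(null);
  const routingText = [folder.name, folder.path, folder.description, folder.agentInstructions].join(" ");
  for (const t of tokenize(routingText)) vocab[t] = true;
  const matched = queryTokens.filter((t) => vocab[t] === true).length;
  return matched / queryTokens.length;
}

// Score only first-level folders and return them best first.
function rankFirstLevelFolders(query: string, folders: BrandSpaceFolder[]) {
  return folders
    .filter((f) => f.parentId === null)
    .map((f) => ({ folder: f, score: scoreFolder(query, f) }))
    .sort((a, b) => b.score - a.score);
}

const folders: BrandSpaceFolder[] = [
  {
    id: "voice", parentId: null, name: "Brand Voice", path: "/Brand Voice",
    description: "Tone, style and wording rules", agentInstructions: "Use for questions about tone",
  },
  {
    id: "facts", parentId: null, name: "Product Facts", path: "/Product Facts",
    description: "Specs, pricing and feature facts", agentInstructions: "Use for factual product questions",
  },
];
console.log(rankFirstLevelFolders("What tone should the brand voice use?", folders)[0].folder.id); // voice
```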

    If a first-level folder matches, the search scope is restricted to that folder and its subfolders. This keeps a global keyword or semantic recall from pulling in files from unrelated data areas.
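Restricting the scope to a folder and its subfolders amounts to collecting the routed folder's subtree and filtering candidates by it. A minimal sketch, assuming folders form a tree linked by a hypothetical `parentId` field and that files carry the `folderId` listed for `brand_space_file`; `FolderNode`, `collectSubtreeIds`, and `filterToScope` are illustrative names.

```typescript
// Minimal folder shape; `parentId` is an assumed field (null at the top level).
interface FolderNode {
  id: string;
  parentId: string | null;
}

// Return the routed folder plus all of its descendants, breadth-first.
function collectSubtreeIds(rootId: string, folders: FolderNode[]): string[] {
  const childrenOf: Record<string, string[]> = Object.create(null);
  for (const f of folders) {
    if (f.parentId !== null) {
      (childrenOf[f.parentId] = childrenOf[f.parentId] || []).push(f.id);
    }
  }

  const result: string[] = [];
  const seen: Record<string, boolean> = Object.create(null);
  const queue: string[] = [rootId];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    if (seen[id]) continue; // guard against accidental cycles in parentId data
    seen[id] = true;
    result.push(id);
    for (const child of childrenOf[id] || []) queue.push(child);
  }
  return result;
}

// Keep only records (e.g. files) whose folderId falls inside the scope.
function filterToScope<T extends { folderId: string }>(items: T[], scopeIds: string[]): T[] {
  const allowed: Record<string, boolean> = Object.create(null);
  for (const id of scopeIds) allowed[id] = true;
  return items.filter((item) => allowed[item.folderId] === true);
}

const tree: FolderNode[] = [
  { id: "voice", parentId: null },
  { id: "voice-tone", parentId: "voice" },
  { id: "voice-tone-examples", parentId: "voice-tone" },
  { id: "facts", parentId: null },
];
console.log(collectSubtreeIds("voice", tree).join(", ")); // voice, voice-tone, voice-tone-examples
```

In a database-backed implementation, the collected IDs would more likely become a `folderId IN (...)` condition on the retrieval query than an in-memory pass; that, too, is an assumption.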

    If no folder can be routed to with confidence, the search falls back to a global retrieval-unit search so that recall is preserved.
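The article does not define when a folder "cannot be routed explicitly". One hedged interpretation, sketched below: route only when the best folder clears an absolute score threshold and leads the runner-up by a margin, and otherwise return a global scope. `SearchScope`, `RoutingCandidate`, `decideScope`, `MIN_SCORE`, `MIN_MARGIN`, and the threshold values are all hypothetical and would need tuning.

```typescript
// Either search one routed folder (and its subfolders) or everything.
type SearchScope =
  | { kind: "folder"; folderId: string }
  | { kind: "global" };

interface RoutingCandidate {
  folderId: string;
  score: number; // routing score in [0, 1]; how it is computed is left open
}

// Hypothetical thresholds; real values would come from evaluating real queries.
const MIN_SCORE = 0.5;
const MIN_MARGIN = 0.15;

// Commit to a folder only when the best candidate is both strong and clearly ahead.
function decideScope(candidates: RoutingCandidate[]): SearchScope {
  const ranked = candidates.slice().sort((a, b) => b.score - a.score);
  const best = ranked[0];
  const runnerUp = ranked[1];
  if (!best || best.score < MIN_SCORE) return { kind: "global" };
  if (runnerUp && best.score - runnerUp.score < MIN_MARGIN) return { kind: "global" };
  return { kind: "folder", folderId: best.folderId };
}

console.log(decideScope([{ folderId: "voice", score: 0.8 }, { folderId: "facts", score: 0.2 }]).kind); // folder
console.log(decideScope([{ folderId: "voice", score: 0.55 }, { folderId: "facts", score: 0.5 }]).kind); // global
```

The margin check targets ambiguous queries: when two folders score almost the same, committing to either one risks dropping relevant evidence, which is exactly what the global fallback is meant to prevent.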

    File table

    `brand_space_file` stores resource-level information:

    ```plain
    ID
    userId
    folderId
    path
    name
    fileType
    mimeType
    storageUrl
    size
    content
    processingStatus
    createdAt
    updatedAt
    ```
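For reference, the field list above can be written as a TypeScript type. The field types, the `ProcessingStatus` values, and the `hasSearchableContent` rule are assumptions for illustration; the article only names the fields.

```typescript
// Hypothetical status values; the article only says processing status is recorded.
type ProcessingStatus = "pending" | "processing" | "ready" | "failed";

interface BrandSpaceFile {
  id: string;
  userId: string;
  folderId: string;
  path: string;
  name: string;
  fileType: string;
  mimeType: string;
  storageUrl: string;
  size: number; // assumed to be bytes
  content: string | null; // original or extracted evidence content
  processingStatus: ProcessingStatus;
  createdAt: Date;
  updatedAt: Date;
}

// Assumed rule: a file's retrieval units are worth searching only once
// processing has finished and some non-blank content was extracted.
function hasSearchableContent(file: BrandSpaceFile): boolean {
  return file.processingStatus === "ready" && file.content !== null && file.content.trim().length > 0;
}
```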

    
      
    
    Its responsibilities are:

    ```plain
    Manage resources
    Show resources
    Record processing status
    Save original or extracted evidence content
    ```

    It is not the primary retrieval unit and does not hold file-level AI compact metadata.

    Search unit

    What actually enters search is `brand_space_retrieval_unit`:

    ```plain
    fileId
    unitIndex
    unitType
    titleText
    evidenceText
    aliases
    keywords
    keyFacts
    restrictions
    searchVector
    embedding
    pageNumber
    timeStartMs
    timeEndMs
    ```
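The article calls this hybrid retrieval but does not say how lexical hits (presumably from `searchVector`, whose name suggests a full-text index such as a PostgreSQL `tsvector`, though that is an assumption) and vector hits (from `embedding`) are combined. Reciprocal Rank Fusion (RRF) is one common, tuning-light way to merge two ranked lists; the sketch below assumes each search has already returned an ordered list of units identified by `fileId` and `unitIndex`. `UnitKey` and `reciprocalRankFusion` are illustrative names.

```typescript
// Identifies one retrieval unit: the file it belongs to plus its position.
interface UnitKey {
  fileId: string;
  unitIndex: number;
}

// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank),
// with rank starting at 1. k = 60 is the constant commonly used for RRF.
function reciprocalRankFusion(rankings: UnitKey[][], k = 60): { key: string; score: number }[] {
  const scores: Record<string, number> = Object.create(null);
  for (const ranking of rankings) {
    ranking.forEach((unit, i) => {
      const key = unit.fileId + ":" + unit.unitIndex;
      scores[key] = (scores[key] || 0) + 1 / (k + i + 1);
    });
  }
  return Object.keys(scores)
    .map((key) => ({ key, score: scores[key] }))
    .sort((a, b) => b.score - a.score);
}

const lexical: UnitKey[] = [{ fileId: "f1", unitIndex: 0 }, { fileId: "f2", unitIndex: 3 }];
const semantic: UnitKey[] = [{ fileId: "f2", unitIndex: 3 }, { fileId: "f4", unitIndex: 2 }];
console.log(reciprocalRankFusion([lexical, semantic]).map((r) => r.key).join(", ")); // f2:3, f1:0, f4:2
```

RRF uses only ranks, so it sidesteps the fact that full-text scores and cosine similarities live on different scales; a unit found by both searches (here `f2:3`) rises to the top.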

    
      
    
    `evidenceText` is the evidence text that gets searched, reranked, and cited. Its sources include:

    ```plain
    Original text
    Parsed text from PDF/DOC/DOCX
    Image OCR and a short visual description
    Video transcript, visible on-screen text, and a short scene description
    ```
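Because `evidenceText` has to be citable, each hit needs a location the user can check. A minimal sketch, assuming `pageNumber` is set for paged documents and `timeStartMs`/`timeEndMs` for video segments; the `CitableUnit` subset, `formatMs`, `citationLabel`, and the label format are hypothetical.

```typescript
// Hypothetical subset of a retrieval unit needed to cite it.
interface CitableUnit {
  titleText: string;
  evidenceText: string;
  pageNumber: number | null; // assumed set for PDF/DOC pages
  timeStartMs: number | null; // assumed set for video transcript segments
  timeEndMs: number | null;
}

// Format milliseconds as m:ss.
function formatMs(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes + ":" + (seconds < 10 ? "0" : "") + seconds;
}

// Short location label so an answer can point back to its source.
function citationLabel(unit: CitableUnit): string {
  if (unit.pageNumber !== null) return unit.titleText + ", p. " + unit.pageNumber;
  if (unit.timeStartMs !== null) {
    const end = unit.timeEndMs !== null ? "-" + formatMs(unit.timeEndMs) : "";
    return unit.titleText + " @ " + formatMs(unit.timeStartMs) + end;
  }
  return unit.titleText;
}

console.log(citationLabel({
  titleText: "Launch Video", evidenceText: "...", pageNumber: null, timeStartMs: 65000, timeEndMs: 72500,
})); // Launch Video @ 1:05-1:12
```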

    AI compact extraction is only responsible for extracting structured index fields:
