Knowledge Graph and Semantic Search

**Referenced Files in This Document**

- [[neo4j_client.py]](file/bi-chat/bi-chat/src/db/neo4j-client.py)
- [[graph_schema.py]](file/bi-chat/bi-chat/src/graph/graph-schema.py)
- [[populate_graph.py]](file/bi-chat/bi-chat/src/ontology/populate-graph.py)
- [[retriever.py]](file/bi-chat/bi-chat/src/ontology/retriever.py)
- [[vector_db.py]](file/bi-chat/bi-chat/src/core/vector-db.py)
- [[models.py]](file/bi-chat/bi-chat/src/db/models.py)
- [[config.py]](file/bi-chat/bi-chat/src/core/config.py)
- [[knowledge_agent.py]](file/bi-chat/bi-chat/src/agents/knowledge-agent.py)
- [[knowledge.py]](file/bi-chat/bi-chat/src/tools/knowledge.py)
- [[knowledge_search_tool.py]](file/bi-chat/bi-chat/src/tools/public-apis/knowledge-search-tool.py)

Introduction
Project Structure
Core Components
Architecture Overview
Detailed Component Analysis
Dependency Analysis
Performance Considerations
Troubleshooting Guide
Conclusion
Appendices

Introduction

This document explains the knowledge graph and semantic search capabilities implemented in the bi-chat module. It covers Neo4j integration for graph storage and traversal, Milvus-based vector search for entity and metric retrieval, and the hybrid knowledge retrieval pipeline that combines structured graph data with unstructured text. It also documents the graph schema for business metrics and entities, the ontology management system, and practical guidance for performance optimization and troubleshooting.

Project Structure

The knowledge graph and semantic search features are primarily implemented under the bi-chat Python package. Key areas include:

Graph database client and schema definitions for Neo4j
Ontology population and retrieval logic
Vector database integration with Milvus
Agent and tool integrations for knowledge queries
Configuration for external systems (Neo4j, Milvus, PostgreSQL, LLM providers)

Core Components

Neo4j client: Provides connection management, query execution, write operations, transactions, health checks, and a global lazy-initialized client.
Graph schema: Defines node types and relationship types used across the knowledge graph.
Ontology retriever: Orchestrates semantic search via Milvus and graph expansion via Neo4j to produce contextual answers.
Milvus helper: Manages collections for indicators and entities, handles indexing and vector search.
Data models: SQLAlchemy models for indicators and entities used alongside graph and vector stores.
Configuration: Centralized settings for Neo4j, Milvus, PostgreSQL, Redis, LLM provider, and embedding model/dimension.
Tools and agents: Public API tool for external knowledge and internal tools for entity/indicator retrieval.

Architecture Overview

The system integrates three pillars:

Structured graph (Neo4j): Stores business domains, entities, metrics, and their relationships.
Semantic vectors (Milvus): Indexes indicators and entities for similarity search.
Hybrid retrieval: Uses vector search to retrieve candidates, then enriches with graph traversal and schema context.

Detailed Component Analysis

Neo4j Integration

Client lifecycle: Lazy initialization, context manager support, health checks, and transactional writes.
Query patterns: Read queries return records as dictionaries; write operations use explicit sessions and transactions; graph expansion uses Cypher with variable-length relationships.
Connection management: Reads from centralized settings; supports environment overrides.
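
The lifecycle described above can be sketched roughly as follows. This is an illustration rather than the contents of neo4j_client.py: the names `GraphClient`, `get_client`, and `driver_factory` are assumptions introduced here. In a real deployment the factory would typically wrap `neo4j.GraphDatabase.driver(uri, auth=(user, password))`; it is injected so the sketch runs without a database.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


class GraphClient:
    """Hypothetical wrapper showing lazy init, context management, and health checks."""

    def __init__(self, driver_factory: Callable[[], Any], database: str = "neo4j"):
        self._driver_factory = driver_factory
        self._database = database
        self._driver: Optional[Any] = None  # not created until first use

    @property
    def driver(self) -> Any:
        if self._driver is None:
            self._driver = self._driver_factory()
        return self._driver

    def query(self, cypher: str, params: Optional[dict] = None) -> list[dict]:
        """Run a read query and return each record as a plain dictionary."""
        with self.driver.session(database=self._database) as session:
            return [record.data() for record in session.run(cypher, params or {})]

    def health_check(self) -> bool:
        """Return True if the driver can reach the server, False otherwise."""
        try:
            self.driver.verify_connectivity()
            return True
        except Exception:
            return False

    def close(self) -> None:
        if self._driver is not None:
            self._driver.close()
            self._driver = None

    def __enter__(self) -> "GraphClient":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self.close()


_client: Optional[GraphClient] = None


def get_client(driver_factory: Callable[[], Any]) -> GraphClient:
    """Return the process-wide client, creating it on the first call only."""
    global _client
    if _client is None:
        _client = GraphClient(driver_factory)
    return _client
```

Deferring driver creation to the first query keeps module import cheap and lets tests substitute a fake driver.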

Diagram sources: [neo4j_client.py]

Graph Schema Design

Node types: BusinessDomain, SubDomain, BusinessEntity, Metric, Tool, FixedSQLTool, PhysicalTable, Column.
Relationship types: HAS_SUBDOMAIN, HAS_ENTITY, HAS_METRIC, HAS_TOOL, IMPLEMENTED_BY, RELATED_TO, HAS_COLUMN.
These types guide graph population and traversal for business-aware expansion.
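
One way to keep these type names consistent across population and traversal code is to define them once as enumerations. The sketch below is an assumption about how graph_schema.py could be organized, not a copy of it; the class names, the `name` property, and the helper `merge_edge_cypher` are hypothetical, and which node types each relationship actually connects is not specified in this document.

```python
from enum import Enum


class NodeLabel(str, Enum):
    BUSINESS_DOMAIN = "BusinessDomain"
    SUB_DOMAIN = "SubDomain"
    BUSINESS_ENTITY = "BusinessEntity"
    METRIC = "Metric"
    TOOL = "Tool"
    FIXED_SQL_TOOL = "FixedSQLTool"
    PHYSICAL_TABLE = "PhysicalTable"
    COLUMN = "Column"


class RelType(str, Enum):
    HAS_SUBDOMAIN = "HAS_SUBDOMAIN"
    HAS_ENTITY = "HAS_ENTITY"
    HAS_METRIC = "HAS_METRIC"
    HAS_TOOL = "HAS_TOOL"
    IMPLEMENTED_BY = "IMPLEMENTED_BY"
    RELATED_TO = "RELATED_TO"
    HAS_COLUMN = "HAS_COLUMN"


def merge_edge_cypher(src: NodeLabel, rel: RelType, dst: NodeLabel) -> str:
    """Build a MERGE statement linking two nodes identified by name.

    Labels and relationship types cannot be passed as Cypher parameters,
    so they are interpolated from the enums; node names stay parameterized.
    """
    return (
        f"MERGE (a:{src.value} {{name: $src_name}}) "
        f"MERGE (b:{dst.value} {{name: $dst_name}}) "
        f"MERGE (a)-[:{rel.value}]->(b)"
    )
```

Restricting interpolated values to enum members means only known labels can reach the query text.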

Diagram and section sources: [graph_schema.py]

Ontology Population and Management

SQL parsing: Extracts table definitions, comments, and column metadata from DDL.
Graph synchronization: Creates or updates table entities and links them to business entities based on naming heuristics.
Persistence: Uses Neo4j MERGE semantics to avoid duplicates and updates timestamps.
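
The parsing and persistence steps above might look roughly like this. The regular expressions, the function `parse_ddl`, and the property names in `MERGE_TABLE` are assumptions made for illustration; the sketch expects a single MySQL-style CREATE TABLE statement with one column per line and no parentheses after the closing bracket, which is narrower than what populate_graph.py may handle. The naming heuristics that link tables to business entities are not shown.

```python
import re

TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?(?P<name>\w+)`?\s*"
    r"\((?P<body>.*)\)(?P<options>[^;]*)",
    re.IGNORECASE | re.DOTALL,
)
COLUMN_RE = re.compile(
    r"^\s*`?(?P<name>\w+)`?\s+(?P<type>\w+(?:\([^)]*\))?)"
    r"(?:.*?COMMENT\s+'(?P<comment>[^']*)')?",
    re.IGNORECASE,
)
CONSTRAINT_RE = re.compile(r"^\s*(PRIMARY|UNIQUE|KEY|INDEX|CONSTRAINT)\b", re.IGNORECASE)
COMMENT_RE = re.compile(r"COMMENT\s*=?\s*'(?P<comment>[^']*)'", re.IGNORECASE)

# MERGE matches on the table name, so re-running the sync updates the
# existing node instead of creating a duplicate.
MERGE_TABLE = """
MERGE (t:PhysicalTable {name: $table})
ON CREATE SET t.created_at = datetime()
SET t.comment = $comment, t.updated_at = datetime()
"""


def parse_ddl(ddl: str) -> dict:
    """Extract the table name, table comment, and column metadata from one DDL statement."""
    match = TABLE_RE.search(ddl)
    if match is None:
        raise ValueError("no CREATE TABLE statement found")
    table_comment = COMMENT_RE.search(match.group("options"))
    columns = []
    for line in match.group("body").splitlines():
        if not line.strip() or CONSTRAINT_RE.match(line):
            continue  # skip blank lines and key/constraint definitions
        col = COLUMN_RE.match(line)
        if col:
            columns.append(
                {
                    "name": col.group("name"),
                    "type": col.group("type"),
                    "comment": col.group("comment") or "",
                }
            )
    return {
        "table": match.group("name"),
        "comment": table_comment.group("comment") if table_comment else "",
        "columns": columns,
    }
```

The parsed dictionary supplies the `$table` and `$comment` parameters for `MERGE_TABLE`.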

Diagram and section sources: [populate_graph.py]

Semantic Search with Milvus

Collections: Separate collections for indicators and entities, both configured with the Strong consistency level.
Indexing: IVF_FLAT index with configurable nlist; metric type L2.
Search: Loads collection, executes vector search with nprobe, and returns hits with entity identifiers.
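
The index and search settings above map onto pymilvus parameter dictionaries roughly as shown below. The field names `embedding` and `name`, the default values for `nlist` and `nprobe`, and the helper names are assumptions for this sketch; `collection` is expected to behave like a `pymilvus.Collection`, and is passed in so the example does not need a running Milvus instance.

```python
from typing import Any, List, Sequence


def ivf_flat_index_params(nlist: int = 128) -> dict:
    """Index parameters for an IVF_FLAT index using L2 distance."""
    return {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": nlist}}


def l2_search_params(nprobe: int = 10) -> dict:
    """Search parameters; nprobe is the number of IVF clusters scanned per query."""
    return {"metric_type": "L2", "params": {"nprobe": nprobe}}


def search_names(
    collection: Any,
    query_vector: Sequence[float],
    top_k: int = 5,
    nprobe: int = 10,
) -> List[str]:
    """Load the collection, run one vector search, and return the matched names."""
    collection.load()  # the collection must be loaded into memory before searching
    results = collection.search(
        data=[list(query_vector)],      # one query vector
        anns_field="embedding",         # assumed vector field name
        param=l2_search_params(nprobe),
        limit=top_k,
        output_fields=["name"],         # assumed scalar field holding the identifier
    )
    # results has one entry per query vector; each entry is a list of hits
    return [hit.entity.get("name") for hit in results[0]]
```

With pymilvus, the index itself would be created once with `collection.create_index("embedding", ivf_flat_index_params())`.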

Diagram sources: [vector_db.py]

Knowledge Retrieval Pipeline

Embedding generation: Async OpenAI-compatible client used to embed queries.
Vector recall: Milvus search returns candidate names for indicators and entities.
Graph expansion: Cypher traversal expands seeds into business domain context, sibling entities, and related physical tables.
Context assembly: Formats indicator definitions and entity schemas; aggregates domain context and relationships.
Agent response: KnowledgeAgent composes structured answers using retrieved context.
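
The stages above can be strung together as in the following sketch. It is a simplified outline, not the actual OntologyRetriever: `RetrievedContext`, `retrieve`, and the row keys `entity` and `domain` are hypothetical, and the embedding, vector search, and graph expansion steps are injected as callables so the flow can be exercised with stand-ins.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List

Embedder = Callable[[str], Awaitable[List[float]]]
VectorSearch = Callable[[List[float]], List[str]]
GraphExpand = Callable[[List[str]], List[Dict[str, str]]]


@dataclass
class RetrievedContext:
    indicators: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)
    graph_rows: List[Dict[str, str]] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Flatten the retrieved context into text an agent can consume."""
        lines = [
            "Indicators: " + ", ".join(self.indicators),
            "Entities: " + ", ".join(self.entities),
        ]
        lines += [f"- {row['entity']} (domain: {row['domain']})" for row in self.graph_rows]
        return "\n".join(lines)


async def retrieve(
    query: str,
    embed: Embedder,
    search_indicators: VectorSearch,
    search_entities: VectorSearch,
    expand: GraphExpand,
) -> RetrievedContext:
    vector = await embed(query)                             # 1. embedding generation
    indicators = search_indicators(vector)                  # 2. vector recall (indicators)
    entities = search_entities(vector)                      #    vector recall (entities)
    graph_rows = expand(entities) if entities else []       # 3. graph expansion from seeds
    return RetrievedContext(indicators, entities, graph_rows)  # 4. context assembly


async def demo() -> RetrievedContext:
    """Run the pipeline with stand-in components (all values are made up)."""

    async def fake_embed(text: str) -> List[float]:
        return [0.1, 0.2, 0.3]

    return await retrieve(
        "monthly sales",
        embed=fake_embed,
        search_indicators=lambda v: ["gmv"],
        search_entities=lambda v: ["order"],
        expand=lambda names: [{"entity": n, "domain": "sales"} for n in names],
    )
```

Calling `asyncio.run(demo())` returns a context whose `to_prompt()` text could be handed to the agent. Graph expansion is skipped when vector recall yields no entity seeds.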

External Knowledge Augmentation

Public API tool: Wikipedia search with caching and fallback between Chinese and English.
Use case: Enrich answers with authoritative external knowledge when domain-specific graph coverage is insufficient.
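
The caching and language fallback behavior can be illustrated with a small wrapper. This is a sketch under assumptions: the class `CachedKnowledgeSearch`, the `fetch` callable, and the choice to try Chinese before English are introduced here for illustration, and the real tool's Wikipedia client is replaced by an injected function.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# (term, language code) -> summary text, or None when nothing was found
Fetcher = Callable[[str, str], Optional[str]]


class CachedKnowledgeSearch:
    """Hypothetical wrapper: try each language in order and cache the outcome."""

    def __init__(self, fetch: Fetcher, languages: Sequence[str] = ("zh", "en")):
        self._fetch = fetch
        self._languages = tuple(languages)
        self._cache: Dict[str, Optional[Tuple[str, str]]] = {}

    def search(self, term: str) -> Optional[Tuple[str, str]]:
        """Return (language, summary) for the first language with a result, else None."""
        key = term.strip().lower()
        if key in self._cache:
            return self._cache[key]
        result: Optional[Tuple[str, str]] = None
        for lang in self._languages:
            summary = self._fetch(term, lang)
            if summary:
                result = (lang, summary)
                break
        # Misses are cached too, so repeated unknown terms do not hit the network again.
        self._cache[key] = result
        return result
```

A production version would likely add cache expiry, which is omitted here for brevity.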

Diagram and section sources: [knowledge_search_tool.py]

Dependency Analysis

Configuration-driven: All clients read from centralized settings, enabling environment-specific overrides.
Client coupling: OntologyRetriever depends on MilvusHelper, Neo4jClient, and SQLAlchemy models.
Cohesion: Each component encapsulates a single responsibility—Neo4j for graph, Milvus for vectors, retriever for orchestration.
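
A configuration-driven setup of this kind is often expressed as a single settings object that every client reads from. The sketch below is an assumption about shape only: the class `KnowledgeSettings`, the environment variable names, and the default values are hypothetical and are not taken from config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class KnowledgeSettings:
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    milvus_host: str
    milvus_port: int
    embedding_dim: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "KnowledgeSettings":
        """Build settings from environment variables, falling back to placeholder defaults."""
        return cls(
            neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
            neo4j_user=env.get("NEO4J_USER", "neo4j"),
            neo4j_password=env.get("NEO4J_PASSWORD", ""),
            milvus_host=env.get("MILVUS_HOST", "localhost"),
            milvus_port=int(env.get("MILVUS_PORT", "19530")),
            embedding_dim=int(env.get("EMBEDDING_DIM", "1024")),  # placeholder value
        )
```

Passing a plain dictionary instead of `os.environ` makes environment-specific overrides easy to test.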

Performance Considerations

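Milvus index tuning: Tune nlist (the number of IVF clusters, set when the index is built) and nprobe (the number of clusters scanned per query) to balance recall and latency; a higher nprobe improves recall but slows search. Keep the metric type consistent between index and search.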
Embedding model and dimension: Ensure embedding dimensions match collection schema to avoid mismatches.
Graph traversal limits: Cypher queries include LIMIT clauses to cap result sets; tune for domain size and performance.
Connection pooling and reuse: Neo4j driver manages connections; avoid frequent reconnects by reusing the global client.
Asynchronous embedding: Use async embedding calls to overlap I/O with computation.
Caching: Leverage built-in caching for external knowledge retrieval to reduce repeated network calls.

Troubleshooting Guide

Neo4j connectivity: Verify URI, credentials, and database name; use health checks to confirm connectivity.
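Milvus dimension mismatch: If the collection's vector dimension differs from the configured embedding dimension, inserts and searches fail. A vector field's dimension is fixed when the collection is created, so migrate by creating a new collection with the correct dimension and re-embedding the data, rather than dropping the existing data outright.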
Empty results: For entity search, ensure vector embeddings exist and Milvus is loaded; for graph expansion, confirm seed entities exist and relationships are populated.
Configuration issues: Confirm environment variables for Neo4j, Milvus, and embedding model/dimensions are set correctly.
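
Several of the checks above can be automated as a startup preflight. The following is a minimal sketch under assumptions: the function `preflight_problems` and the environment variable names in `REQUIRED_ENV` are hypothetical, and `collection_dim` is expected to be read from the Milvus collection schema by the caller.

```python
from typing import List, Mapping, Sequence

# Hypothetical variable names; substitute the ones the deployment actually uses.
REQUIRED_ENV = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "MILVUS_HOST", "EMBEDDING_DIM")


def preflight_problems(
    env: Mapping[str, str],
    collection_dim: int,
    sample_vector: Sequence[float],
) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"missing setting: {name}" for name in REQUIRED_ENV if not env.get(name)]

    configured = env.get("EMBEDDING_DIM")
    if configured:
        if not configured.isdigit():
            problems.append("EMBEDDING_DIM is not a positive integer")
        elif int(configured) != collection_dim:
            problems.append(
                f"configured dimension {configured} != collection dimension {collection_dim}"
            )

    # Embedding one sample query catches a model/dimension mismatch early.
    if len(sample_vector) != collection_dim:
        problems.append(
            f"embedding length {len(sample_vector)} != collection dimension {collection_dim}"
        )
    return problems
```

Running this once at startup surfaces misconfiguration before the first user query fails.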

Conclusion

The bi-chat knowledge graph and semantic search system combines a structured Neo4j graph with Milvus vector embeddings to deliver precise, context-rich answers. The hybrid approach leverages vector recall for broad relevance and graph expansion for domain-aware enrichment. With modular components, centralized configuration, and robust error handling, the system supports scalable, real-time knowledge retrieval across business metrics and entities.

Appendices

Example Queries and Patterns

Business domain-centric expansion: Seed with a business entity; expand to sibling entities within the same domain and related physical tables.
Metric definition retrieval: Use indicator embeddings to find semantically similar metrics; enrich with definitions and formulas.
Hybrid search: Combine vector candidates with graph-derived relationships and schema context for comprehensive answers.
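
As a concrete illustration of the domain-centric expansion pattern, the sketch below builds a Cypher query from seed entity names. The edge directions, the `name` property, and the function `build_expansion_query` are assumptions; the document does not state which node types each relationship connects, so the pattern should be adapted to the populated graph.

```python
def build_expansion_query(max_hops: int = 2) -> str:
    """Build a Cypher query expanding seed entities to domain, siblings, and tables.

    The hop bound of a variable-length pattern is written into the query text
    as a literal, so it is validated here; the seed names and the row limit
    are passed as the query parameters $names and $limit.
    """
    if not 1 <= max_hops <= 5:
        raise ValueError("max_hops must be between 1 and 5")
    return (
        "MATCH (seed:BusinessEntity) WHERE seed.name IN $names\n"
        "OPTIONAL MATCH (domain)-[:HAS_ENTITY]->(seed)\n"
        "OPTIONAL MATCH (domain)-[:HAS_ENTITY]->(sibling:BusinessEntity)\n"
        "  WHERE sibling <> seed\n"
        f"OPTIONAL MATCH (seed)-[:IMPLEMENTED_BY|RELATED_TO*1..{max_hops}]->(tbl:PhysicalTable)\n"
        "RETURN seed.name AS entity, domain.name AS domain,\n"
        "       collect(DISTINCT sibling.name) AS siblings,\n"
        "       collect(DISTINCT tbl.name) AS tables\n"
        "LIMIT $limit"
    )
```

The query would be run with parameters such as `{"names": ["order"], "limit": 50}`, where both values are illustrative.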
