StarRocks OLAP Database Architecture

**Referenced Files in This Document**

  • [[client.go]](file/bi-common/database/starrocks/streamload/client.go)
  • [[option.go]](file/bi-common/database/starrocks/streamload/option.go)
  • [[response.go]](file/bi-common/database/starrocks/streamload/response.go)
  • [[README.md]](file/bi-common/database/starrocks/streamload/readme.md)
  • [[bi-common.yaml]](file/bi-common/database/starrocks/streamload/nacos/cache/config/bi-common.yamldefault-grouppublic)
  • [[__init__.py]](file/mcp-server-starrocks/src/mcp-server-starrocks/init-.py)
  • [[architecture-mcp-server-starrocks.md]](file/mcp-server-starrocks/docs/architecture-mcp-server-starrocks.md)
  • [[README.md]](file/mcp-server-starrocks/readme.md)
  • [[protocols-db.md]](file/ui-web/.agent/rules/protocols-db.md)
  • [[数据库设计.md]](file/ui-web-docs/pages/zh/xcbi-dev/.md)
  • [[conf.pb.go (bi-basic)]](file/bi-basic/app/service/internal/conf/conf.pb.go)
  • [[conf.pb.go (bi-notify)]](file/bi-notify/internal/conf/conf.pb.go)
  • [[setup_test.go]](file/bi-common/database/gormx/setup-test.go)
  • [[env_test.go]](file/bi-common/database/gormx/env-test.go)
  • [[config_test.go]](file/bi-common/database/gormx/config-test.go)
  • [[sync_starrocks_entities.py]](file/bi-chat/scripts/sync-starrocks-entities.py)

Table of Contents

  1. Introduction
  2. Project Structure
  3. Core Components
  4. Architecture Overview
  5. Detailed Component Analysis
  6. Dependency Analysis
  7. Performance Considerations
  8. Troubleshooting Guide
  9. Conclusion
  10. Appendices

Introduction

This document describes the StarRocks OLAP database architecture within the project, focusing on ingestion mechanisms, table design strategies, partitioning and distribution, query optimization, and operational practices. It synthesizes real implementations present in the repository (StreamLoad client, MCP server integration, configuration, and design guidelines) to provide a practical guide for building and operating StarRocks-based analytics systems.

Project Structure

The StarRocks-related components are organized across:

  • StreamLoad client library for real-time ingestion
  • Configuration and environment integration for database connections
  • MCP server for controlled agent access to StarRocks
  • Design guidelines and entity documentation for table modeling and distribution

Core Components

  • StreamLoad client: Provides typed configuration, request construction, and response parsing for HTTP-based ingestion into StarRocks.
  • Load options: Encapsulate ingestion parameters such as format, column mapping, partition targeting, strict mode, timeouts, and JSON handling.
  • Response model: Standardized ingestion outcome with success/failure detection and timing metrics.
  • Configuration: Centralized connection settings and pool tuning for StarRocks.
  • MCP server: Controlled access to StarRocks for AI agents with column-level protections.
  • Design rules: Enforce StarRocks-specific modeling (Primary Key vs Aggregate/Duplicate Key), partitioning, bucketing, and sort keys.

Architecture Overview

The ingestion pipeline integrates HTTP StreamLoad with application clients, while the MCP server mediates secure, auditable access to StarRocks for agents. Configuration is shared across services and environments.

Detailed Component Analysis

StreamLoad Client

The client encapsulates:

  • Configuration: host, port, credentials, timeout, retry, and BE proxy for local development.
  • Request building: supports CSV/JSON, label generation, column mapping, and arbitrary options.
  • Redirect handling: custom HTTP client replaces BE host during 307 redirects when a proxy is configured.
  • Response parsing: standardized ingestion metrics and success/failure checks.

Load Options

Options enable flexible ingestion:

  • Label, columns, separators, row delimiter
  • Max filter ratio and strict mode
  • Partition targeting and timeout
  • JSON outer array stripping and JSON paths
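
Each of these options corresponds to an HTTP header on the StreamLoad request. The sketch below shows one way to model that mapping with functional options. The header names (`label`, `format`, `columns`, `column_separator`, `row_delimiter`, `max_filter_ratio`, `strict_mode`, `partitions`, `timeout`, `strip_outer_array`, `jsonpaths`) are StarRocks Stream Load parameters; the Go identifiers are hypothetical and may not match `option.go`.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// LoadOption mutates the header set of a Stream Load request.
type LoadOption func(h http.Header)

func set(key, val string) LoadOption {
	// http.Header canonicalizes key case; HTTP header names are case-insensitive.
	return func(h http.Header) { h.Set(key, val) }
}

func WithLabel(v string) LoadOption           { return set("label", v) }
func WithFormat(v string) LoadOption          { return set("format", v) }
func WithColumns(cols ...string) LoadOption   { return set("columns", strings.Join(cols, ",")) }
func WithColumnSeparator(v string) LoadOption { return set("column_separator", v) }
func WithRowDelimiter(v string) LoadOption    { return set("row_delimiter", v) }
func WithStrictMode(on bool) LoadOption       { return set("strict_mode", strconv.FormatBool(on)) }
func WithPartitions(p ...string) LoadOption   { return set("partitions", strings.Join(p, ",")) }
func WithTimeoutSeconds(n int) LoadOption     { return set("timeout", strconv.Itoa(n)) }
func WithStripOuterArray(on bool) LoadOption  { return set("strip_outer_array", strconv.FormatBool(on)) }
func WithJSONPaths(v string) LoadOption       { return set("jsonpaths", v) }

func WithMaxFilterRatio(r float64) LoadOption {
	return set("max_filter_ratio", strconv.FormatFloat(r, 'f', -1, 64))
}

// buildHeaders applies the options in order; later options override earlier ones.
func buildHeaders(opts ...LoadOption) http.Header {
	h := http.Header{}
	h.Set("Expect", "100-continue")
	for _, o := range opts {
		o(h)
	}
	return h
}

func main() {
	h := buildHeaders(
		WithLabel("orders_20240101_001"),
		WithFormat("json"),
		WithStripOuterArray(true),
		WithJSONPaths(`["$.id","$.name"]`),
		WithColumns("id", "name"),
		WithMaxFilterRatio(0.1),
		WithTimeoutSeconds(600),
	)
	for k, v := range h {
		fmt.Println(k, v)
	}
}
```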

Response Handling

The response model standardizes ingestion outcomes and provides convenience checks for success and failure.
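
As a rough illustration, a response model of this kind can be a struct decoded from the JSON body that StarRocks returns for a Stream Load. The JSON field names and status strings below follow the StarRocks Stream Load return format; the Go type and function names are assumptions and may differ from `response.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LoadResponse mirrors the main fields of a Stream Load JSON result.
type LoadResponse struct {
	TxnID              int64  `json:"TxnId"`
	Label              string `json:"Label"`
	Status             string `json:"Status"`
	Message            string `json:"Message"`
	NumberTotalRows    int64  `json:"NumberTotalRows"`
	NumberLoadedRows   int64  `json:"NumberLoadedRows"`
	NumberFilteredRows int64  `json:"NumberFilteredRows"`
	LoadBytes          int64  `json:"LoadBytes"`
	LoadTimeMs         int64  `json:"LoadTimeMs"`
	ErrorURL           string `json:"ErrorURL"`
}

// IsSuccess treats "Publish Timeout" as success: the data is committed but
// may become visible with a delay.
func (r LoadResponse) IsSuccess() bool {
	return r.Status == "Success" || r.Status == "Publish Timeout"
}

// IsLabelConflict reports a re-used label.
func (r LoadResponse) IsLabelConflict() bool {
	return r.Status == "Label Already Exists"
}

// parseLoadResponse decodes the HTTP response body.
func parseLoadResponse(body []byte) (LoadResponse, error) {
	var r LoadResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return LoadResponse{}, fmt.Errorf("decode stream load response: %w", err)
	}
	return r, nil
}

func main() {
	// Sample payload written for this example, not captured from a cluster.
	body := []byte(`{"TxnId":1001,"Label":"demo_1","Status":"Fail",
		"Message":"too many filtered rows","NumberTotalRows":10,
		"NumberLoadedRows":0,"NumberFilteredRows":10,"LoadTimeMs":35,
		"ErrorURL":"http://be-host:8040/api/_load_error_log?file=demo"}`)
	r, err := parseLoadResponse(body)
	if err != nil {
		panic(err)
	}
	if !r.IsSuccess() {
		fmt.Printf("load %q failed: %s (details: %s)\n", r.Label, r.Message, r.ErrorURL)
	}
}
```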

Configuration and Environment Integration

Connection settings and pool tuning are centralized and validated across services:

  • YAML-based configuration with driver, host, port, credentials, and pool parameters.
  • Proto-based configuration structures expose StarRocks optimization flags and pool settings.
  • Tests confirm conversion and environment-driven setup.
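
To make the shape of such a configuration concrete, the sketch below models connection and pool settings as a Go struct, validates them, and derives a MySQL-protocol DSN (StarRocks accepts MySQL clients on the FE query port, 9030 by default). All field and function names are hypothetical; the repository's YAML keys, proto messages, and `gormx` setup routines may be organized differently.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"time"
)

// Config holds connection and pool settings (hypothetical field names).
type Config struct {
	Host            string
	Port            int
	User            string
	Password        string
	Database        string
	MaxOpenConns    int // 0 means unlimited in database/sql
	MaxIdleConns    int
	ConnMaxLifetime time.Duration
}

// Validate rejects settings that cannot work before any connection is opened.
func (c Config) Validate() error {
	switch {
	case c.Host == "":
		return errors.New("starrocks: host is required")
	case c.Port <= 0 || c.Port > 65535:
		return fmt.Errorf("starrocks: invalid port %d", c.Port)
	case c.MaxOpenConns > 0 && c.MaxIdleConns > c.MaxOpenConns:
		return errors.New("starrocks: max idle conns exceeds max open conns")
	}
	return nil
}

// DSN renders a go-sql-driver/mysql style data source name. Passwords with
// special characters would need the driver's own formatter instead.
func (c Config) DSN() string {
	return fmt.Sprintf("%s:%s@tcp(%s:%d)/%s", c.User, c.Password, c.Host, c.Port, c.Database)
}

// applyPool copies the pool settings onto an opened *sql.DB.
func applyPool(db *sql.DB, c Config) {
	db.SetMaxOpenConns(c.MaxOpenConns)
	db.SetMaxIdleConns(c.MaxIdleConns)
	db.SetConnMaxLifetime(c.ConnMaxLifetime)
}

func main() {
	c := Config{
		Host: "starrocks-fe", Port: 9030, User: "bi_reader", Database: "bi",
		MaxOpenConns: 20, MaxIdleConns: 5, ConnMaxLifetime: 30 * time.Minute,
	}
	if err := c.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(c.DSN())
	// With a registered driver: db, _ := sql.Open("mysql", c.DSN()); applyPool(db, c)
}
```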

MCP Server for Controlled Access

The MCP server exposes StarRocks tools to external agents with security and isolation:

  • Tools: list tables, describe table, read query
  • Column-level protection via masking/aggregation rules
  • Auditability through the MCP protocol
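
The two protections named above, a read-only query tool and column-level masking, can be illustrated with a small guard. The MCP server in the repository is written in Python; the sketch is in Go only to stay consistent with the other examples here, and `isReadOnly` and `maskRow` are hypothetical names. The keyword allowlist is deliberately conservative and is not a substitute for a SQL parser or for database-level privileges.

```go
package main

import (
	"fmt"
	"strings"
)

// isReadOnly accepts a single statement that starts with an allowlisted
// keyword. Any inner semicolon is rejected, even inside a string literal.
func isReadOnly(query string) bool {
	q := strings.TrimSpace(query)
	q = strings.TrimSpace(strings.TrimSuffix(q, ";"))
	if q == "" || strings.Contains(q, ";") {
		return false
	}
	switch strings.ToUpper(strings.Fields(q)[0]) {
	case "SELECT", "SHOW", "DESCRIBE", "DESC":
		return true
	}
	return false
}

// maskRow returns a copy of row with protected columns replaced by "***".
// Keys in protected are expected in lower case.
func maskRow(row map[string]string, protected map[string]bool) map[string]string {
	out := make(map[string]string, len(row))
	for col, val := range row {
		if protected[strings.ToLower(col)] {
			out[col] = "***"
		} else {
			out[col] = val
		}
	}
	return out
}

func main() {
	fmt.Println(isReadOnly("SELECT city, COUNT(*) FROM users GROUP BY city"))
	fmt.Println(isReadOnly("SELECT 1; DROP TABLE users"))
	row := map[string]string{"city": "Hangzhou", "phone": "13800000000"}
	fmt.Println(maskRow(row, map[string]bool{"phone": true})["phone"])
}
```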

Table Modeling and Design Guidelines

The project enforces StarRocks modeling best practices:

  • Primary Key model for transactional metadata requiring upsert/delete
  • Aggregate Key model for pre-aggregated reporting
  • Duplicate Key model for high-volume detail records
  • Explicit partitioning for large tables and colocated joins for frequently joined tables
  • Sort keys and prefix index considerations for filtering and compression

Dependency Analysis

Key dependencies and relationships:

  • StreamLoad client depends on HTTP transport and configuration
  • Response parsing decouples ingestion logic from result interpretation
  • Configuration is consumed by setup routines and validated in tests
  • MCP server depends on connection configuration and design rules for safe access

Performance Considerations

  • Ingestion batching: each StreamLoad request runs as a single transaction, so very small batches multiply transactions while very large ones increase memory pressure; aim for balanced batch sizes.
  • Strict mode and filter ratios: tune to balance data quality and throughput.
  • Partitioning and bucket sizing: align with scan volume reduction and parallelism targets.
  • Colocated joins: eliminate network shuffle for frequently joined large tables.
  • Sort keys and prefix index: place selective, shorter columns early to improve filtering and compression.
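
The batching guidance can be sketched as a splitter that caps each batch by both row count and payload size, so that each StreamLoad request stays within a predictable envelope. The function name `chunkRows` and the limits used in `main` are illustrative assumptions, not values taken from the repository.

```go
package main

import "fmt"

// chunkRows greedily groups rows into batches that hold at most maxRows rows
// and at most maxBytes bytes. A single row larger than maxBytes still gets a
// batch of its own rather than being dropped.
func chunkRows(rows [][]byte, maxRows, maxBytes int) [][][]byte {
	var batches [][][]byte
	var cur [][]byte
	size := 0
	for _, r := range rows {
		// Flush before adding r if the current batch is full either way.
		if len(cur) > 0 && (len(cur) >= maxRows || size+len(r) > maxBytes) {
			batches = append(batches, cur)
			cur, size = nil, 0
		}
		cur = append(cur, r)
		size += len(r)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	rows := make([][]byte, 0, 5)
	for i := 0; i < 5; i++ {
		rows = append(rows, []byte(fmt.Sprintf(`{"id":%d}`, i)))
	}
	for i, b := range chunkRows(rows, 2, 1<<20) {
		fmt.Printf("batch %d: %d rows\n", i, len(b))
	}
}
```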

Troubleshooting Guide

Common issues and remedies:

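  • Authentication failures: verify the credentials and confirm the request targets the FE HTTP port (8030 by default) rather than the MySQL-protocol query port (9030 by default).
  • Redirect loops or unreachable BE addresses: the FE answers a StreamLoad request with a 307 redirect to a BE node; when that address cannot be reached from a development machine, configure the BE proxy so the client rewrites the redirect host.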
  • Import failures: inspect response status and message; consult error URL for details.
  • Timeout exceeded: increase timeout for large batches.
  • Duplicate label errors: ensure unique labels per import task.

Conclusion

The repository demonstrates a production-ready StarRocks architecture integrating HTTP-based ingestion, robust configuration management, secure agent access, and strong design guidelines. By adhering to the outlined modeling, partitioning, and operational practices, teams can achieve reliable, high-performance analytics at scale.

Appendices

Appendix A: StreamLoad End-to-End Flow

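The flow diagram is not reproduced here. In outline: the application builds a labeled request with its load options, sends it to the FE HTTP endpoint, follows the 307 redirect to a BE node (with the host rewritten when a BE proxy is configured), and parses the JSON response to decide between success and failure.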

Appendix B: MCP Server Tooling

  • list_tables: enumerate accessible tables
  • describe_table: retrieve schema with metadata
  • read_query: execute controlled SQL queries

Appendix C: Entity Synchronization Script

A script exists to synchronize StarRocks entities, supporting schema alignment and operational workflows.
