Database Schema Overview
Table of Contents
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
Introduction
This document provides a comprehensive database schema overview for the StarRocks OLAP database supporting the bi-analysis BI system. It documents all nine tables, their purposes, relationships, multi-tenant design, partitioning strategies, and operational patterns optimized for analytical workloads and BI queries.
Project Structure
The database schema is defined under the bi-analysis module with two primary artifacts:
- Schema definition script containing all nine tables and indexes
- Initialization script with seed data for field groups and system-level metrics
- Design documentation detailing the conceptual model and tenant-aware patterns
Diagram sources
Section sources
Core Components
This section introduces the nine tables that form the analytical metadata and configuration layer for the BI system. Each table serves a distinct role in defining data sources, dimensions, metrics, templates, and user personalization.
- anls_table: Defines BI data source tables and their relationships to dimensions and date types.
- anls_dimension: Configures analytical dimensions and supports mutual exclusivity among dimensions.
- anls_field: Stores tenant-aware metric definitions, formulas, and filters.
- anls_field_group: Categorizes metrics into logical groups and controls template creation permissions.
- anls_date: Provides a complete date dimension table for time-based analysis.
- anls_template: Saves user-defined analysis templates with row/column fields and configurations.
- anls_field_meta: Captures metadata for data source fields enabling dynamic metric creation and validation.
- anls_user_home_layout: Stores user-specific home dashboard layouts per device type.
- anls_field_visible_template: Maintains per-user visibility and ordering preferences for templates.
Section sources
Architecture Overview
The schema follows a StarRocks OLAP design optimized for analytical queries with:
- Primary Key tables with BITMAP indexes on status and soft-delete fields
- Hash distribution on surrogate keys for balanced data locality
- JSON fields for flexible expressions and filter configurations
- Tenant-aware fields to isolate multi-tenant data and configurations
Diagram sources
Detailed Component Analysis
anls_table: Data Source Configuration
Purpose:
- Defines BI data source tables and associates them with dimensions and date types.
- Supports tenant scoping for multi-tenancy.
Key attributes:
- Primary key: id
- Tenant scoping: tenant_id
- Unique constraint: c_table_name
- Status and soft delete: status, deleted_at
- Audit fields: created_by, updated_by, timestamps
Indexing and distribution:
- Hash distribution on id
- BITMAP index on status
- BITMAP index on deleted_at
- Unique index on c_table_name
Relationships:
- Associates with anls_field via c_table_name for metric-to-source linkage.
Section sources
anls_dimension: Dimension Configuration and Mutual Exclusivity
Purpose:
- Defines analytical dimensions (e.g., time, product, shop, geography).
- Implements mutual exclusivity among dimensions using c_mutex_keys.
Key attributes:
- Primary key: id
- Unique constraint: c_key
- Mutual exclusion: c_mutex_keys (comma-separated dimension keys)
- Sorting and status: i_sort, status
- Soft delete: deleted_at
- Audit fields: created_by, updated_by, timestamps
Indexing and distribution:
- Hash distribution on id
- BITMAP index on status
- BITMAP index on deleted_at
- Unique index on c_key
Frontend usage:
- Supports dynamic dimension selection with mutual exclusivity enforcement.
Section sources
anls_field: Metric Definitions and Tenant Isolation
Purpose:
- Stores tenant-aware metric definitions, formulas, and filters.
- Supports system-level metrics (tenant_id = 0) shared across tenants.
Key attributes:
- Primary key: id
- Tenant scoping: tenant_id (0 = system-wide)
- Unique constraint: c_field
- Metric metadata: c_display, c_group, c_type, c_unit
- Expression and filtering: c_expression (JSON), c_where (JSON)
- Date range controls: i_start_diff_day, i_end_diff_day, t_start_date, t_end_date
- Template flag: is_template
- Sorting and status: i_sort, status
- Soft delete: deleted_at
- Audit fields: created_by, updated_by, timestamps
Indexing and distribution:
- Hash distribution on id
- BITMAP indexes on tenant_id, c_group, c_type, status, deleted_at
- Unique index on c_field
Relationships:
- References anls_field_meta for field validation and metadata.
- References anls_table for data source linkage.
Section sources
anls_field_group: Metric Grouping and Template Controls
Purpose:
- Groups metrics into logical categories (e.g., traffic, finance, SKU).
- Controls whether tenants can create template-based metrics.
Key attributes:
- Primary key: id
- Unique constraint: c_key
- Name and sort: c_name, i_sort
- Template permission: allow_template
- Status and soft delete: status, deleted_at
- Audit fields: created_by, updated_by, timestamps
Indexing and distribution:
- Hash distribution on id
- BITMAP indexes on status, deleted_at
Section sources
anls_date: Complete Date Dimension
Purpose:
- Provides a comprehensive date dimension table for time-based analysis.
- Range: 2020-01-01 to 2050-12-31.
Key attributes:
- Primary key: t_date
- ISO week, month, quarter, year, day-of-week, day-of-month, week-of-year
- Week boundaries and display formatting
Distribution and initialization:
- Hash distribution on t_date
- Pre-populated with generated sequences for full coverage.
Section sources
anls_template: User-Defined Analysis Templates
Purpose:
- Saves user-defined analysis templates with row/column fields and configurations.
- Supports tenant scoping.
Key attributes:
- Primary key: id
- Tenant scoping: tenant_id
- Owner identification: oper_id
- Template metadata: c_name, c_icon, c_color, c_row, c_col, c_field, c_field_config, c_remark, i_sort
- Status and soft delete: status, deleted_at
- Audit fields: created_by, updated_by, timestamps
Indexing and distribution:
- Hash distribution on id
- BITMAP indexes on tenant_id, status, deleted_at
Section sources
anls_field_meta: Dynamic Field Metadata and Validation
Purpose:
- Captures metadata for data source fields enabling dynamic metric creation and validation.
- Uses c_group to map to dynamic base tables (e.g., order/refund/sd/apportion/bill_period).
Key attributes:
- Primary key: id
- Field identity: c_table_field, c_display
- Source grouping: c_group
- Data typing: c_data_type, c_date_type (JSON), c_func
- Metric/filter capability: is_metric, is_filter
- Related filters and operators: c_related_filters (JSON), c_operators (JSON)
- Enumerations: c_enum_values (JSON)
- Sorting and status: i_sort, status
- Soft delete: deleted_at
- Audit fields: timestamps
Indexing and distribution:
- Hash distribution on id
- BITMAP indexes on is_metric, is_filter, status, deleted_at
- Unique composite index on (c_group, c_table_field)
Relationships:
- Validates fields referenced by anls_field.
- Supports dynamic metric creation and filter rendering.
Section sources
anls_user_home_layout: Personalized Dashboard Layouts
Purpose:
- Stores user-specific home dashboard layouts per device type.
- Supports tenant scoping.
Key attributes:
- Primary key: id
- Tenant and user identifiers: tenant_id, user_id
- Device type: c_type (default: pc)
- Layout configuration: c_layout_data (JSON)
- Audit fields: created_by, updated_by, timestamps
- Soft delete: deleted_at
Indexing and distribution:
- Hash distribution on id
- BITMAP indexes on tenant_id, user_id, c_type, deleted_at
Section sources
anls_field_visible_template: Per-User Template Visibility
Purpose:
- Maintains per-user visibility, column ordering, and summary preferences for templates.
- Mirrors external system compatibility (FieldVisibleTemplate).
Key attributes:
- Primary key: id
- Tenant scoping: tenant_id
- User/operator identifiers: i_user_id, i_oper_id
- Template key: c_table_key (e.g., templateTable/618)
- Configuration name/value: c_name, c_value (JSON)
- Audit fields: created_by, updated_by, timestamps
- Soft delete: deleted_at
Indexing and distribution:
- Hash distribution on id
- BITMAP indexes on tenant_id, i_user_id, c_table_key, deleted_at
Section sources
Dependency Analysis
The following diagram illustrates the relationships among the nine tables and how they collaborate to support the BI configuration and analysis workflow.
Diagram sources
Section sources
Performance Considerations
- Distribution strategy:
- Hash distribution on primary keys ensures balanced data locality and efficient joins.
- Bucket counts are tuned per table to balance cardinality and concurrency.
- Indexing strategy:
- BITMAP indexes on status and deleted_at enable fast filtering for active records and soft-deleted isolation.
- Unique indexes prevent duplicates and accelerate lookups for identifiers.
- Storage properties:
- Replication set to 1 for development/stage; production may require higher replication for durability.
- Analytical workload optimization:
- JSON fields (c_expression, c_where, c_date_type, c_operators, c_enum_values) support flexible configurations while maintaining query performance through targeted filtering and indexing.
- Date dimension table (anls_date) precomputes derived attributes to reduce runtime computation in analytical queries.
[No sources needed since this section provides general guidance]
Troubleshooting Guide
Common operational issues and resolutions:
Duplicate metric creation:
- Symptom: Insert fails due to unique constraint on c_field.
- Resolution: Ensure c_field uniqueness within tenant scope; system-level metrics use tenant_id = 0.
Soft deletion and visibility:
- Symptom: Records appear missing in queries.
- Resolution: Filter by deleted_at = 0 or use views that exclude deleted records.
Tenant isolation failures:
- Symptom: Metrics/templates visible across tenants.
- Resolution: Verify tenant_id filtering in queries and application logic; ensure BITMAP index on tenant_id is leveraged.
Dimension mutual exclusivity:
- Symptom: Conflicting selections persist.
- Resolution: Validate c_mutex_keys configuration and enforce at the application level during selection.
Date dimension gaps:
- Symptom: Missing dates in analysis.
- Resolution: Confirm anls_date initialization covers the required date range.
Section sources
Conclusion
The bi-analysis StarRocks schema provides a robust foundation for multi-tenant BI analytics. It balances flexibility with performance through tenant-aware fields, BITMAP indexing, hash distribution, and JSON-based extensibility. The nine-table design cleanly separates concerns across data sources, dimensions, metrics, templates, metadata, and user personalization, enabling scalable and maintainable analytical workflows.