Conversation Management System
Table of Contents
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
Introduction
This document describes the conversation management system responsible for maintaining context and state across multi-agent interactions. It covers session lifecycle management (creation, persistence, cleanup), memory store implementation (short-term and long-term), context preservation mechanisms, integration with external storage systems, caching strategies, and error recovery. It also provides examples of conversation state serialization, context extraction algorithms, and conversation flow control.
Project Structure
The conversation management system spans backend Python services and frontend React context/state management:
- Backend: FastAPI routes, asynchronous session service, conversation manager, agent orchestration, memory stores, and caching.
- Frontend: React context and hooks for managing session state and UI interactions.
Section sources
- [chat.py]
- [conversation_manager.py]
- [session_service_async.py]
- [models.py]
- [long_term_memory.py]
- [redis_cache.py]
- [ChatContext.tsx]
- [use-chat.ts]
Core Components
- ConversationManager: Central orchestrator for session lifecycle, concurrency control, memory loading/persistence, and long-term memory integration.
- AsyncSessionService: Asynchronous database operations for session creation, updates, retrieval, and message persistence.
- BIReActAgent: Agent wrapper that auto-persists incremental conversation messages and supports long-term memory.
- LongTermMemoryService: User-scoped long-term memory backed by Mem0 and Milvus.
- RedisCacheManager: Cost-saving cache for LLM responses, vector search, and statistical computations.
- FastAPI Routes: Expose session and chat endpoints for frontend integration.
Section sources
- [conversation_manager.py]
- [session_service_async.py]
- [bi_agent.py]
- [long_term_memory.py]
- [redis_cache.py]
- [chat.py]
Architecture Overview
The system integrates frontend session state with backend session management and memory persistence. Sessions are created via API, cached in-memory with LRU eviction, and persisted to PostgreSQL. Long-term memory is enabled per user and stored in Milvus. Redis caches frequently accessed results to reduce latency and cost.
Diagram sources
Detailed Component Analysis
Session Lifecycle Management
- Creation: If no conversation_id is provided, a UUID is generated. Sessions are created/updated asynchronously in PostgreSQL via AsyncSessionService.
- Persistence: Messages are saved incrementally after each agent reply. The agent tracks saved message count to avoid duplication.
- Cleanup: Idle sessions are evicted based on max_idle_seconds; removed sessions are marked archived asynchronously.
- Concurrency: Per-session asyncio locks prevent race conditions during creation and access updates.
Diagram sources
Section sources
Memory Store Implementation
- Short-term memory: InMemoryMemory per session, loaded from PostgreSQL session_messages with an intelligent character threshold to avoid triggering compression. History sanitization ensures valid assistant-tool sequences.
- Long-term memory: User-scoped instances via LongTermMemoryService, backed by Mem0 and Milvus. Retrieval includes deduplication to avoid repeated memories.
- Serialization: Messages are stored as JSON with a "messages" array containing role/type/timestamped entries.
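A sketch of the serialization shape described above. The exact field set in the stored JSON is an assumption beyond the documented "messages" array with role/type/timestamped entries:

```python
import json
from datetime import datetime, timezone

def serialize_history(messages: list[dict]) -> str:
    """Pack messages into a JSON document with a top-level "messages"
    array of role/type/timestamped entries, as described above."""
    return json.dumps({
        "messages": [
            {
                "role": m["role"],               # e.g. "user" | "assistant" | "tool"
                "type": m.get("type", "text"),
                "content": m["content"],
                # Stamp at serialization time if the message carries no timestamp.
                "timestamp": m.get(
                    "timestamp",
                    datetime.now(timezone.utc).isoformat(),
                ),
            }
            for m in messages
        ]
    })
```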
Context Preservation Mechanisms
- Intelligent memory loading: Iterates through recent messages in reverse chronological order, estimates cumulative character count using a token counter, and stops before exceeding a configured threshold to avoid compression triggers.
- History sanitization: Ensures assistant tool-call sequences are complete; strips incomplete tool-calls and discards orphan tool messages to prevent downstream errors.
- Context extraction: Provides a formatted history endpoint that flattens and cleans messages for agent consumption, preserving assistant-tool continuity.
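The sanitization rule can be illustrated with a small sketch. The message shape (a `tool_calls` list on assistant messages, `tool_call_id` on tool messages) is an assumed OpenAI-style layout, not necessarily the system's exact schema:

```python
def sanitize_history(messages: list[dict]) -> list[dict]:
    """Strip assistant tool-calls that never received a result, and drop
    orphan tool messages whose triggering call is missing."""
    # Ids of tool results present in the history.
    result_ids = {m["tool_call_id"] for m in messages if m["role"] == "tool"}
    # Ids of assistant tool-calls present in the history.
    call_ids = set()
    for m in messages:
        if m["role"] == "assistant":
            call_ids.update(c["id"] for c in m.get("tool_calls", []))

    cleaned = []
    for m in messages:
        if m["role"] == "tool":
            # Orphan tool result: no matching assistant call -> drop.
            if m["tool_call_id"] in call_ids:
                cleaned.append(m)
        elif m["role"] == "assistant" and m.get("tool_calls"):
            # Keep only calls that have a matching result.
            kept = [c for c in m["tool_calls"] if c["id"] in result_ids]
            if kept or m.get("content"):
                cleaned.append({**m, "tool_calls": kept})
        else:
            cleaned.append(m)
    return cleaned
```

After this pass, every assistant tool-call is paired with a tool result and vice versa, which prevents the downstream errors the section mentions.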
Integration with External Storage Systems
- PostgreSQL: Stores sessions and session_messages with foreign key relationships and timestamps.
- Milvus: Vector store for long-term memory embeddings; includes a patched vector store to handle metadata-only updates safely.
- Redis: Caches LLM responses, vector search results, web search results, and computed statistics to reduce latency and cost.
Caching Strategies for Performance Optimization
- LLM cache: Prompts hashed to keys with TTL; reduces repeated LLM calls.
- Vector search cache: Stores retrieval results for queries with shorter TTL.
- Web search cache: Stores web search results with moderate TTL.
- Statistics cache: Stores computed analytics with metadata and recommended refresh time.
Conversation State Serialization and Context Extraction
- Serialization: Messages stored as JSON with a "messages" array; includes role, type, content, and optional tool_call_id/timestamps.
- Context extraction: Endpoint returns flattened, cleaned messages suitable for agent prompts, skipping intermediate tool_use/tool_result blocks.
Conversation Flow Control
- Streamed responses: SSE generator emits conversation_id once, followed by text deltas, tool_use, and tool_result events, ending with [DONE].
- Non-streamed responses: Collects full response, tool calls, and tool results in a single JSON object.
- Frontend integration: React context manages session ID, messages, and API version; hooks create/delete sessions and load histories.
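The streamed-response framing can be sketched as an async generator. The event payload fields are illustrative; only the ordering (conversation_id first, events in between, a final `[DONE]`) comes from the description above:

```python
import asyncio
import json
from typing import AsyncIterator

async def sse_stream(conversation_id: str,
                     events: AsyncIterator[dict]) -> AsyncIterator[str]:
    """Sketch of the SSE framing: emit conversation_id once, then text
    deltas / tool_use / tool_result events, always ending with [DONE]."""
    yield f"data: {json.dumps({'conversation_id': conversation_id})}\n\n"
    try:
        async for event in events:
            # event["type"] would be "text", "tool_use", or "tool_result".
            yield f"data: {json.dumps(event)}\n\n"
    finally:
        # Emit the terminal marker even if the producer errors or is cancelled.
        yield "data: [DONE]\n\n"
```

FastAPI would wrap such a generator in a `StreamingResponse` with the `text/event-stream` media type.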
Error Recovery and Session Timeout Handling
- Idle cleanup: Periodic cleanup removes sessions older than max_idle_seconds; marks status as archived asynchronously.
- Graceful cancellation: Pipeline handles asyncio.CancelledError and interrupts agent execution.
- Frontend fallback: On API errors, frontend displays user-friendly messages and resets streaming state.
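One cleanup pass of the idle-eviction logic can be sketched as follows; the session shape and function name are illustrative, and the real ConversationManager would run this on a timer and archive the returned ids asynchronously:

```python
def evict_idle(sessions: dict[str, dict], max_idle_seconds: float,
               now: float) -> list[str]:
    """Remove sessions idle longer than max_idle_seconds and return their
    ids so the caller can mark them archived in the database."""
    expired = [
        sid for sid, s in sessions.items()
        if now - s["last_access"] > max_idle_seconds
    ]
    for sid in expired:
        del sessions[sid]
    return expired
```

Taking `now` as a parameter keeps the pass deterministic and easy to test; the periodic task would pass the current monotonic time.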
Dependency Analysis
- ConversationManager depends on AsyncSessionService for DB operations and optionally LongTermMemoryService for persistent user memory.
- BIReActAgent persists messages via AsyncSessionService after each reply.
- Routes depend on ConversationManager and expose session and chat endpoints.
- Frontend depends on API endpoints for session CRUD and chat streaming.
Section sources
- [chat.py]
- [sessions.py]
- [conversation_manager.py]
- [session_service_async.py]
- [models.py]
- [long_term_memory.py]
- [redis_cache.py]
Performance Considerations
- Memory compression: Enabled by default with configurable thresholds and recent rounds retention to balance context length and cost.
- Caching: Redis caches frequently accessed results to reduce latency and cost; cache statistics provide visibility into hit rates.
- Asynchronous operations: DB writes and long-term memory operations are offloaded to background tasks to minimize request latency.
- Token counting: Uses a character/token counter aligned with compression logic to avoid unnecessary recompression.
Troubleshooting Guide
- Empty or malformed histories: The system sanitizes histories to ensure assistant-tool continuity; incomplete tool-call sequences are stripped and orphan tool messages are dropped.
- Session not found or deleted: Deleting a session soft-deletes it; frontend should create a new session or refresh the list.
- Streaming issues: The SSE generator guarantees a final [DONE] marker; errors are sent with finish_reason set appropriately.
- Idle sessions disappearing: Configure max_idle_seconds and ensure periodic activity to keep sessions alive.
Conclusion
The conversation management system provides robust session lifecycle control, efficient memory persistence, and scalable context preservation across multi-agent workflows. It integrates PostgreSQL for short-term history, Milvus for long-term memory, and Redis for performance optimization. The design balances concurrency safety, error resilience, and extensibility for future enhancements.