MultiDB_RAG_AI

RAG Pipeline | Vector Search | LLM Integration | Multi-Database Architecture

Enterprise-Grade AI Architecture for Production Systems

A production-ready Retrieval-Augmented Generation (RAG) system demonstrating polyglot persistence, two-plane architecture, and enterprise patterns.

Quick Start · Architecture · Documentation · Tech Stack

The Challenge

Building a chatbot is straightforward. Building one that scales to millions of users while maintaining sub-100ms latency, controlling costs, and enabling zero-downtime deployments is an engineering challenge that most implementations fail to address.

This project demonstrates how to architect AI systems for the real world.

Key Features

Polyglot Persistence

Each database chosen for its strengths:

MongoDB Atlas — Vector search & semantic retrieval
PostgreSQL — Users, auth, billing (ACID)
ScyllaDB — High-throughput conversation logs
Redis — Session cache & rate limiting

Two-Plane Architecture

Clean separation of concerns:

Data Plane — Async ingestion, chunking, embedding
Serving Plane — Real-time inference & API
Independent scaling & fault isolation
Zero-downtime model deployments
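
The split between the two planes can be sketched with a background ingestion worker and a read-only lookup path. This is a minimal illustration under stated assumptions, not the project's implementation: `VectorIndex`, `fake_embed`, `ingest_worker`, and `serve_lookup` are hypothetical names, the index is an in-memory dict standing in for the vector store, and the embedding function is a placeholder rather than a real model.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VectorIndex:
    """In-memory stand-in for the shared vector store (hypothetical)."""
    rows: Dict[str, List[float]] = field(default_factory=dict)


def fake_embed(text: str) -> List[float]:
    # Placeholder embedding: deterministic numbers, NOT a real model.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


async def ingest_worker(queue: asyncio.Queue, index: VectorIndex) -> None:
    """Data plane: drain queued documents and index them in the background."""
    while True:
        doc_id, text = await queue.get()
        try:
            index.rows[doc_id] = fake_embed(text)
        finally:
            queue.task_done()


async def serve_lookup(index: VectorIndex, doc_id: str) -> Optional[List[float]]:
    """Serving plane: read-only access that never waits on the ingest queue."""
    return index.rows.get(doc_id)


async def main() -> Dict[str, List[float]]:
    index = VectorIndex()
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)  # bounded: applies backpressure
    worker = asyncio.create_task(ingest_worker(queue, index))
    await queue.put(("doc-1", "hello world"))
    await queue.put(("doc-2", "polyglot persistence"))
    await queue.join()  # wait until the backlog is indexed
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return index.rows
```

Running `asyncio.run(main())` indexes both documents. In a real deployment the queue would more likely be an external broker so that each plane can run and scale as its own process.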

Production-Ready

Enterprise patterns from day one:

JWT auth with role-based access control
Usage-based billing & quota enforcement
Structured logging & health checks
Connection pooling & query caching
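
Rate limiting and quota enforcement of this kind are often built on a fixed-window counter. The sketch below is one possible shape and is not taken from the codebase: `FixedWindowLimiter` is a hypothetical class, and a plain dict stands in for Redis, where the same logic is usually an `INCR` on a per-window key followed by an `EXPIRE`.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` calls per `window_s` seconds for each key."""

    def __init__(self, limit: int, window_s: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.window_s)
        # Drop counters from earlier windows (Redis would expire them).
        self._counts = {k: v for k, v in self._counts.items() if k[1] == window}
        bucket = (key, window)
        count = self._counts.get(bucket, 0) + 1
        self._counts[bucket] = count
        return count <= self.limit
```

A caller would check `limiter.allow(user_id)` before handling a request and return HTTP 429 when it is `False`. A fixed window allows short bursts at window boundaries; a sliding window or token bucket trades more state for smoother limits.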

RAG Pipeline

Complete retrieval-augmented generation:

Document ingestion & chunking
Semantic search with re-ranking
Context-aware response generation
Conversation memory management
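
The retrieval half of the pipeline (chunk, embed, rank, build a prompt) can be sketched in plain Python. Everything here is illustrative and assumed: the function names are hypothetical, and the bag-of-words `embed` is a toy substitute for a real embedding model, so the scores only reflect exact word overlap.

```python
import math
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into word windows of `size` that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real system would call an embedding model.
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:k]


def build_prompt(query: str, chunks: List[str]) -> str:
    """Assemble the context-augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

A re-ranking stage would sit between `retrieve` and `build_prompt`, rescoring the top candidates with a slower but more accurate model.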

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              CLIENT LAYER                                    │
│                    Web Apps  ·  Mobile  ·  API Clients                      │
└─────────────────────────────────────┬───────────────────────────────────────┘
                                      │
┌─────────────────────────────────────▼───────────────────────────────────────┐
│                              API GATEWAY                                     │
│              FastAPI  ·  JWT Auth  ·  Rate Limiting  ·  OpenAPI             │
└───────────────────┬─────────────────────────────────────┬───────────────────┘
                    │                                     │
    ┌───────────────▼───────────────┐     ┌───────────────▼───────────────┐
    │         DATA PLANE            │     │        SERVING PLANE          │
    │  ┌─────────────────────────┐  │     │  ┌─────────────────────────┐  │
    │  │   Document Loaders      │  │     │  │    LangGraph Agent      │  │
    │  │   Text Splitters        │  │     │  │    Stateful Reasoning   │  │
    │  │   Embedding Service     │  │     │  │    Tool Orchestration   │  │
    │  │   Vector Indexing       │  │     │  │    Response Generation  │  │
    │  └─────────────────────────┘  │     │  └─────────────────────────┘  │
    └───────────────┬───────────────┘     └───────────────┬───────────────┘
                    │                                     │
┌───────────────────▼─────────────────────────────────────▼───────────────────┐
│                              DATA LAYER                                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
│  │  MongoDB    │  │ PostgreSQL  │  │  ScyllaDB   │  │    Redis    │        │
│  │  Vectors    │  │  Users/Auth │  │  Chat Logs  │  │    Cache    │        │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘        │
└─────────────────────────────────────────────────────────────────────────────┘

Why this design?

Data Plane handles batch processing independently from real-time serving
Serving Plane scales horizontally without affecting ingestion pipelines
Specialized databases match each workload's access pattern; the 10x performance gain over a one-size-fits-all store is the project's design target, not a measured benchmark
Composable interfaces allow swapping components without architectural rewrites
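
One way to get swappable components is to have callers depend on a small protocol rather than on a concrete database client. The sketch below assumes a hypothetical `VectorStore` interface with `upsert` and `search`; these names are illustrative and are not the project's actual API.

```python
from typing import Dict, List, Protocol, Sequence, Tuple


class VectorStore(Protocol):
    """Hypothetical interface the serving plane could code against."""

    def upsert(self, doc_id: str, vector: Sequence[float]) -> None: ...

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]: ...


class InMemoryVectorStore:
    """Test double; a database-backed class would expose the same two methods."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[float, ...]] = {}

    def upsert(self, doc_id: str, vector: Sequence[float]) -> None:
        self._rows[doc_id] = tuple(vector)

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        # Score by dot product; higher means more similar.
        scored = [(doc_id, sum(q * v for q, v in zip(query, vec)))
                  for doc_id, vec in self._rows.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def top_document(store: VectorStore, query: Sequence[float]) -> str:
    """Depends only on the protocol, so any conforming store can be passed in."""
    hits = store.search(query, k=1)
    return hits[0][0] if hits else ""
```

Because `Protocol` uses structural typing, a new backend does not need to inherit from `VectorStore`; it only has to provide matching methods.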

Quick Start

# Clone and start all services (on Docker Compose v2, use `docker compose up -d`)
git clone https://github.com/asq-sheriff/MultiDB_RAG_AI.git
cd MultiDB_RAG_AI
docker-compose up -d

# Get authentication token
curl -X POST http://localhost:8000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "testpassword123"}'

# Send your first message (replace YOUR_TOKEN with the token from the login response)
curl -X POST http://localhost:8000/chat/message \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "What databases are best for AI applications?"}'
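
The same two calls can be made from Python with only the standard library. This is a sketch under assumptions: the endpoints and payloads mirror the curl commands above, but the response field name `access_token` is a guess, so check the service's OpenAPI docs for the actual schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl commands


def login_request(email: str, password: str) -> urllib.request.Request:
    """Build the POST /auth/login request."""
    body = json.dumps({"email": email, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/auth/login", data=body,
        headers={"Content-Type": "application/json"}, method="POST")


def chat_request(token: str, message: str) -> urllib.request.Request:
    """Build the POST /chat/message request with a bearer token."""
    body = json.dumps({"message": message}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/message", data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"}, method="POST")


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def main() -> None:
    auth = send(login_request("test@example.com", "testpassword123"))
    # ASSUMPTION: the token is returned under "access_token".
    token = auth["access_token"]
    print(send(chat_request(token, "What databases are best for AI applications?")))
```

Call `main()` once the stack is running; the request builders can be exercised without a server.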

Development commands:

make dev          # Start with hot reload
make test         # Run test suite
make lint         # Format and lint
make build        # Build production image

Tech Stack

Layer	Technology	Purpose
API	FastAPI	Async web framework with automatic OpenAPI docs
Vector Store	MongoDB Atlas	Semantic search with native vector indexing
Relational DB	PostgreSQL 15	Users, auth, billing with ACID guarantees
Wide-Column Store	ScyllaDB	High-throughput, time-ordered conversation logging (up to 1M+ writes/sec, depending on cluster size and hardware)
Cache	Redis 7	Sub-millisecond session and rate limit lookups
Embeddings	Sentence Transformers	Open-source 768-dim embeddings
Agent Framework	LangGraph	Stateful reasoning with tool orchestration
Containers	Docker Compose	Local development environment
CI/CD	GitHub Actions	Automated testing and quality gates

Project Structure

MultiDB_RAG_AI/
├── app/
│   ├── api/                 # FastAPI routes and schemas
│   │   └── endpoints/       # Auth, chat, billing, search
│   ├── services/            # Business logic layer
│   │   ├── chatbot_service.py
│   │   ├── knowledge_service.py
│   │   ├── embedding_service.py
│   │   └── billing_service.py
│   ├── database/            # Database connections and models
│   └── config.py            # Pydantic settings
├── docs/                    # Technical documentation
├── tests/                   # Unit and integration tests
├── scripts/                 # Utility scripts
├── docker-compose.yml       # Local development stack
├── Makefile                 # Development commands
└── requirements.txt         # Python dependencies

Documentation

Document	Description
System Design	Comprehensive architecture deep-dive
Architecture Overview	High-level system design
Codebase Overview	Quick orientation for contributors
Roadmap	Current state and future direction
RAG Fundamentals	Introduction to RAG concepts

See docs/README.md for the complete documentation index.

Design Decisions

Why Multiple Databases?

Different data types have fundamentally different access patterns:

Data Type	Access Pattern	Best Fit	Why Not Alternatives
Vectors	Similarity search	MongoDB Atlas	pgvector judged harder to scale for this workload; Pinecone adds vendor lock-in
Users/Billing	ACID transactions	PostgreSQL	Needs relational joins and constraints; MongoDB's multi-document transactions (4.0+) are a weaker fit
Chat History	Append-only, time-series	ScyllaDB	Cassandra-compatible, with claimed throughput up to 10x higher; built for writes
Sessions	Sub-ms lookups	Redis	In-memory store; disk-backed databases add latency on this hot path
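
For the append-only chat history row, a common wide-column pattern is to bucket each conversation by day so that partitions stay bounded and recent messages are read first. The schema and helper below are hypothetical (table and column names are assumptions, not the project's schema) and only illustrate the access pattern.

```python
from datetime import datetime, timezone
from typing import Tuple

# Hypothetical CQL schema: one partition per conversation per UTC day,
# with messages clustered newest-first inside the partition.
CHAT_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS chat_messages (
    conversation_id text,
    day_bucket text,
    sent_at timestamp,
    role text,
    body text,
    PRIMARY KEY ((conversation_id, day_bucket), sent_at)
) WITH CLUSTERING ORDER BY (sent_at DESC);
"""


def partition_key(conversation_id: str, ts: datetime) -> Tuple[str, str]:
    """Return the (conversation_id, day_bucket) pair for a message timestamp."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (conversation_id, day)
```

Reading the latest messages then touches only today's partition, and older buckets can be expired with a TTL without rewriting live data.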

Why Two-Plane Architecture?

Concern	Single-Plane Problem	Two-Plane Solution
Scaling	Ingestion spikes affect serving	Independent scaling
Deployments	Model updates risk downtime	Blue-green deployments
Failures	One failure affects everything	Fault isolation
Costs	Can't optimize separately	Right-size each plane
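
The blue-green row can be illustrated with a router that loads a candidate model into the idle slot, smoke-tests it, and only then switches traffic. `ModelRouter` and the callable `Model` type are hypothetical; a production setup would more likely switch at the load balancer or orchestrator level.

```python
import threading
from typing import Callable, Dict

Model = Callable[[str], str]  # hypothetical: a model maps a prompt to a reply


class ModelRouter:
    """Keeps two slots ("blue" and "green") and routes requests to the live one."""

    def __init__(self, initial: Model) -> None:
        self._slots: Dict[str, Model] = {"blue": initial}
        self._live = "blue"
        self._lock = threading.Lock()

    def predict(self, text: str) -> str:
        with self._lock:
            model = self._slots[self._live]
        return model(text)  # run inference outside the lock

    def deploy(self, candidate: Model, smoke_input: str = "ping") -> str:
        """Smoke-test the candidate, then switch traffic to the idle slot."""
        candidate(smoke_input)  # if this raises, the live slot is left untouched
        with self._lock:
            idle = "green" if self._live == "blue" else "blue"
            self._slots[idle] = candidate
            self._live = idle
        return idle

    def rollback(self) -> str:
        """Point traffic back at the other slot, if it holds a model."""
        with self._lock:
            other = "green" if self._live == "blue" else "blue"
            if other in self._slots:
                self._live = other
            return self._live
```

Because the previous model stays loaded in the other slot, `rollback()` is a pointer flip rather than a redeploy.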

Roadmap

Current (v1.0) — Foundation

Multi-database architecture with specialized stores
JWT authentication with RBAC
Usage-based billing with quota enforcement
Complete RAG pipeline
Docker-based local development

Next (v1.1) — Cloud Native

Terraform infrastructure as code
Kubernetes deployment manifests
OpenTelemetry observability
Automated blue-green deployments

Future (v2.0) — Advanced Features

Multi-agent orchestration
Long-term conversation memory
Streaming responses
A/B testing infrastructure

Contributing

# Create feature branch
git checkout -b feat/your-feature

# Run quality checks
make test && make lint

# Submit PR with:
# - Clear description of changes
# - Test coverage for new code
# - Documentation updates if needed

License

Apache 2.0 License — See LICENSE for details.

Built to demonstrate production AI engineering

View Documentation · Report Issue · Connect on LinkedIn
