RAGFlow AI

LangChain-Powered SaaS Intelligence Platform

Page 1 of 8 — Landing & Auth
01 — Hero Landing Page
app.example.com/landing-auth

AI-Powered SaaS Intelligence

RAG Chatbot with LangChain, Next.js & Pinecone Vector Search

🤖
99.9%
Uptime SLA
<200ms
Avg Response
10M+
Queries Served
500+
Enterprises
Core Platform Capabilities
🧠 RAG Pipeline
📚 Knowledge Base
💬 Contextual Chat
📊 Analytics
🔐 Role-Based Access
LLM Orchestration

Deploy in Minutes

Production-ready chatbot with citation support, session memory & token cost controls

🚀
02 — Login / Sign Up
app.example.com/landing-auth
🤖

RAGFlow AI

Sign in to your AI workspace

✉️
🔒
🛡️

Enterprise SSO available

Contact sales for SAML / OIDC setup

01 — AI Operations Overview
app.example.com/main-dashboard
MENU
AI Operations Dashboard — Today, Apr 14 2026

24,318

Total Queries Today

↑ +18%

187ms

Avg Response Latency

↓ -23ms

94.2%

RAG Retrieval Accuracy

↑ +1.4%

$12.47

LLM Token Cost Today

↓ -8%

Query Volume — Last 7 Days

Mon
Tue
Wed
Thu
Fri
Sat
Sun

2:14

PM

RAG Pipeline Ingestion Completed

47 documents processed · 12,480 chunks · Pinecone namespace: prod-docs

1:05

PM

LLM Fallback Triggered

OpenAI rate limit hit · Auto-switched to Anthropic Claude 3.5 Sonnet

11:30

AM

New Knowledge Base Version Deployed

Product Docs v3.2 · 234 sources indexed · 48,320 total chunks

02 — Pipeline & Infrastructure Health
app.example.com/main-dashboard
MENU
System Health — Vector DB, Embeddings & API
97%

Vector DB Health

Pinecone · 2.4M vectors

89%

Embedding Coverage

18,420 / 20,680 chunks

99%

API Uptime

Last 30 days

Token Usage by Model (This Week)

GPT-4o
Claude 3.5
GPT-3.5
Embeddings

2.4M

Vectors Indexed

↑ +340K this week

99.7%

API Uptime (30d)

↑ stable

All Systems Operational
01 — Active Chat Session
app.example.com/rag-chatbot-interface
MENU
Active Session
History
Shared Threads
Playground
Chat Session · Context: Product Documentation v3.2

User

How does the RAG pipeline handle multi-document queries?

Session ID: sess_8f3k2 · Model: GPT-4o · Temp: 0.3 · Turn 4

AI

The pipeline uses a hybrid retrieval strategy combining dense vector search with metadata filters...

Sources: [1] Architecture Guide p.12 [2] API Docs §4.3 [3] Setup Guide p.8 · Tokens used: 847

User

What chunking strategies are supported?

Follow-up · Conversation turn 5 · Memory window: 10 turns

847

Tokens This Turn

↑ within budget

3

Sources Retrieved

↑ Top-k: 5

💬
02 — Citation & Source Panel
app.example.com/rag-chatbot-interface
MENU
Retrieved Sources · Similarity Scores & Chunk References
92%

Source Relevance

Architecture Guide p.12

87%

Source Relevance

API Reference §4.3

79%

Source Relevance

Setup Guide p.8

[1]

Architecture Guide — Page 12

Chunk ID: chk_4821 · Similarity: 0.923 · Namespace: prod-docs · RecursiveCharacterTextSplitter

[2]

API Reference — Section 4.3

Chunk ID: chk_2204 · Similarity: 0.874 · Namespace: prod-docs · PDF loader

[3]

Setup & Integration Guide — Page 8

Chunk ID: chk_9031 · Similarity: 0.791 · Namespace: prod-docs · Markdown loader

Session Memory Active: 10 turns · Chain: ConversationalRetrievalChain + ConversationBufferWindowMemory
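A minimal sketch of how ConversationBufferWindowMemory behaves with a 10-turn window: only the most recent k user/AI exchanges are replayed into the prompt, so token usage stays bounded no matter how long the session runs. The class below is an illustration of the windowing logic, not LangChain's actual implementation.

```typescript
type Turn = { role: "user" | "ai"; content: string };

// Illustrative buffer-window memory: keep only the last k exchanges.
class BufferWindowMemory {
  constructor(private k: number, private turns: Turn[] = []) {}

  add(user: string, ai: string): void {
    this.turns.push(
      { role: "user", content: user },
      { role: "ai", content: ai }
    );
  }

  // Last k exchanges = last 2k messages (one user + one AI per exchange).
  window(): Turn[] {
    return this.turns.slice(-2 * this.k);
  }
}
```

With k = 10, turn 11 silently evicts turn 1 from the context sent to the LLM, which is why older session details can stop influencing answers.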
01 — Document Library
app.example.com/knowledge-base
MENU
All Documents
Processing Queue
Namespaces
Loaders
Knowledge Base — Product Documentation Workspace

234

Indexed Documents

↑ +12 this week

48,320

Total Chunks

↑ avg 206 per doc

1536

Embedding Dimensions

↑ text-embedding-3-large (native 3072, reduced to 1536 via the dimensions parameter)

PDF

Product Architecture Guide v3.2.pdf

Status: Indexed · 347 chunks · Uploaded Apr 14 2026 · 2.4 MB

MD

API Reference Documentation (GitHub Sync)

Status: Indexed · 1,204 chunks · Auto-synced · Last updated 2h ago

PDF

User Onboarding Guide v4.pdf

Status: Processing · 0 / 89 chunks · Queued · 1.1 MB

CSV

Product FAQ Dataset v2.csv

Status: Indexed · 542 chunks · 856 rows · Apr 12 2026

02 — Ingestion Pipeline Config
app.example.com/knowledge-base
MENU
RAG Ingestion Pipeline Configuration

Chunks per Document (Top Sources)

API Ref
FAQ
Arch Guide
Onboarding
Changelog
✂️

Chunking Strategy

RecursiveCharacterTextSplitter

1000 chars / 200 overlap
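A simplified illustration of what the 1000-char / 200-overlap setting means. LangChain's RecursiveCharacterTextSplitter additionally tries to break on paragraph and sentence separators before falling back to hard cuts; this sketch only shows the size/overlap arithmetic.

```typescript
// Fixed-size overlapping chunker (illustrative, not LangChain's splitter).
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be < chunkSize");
  const chunks: string[] = [];
  const step = chunkSize - overlap; // each chunk starts 800 chars after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```

The 200-char overlap means a sentence straddling a chunk boundary appears whole in at least one chunk, which improves retrieval recall at the cost of ~20% more stored vectors.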
🧮

Embedding Model

OpenAI text-embedding-3-large

1536 dims (reduced from native 3072 via the dimensions parameter)
🗄️

Vector Database

Pinecone · us-east-1 region

prod-docs namespace
🔗

LangChain Document Loaders

PyPDFLoader + UnstructuredMarkdownLoader + CSVLoader

Active
Auto Re-index on Document Update
Metadata Enrichment (page numbers, section titles)
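A sketch of what the Metadata Enrichment toggle attaches to each chunk: the source document, page number, and section title travel with the vector, which is what makes the per-chunk citations elsewhere in the app ("Architecture Guide p.12", "§4.3") possible. The shape below is illustrative; field names are assumptions, not the platform's exact schema.

```typescript
type EnrichedChunk = {
  text: string;
  metadata: { source: string; page: number; section?: string };
};

// Attach provenance metadata to each chunk before embedding/indexing.
function enrichChunks(
  source: string,
  pages: { page: number; section?: string; text: string }[]
): EnrichedChunk[] {
  return pages.map((p) => ({
    text: p.text,
    metadata: { source, page: p.page, section: p.section },
  }));
}
```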
01 — Chatbot Performance Analytics
app.example.com/analytics-monitoring
MENU
Performance
Cost & Tokens
User Feedback
Error Logs
Chatbot Performance — Last 30 Days

186,420

Total Queries (30d)

↑ +24%

94.7%

Successful Responses

↑ +2.1%

204ms

P95 Latency

↓ -31ms

4.6 / 5

Avg User Rating

↑ +0.3 vs last month

Daily Query Volume — April 2026

Apr 1
Apr 7
Apr 10
Apr 14
Apr 21
Apr 28

RAG Quality Score by Source Type

PDF
Markdown
CSV
Web Scrape
02 — Cost & Token Monitoring
app.example.com/analytics-monitoring
MENU
LLM Cost & Token Usage Analytics

$342.18

Total LLM Cost (30d)

↓ -12% vs last month

48.2M

Total Tokens (30d)

↑ input + output combined

$0.0018

Cost per Query

↑ within $0.002 target

68%

Monthly Budget Used

$342 of $500 budget
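The derived figures on these cards follow directly from the raw totals, shown here as a worked calculation:

```typescript
// Derive cost-per-query and budget usage from the dashboard's raw totals.
const totalCost30d = 342.18; // USD, total LLM cost (30d)
const totalQueries30d = 186_420; // total queries (30d)
const monthlyBudget = 500; // USD

const costPerQuery = totalCost30d / totalQueries30d;
const budgetUsedPct = Math.round((totalCost30d / monthlyBudget) * 100);

console.log(costPerQuery.toFixed(4)); // "0.0018" — under the $0.002 target
console.log(budgetUsedPct); // 68 — % of monthly budget used
```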

Token Cost by Model — April 2026

GPT-4o
Claude 3.5
GPT-3.5-turbo
Embeddings

Alert

Budget usage projected to reach the 80% threshold in ~4 days

Recommendation: Route simple queries to GPT-3.5-turbo to reduce spend

Auto-route simple queries to GPT-3.5-turbo
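A hypothetical routing heuristic for the auto-route toggle: short, low-complexity queries go to the cheaper model, while long or multi-part questions stay on GPT-4o. The word-count threshold and keyword patterns below are illustrative assumptions, not the product's actual classifier.

```typescript
type ModelId = "gpt-4o" | "gpt-3.5-turbo";

// Illustrative cost router: cheap model for simple queries, GPT-4o otherwise.
function routeModel(query: string): ModelId {
  const wordCount = query.trim().split(/\s+/).length;
  const looksComplex =
    wordCount > 40 || /\bcompare\b|step[- ]by[- ]step|\bwhy\b/i.test(query);
  return looksComplex ? "gpt-4o" : "gpt-3.5-turbo";
}
```

A production router would more likely use an embedding- or classifier-based complexity score, but the cost lever is the same: every query diverted from GPT-4o to GPT-3.5-turbo cuts its generation cost by an order of magnitude.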
01 — API Keys & Team Roles
app.example.com/api-access-management
MENU
API Keys
Team Members
Roles & Permissions
Audit Log
API Keys & Role-Based Access Control

8

Active API Keys

↓ 2 expiring in 7 days

5

Team Members

↑ 1 pending invite

PROD

sk-ragflow-prod-••••••••3f2a

Role: Admin · Created Mar 1 2026 · Last used: 2 min ago · Rate limit: 1000 req/min

DEV

sk-ragflow-dev-••••••••8b1c

Role: Developer · Created Apr 1 2026 · Rate limit: 100 req/min · Sandbox only

READ

sk-ragflow-read-••••••••4d9e

Role: Read-Only · Created Apr 10 2026 · Chatbot embed widget access only
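The three key tiers above map to a permission allowlist checked in API middleware before any handler runs. A minimal sketch of that mapping, with illustrative permission names (the real platform's permission strings are not specified here):

```typescript
type Role = "admin" | "developer" | "read-only";
type Permission = "chat:query" | "docs:ingest" | "keys:manage";

// Each role's allowlist; checked before the request handler executes.
const rolePermissions: Record<Role, Permission[]> = {
  admin: ["chat:query", "docs:ingest", "keys:manage"],
  developer: ["chat:query", "docs:ingest"], // sandbox only, no key management
  "read-only": ["chat:query"], // chatbot embed widget access only
};

function can(role: Role, perm: Permission): boolean {
  return rolePermissions[role].includes(perm);
}
```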

👤

sarah.chen@company.com

Admin · Last login: 10 min ago

Active
👤

james.okafor@company.com

Developer · Last login: 1 hour ago

Active
02 — Rate Limits & Security Settings
app.example.com/api-access-management
MENU
Rate Limiting, Security & Compliance

API Request Distribution by Key (Today)

Prod
Dev
Read
Test

Global Rate Limit — Production Key

Enforced via Redis sliding window

1000 req/min
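An in-memory sketch of the sliding-window limiter's logic. In production the same idea runs against Redis (commonly a sorted set of request timestamps per key, trimmed by score) so that all serverless instances share one counter; the class below is a single-process illustration only.

```typescript
// Sliding-window rate limiter: allow at most `limit` requests per
// `windowMs` milliseconds for each API key.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Keep only timestamps that still fall inside the current window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the per-key limit for this window
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

Unlike a fixed-minute bucket, the window slides continuously, so a burst straddling a minute boundary cannot double the effective limit.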
🔑

JWT Token Expiry

Session authentication tokens

24 hours
🌐

CORS Allowed Origins

Configured domain allowlist

3 domains
🔒

IP Allowlist

Production environment firewall

Enabled
Enforce HTTPS Only on All API Routes
Request Logging (GDPR-compliant anonymization)
SOC 2 Type II Compliant
01 — LLM & RAG Pipeline Settings
app.example.com/settings-configuration
MENU
LLM Settings
RAG Pipeline
Prompt Templates
Integrations
LLM Model & RAG Pipeline Configuration
🤖

Primary LLM

Response generation model

GPT-4o (gpt-4o-2024-08-06)
🔄

Fallback LLM

On rate limit or API error

Claude 3.5 Sonnet
🌡️

Temperature

Response creativity vs. determinism

0.3
📏

Max Tokens per Response

Cost and latency control ceiling

2048 tokens
🔍

Retrieval Top-K

Chunks returned from vector search

5 chunks
📊

Similarity Threshold

Minimum relevance score cutoff

0.75
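How Top-K = 5 and the 0.75 threshold interact at query time: the vector store returns its k nearest chunks, then anything scoring below the similarity cutoff is dropped before the prompt is assembled. A minimal sketch of that selection step:

```typescript
type Match = { chunkId: string; score: number };

// Take the top-k matches by similarity, then enforce the relevance cutoff.
function selectContext(matches: Match[], topK = 5, threshold = 0.75): Match[] {
  return [...matches]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .filter((m) => m.score >= threshold);
}
```

With the citation panel's scores (0.923, 0.874, 0.791) all three chunks survive the cutoff; a fourth chunk at, say, 0.62 would be retrieved by the vector search but excluded from the prompt.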
Enable Streaming Responses (SSE)
Show Citation Sources to End Users
Session Memory — ConversationBufferWindowMemory (10 turns)
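A minimal sketch of the Server-Sent Events wire format the streaming toggle enables: each model token is flushed as a `data:` event and the client reassembles the answer incrementally. In the real Next.js route the events would be written chunk-by-chunk to a ReadableStream; they are concatenated here so the framing is easy to inspect, and the `[DONE]` sentinel is a common convention rather than a documented platform detail.

```typescript
// Frame one token as an SSE event.
function toSseEvent(token: string): string {
  return `data: ${JSON.stringify({ token })}\n\n`;
}

// Frame a full token stream, terminated by a sentinel event.
function streamAnswer(tokens: string[]): string {
  return tokens.map(toSseEvent).join("") + "data: [DONE]\n\n";
}
```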
02 — Prompt Templates & Chain Config
app.example.com/settings-configuration
MENU
LangChain Prompt Templates & Chain Orchestration

SYS

System Prompt Template v2.1

You are a helpful AI assistant. Answer only from the provided {context}. If unsure, say so clearly.

RAG

ConversationalRetrievalChain Config

combine_docs_chain: StuffDocumentsChain · Compression: ContextualCompressionRetriever · Memory: 10 turns

META

Pre-Retrieval Metadata Filter Template

Filters by: namespace, doc_type, date_range, department — applied before vector similarity search
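An illustrative shape for the pre-retrieval filter: the template's variables are bound into a Pinecone-style metadata filter that narrows the candidate set before any vector similarity is computed. Field names mirror the template above; the `$eq`/`$gte` operator syntax follows Pinecone's filter language, and the helper itself is an assumption, not the platform's code.

```typescript
// Build a metadata filter from bound template variables. Namespace is
// passed separately to the vector query, so it is not part of the filter.
function buildMetadataFilter(opts: {
  namespace: string;
  docType?: string;
  after?: string; // ISO date lower bound for date_range
  department?: string;
}): Record<string, unknown> {
  const filter: Record<string, unknown> = {};
  if (opts.docType) filter["doc_type"] = { $eq: opts.docType };
  if (opts.after) filter["updated_at"] = { $gte: opts.after };
  if (opts.department) filter["department"] = { $eq: opts.department };
  return filter;
}
```

Filtering before similarity search both sharpens relevance (no cross-department leakage) and cuts latency, since fewer vectors compete for the top-k slots.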

3

Active Chain Templates

↑ v2.1 deployed

12

Prompt Variables Mapped

↑ all bound

Feature Stack & Deliverables

Complete overview of confirmed features, deliverable items, and technical architecture for RAGFlow AI.

🏗️

Tech Stack

Next.js 14 · LangChain · Pinecone · OpenAI API · TypeScript · Vercel

Core Technologies

Next.js 14 — App Router, streaming API routes, SSE for real-time chat responses
🦜
LangChain — ConversationalRetrievalChain, memory management, multi-LLM orchestration
🌲
Pinecone — Managed vector database for embedding storage, namespaces and metadata filtering
🤖
OpenAI API — GPT-4o for generation, text-embedding-3-large (reduced to 1536 dims) for document embeddings
🔷
TypeScript — End-to-end type safety across frontend, API routes and LangChain integrations
Vercel — Serverless and edge deployment with CI/CD pipeline and environment secrets
📦

V1 Deliverables Checklist

  • Production-ready RAG chatbot UI in Next.js 14 with streaming responses via Server-Sent Events
  • LangChain ConversationalRetrievalChain with session memory and configurable context windows
  • Document ingestion pipeline: PDF, Markdown and CSV loaders with chunking, embedding and Pinecone indexing
  • Citation and source display with per-chunk similarity scores rendered inline in chat responses
  • Secure Next.js API routes with JWT authentication and role-based access control (Admin, Developer, Read-Only)
  • LLM cost controls: token budgets, auto model routing, per-key rate limiting via Redis sliding window
  • Analytics dashboard covering query volume, P95 latency, user ratings, cost-per-query and token breakdown
  • LangSmith observability integration for chain tracing, prompt debugging and response quality scoring
  • Multi-model fallback routing (OpenAI GPT-4o to Anthropic Claude 3.5 Sonnet) for reliability
  • Cloud deployment to Vercel with environment configuration, CI/CD pipeline and production release support
🔧

Architecture Layers

Frontend
Next.js 14 + React + TypeScript
App Router pages, streaming chat UI with SSE, citation source panels, analytics dashboards, shadcn/ui components, Tailwind CSS
AI Orchestration
LangChain + LangSmith
ConversationalRetrievalChain, ConversationBufferWindowMemory, ContextualCompressionRetriever, multi-LLM prompt routing, chain tracing and observability
RAG Pipeline
Pinecone + OpenAI Embeddings
PyPDFLoader, UnstructuredMarkdownLoader, CSVLoader, RecursiveCharacterTextSplitter, text-embedding-3-large, namespace and metadata management
Backend API
Next.js API Routes + NextAuth.js
Secure /api/chat (streaming), /api/ingest, /api/analytics endpoints, JWT middleware, RBAC, rate limiting, retry logic and token budget enforcement
Infrastructure
Vercel + PostgreSQL + Redis
Serverless edge deployment, PostgreSQL via PlanetScale for user and session data, Redis for rate limiting and response caching, environment secrets via Vercel