RAGFlow AI

LangChain-Powered SaaS Intelligence Platform

Page 1 of 8 — Landing & Auth
01 — Hero Landing Page
app.example.com/landing-auth

AI-Powered SaaS Intelligence

RAG Chatbot with LangChain, Next.js & Pinecone Vector Search

🤖
99.9%
Uptime SLA
<200ms
Avg Response
10M+
Queries Served
500+
Enterprises
Core Platform Capabilities
🧠 RAG Pipeline
📚 Knowledge Base
💬 Contextual Chat
📊 Analytics
🔐 Role-Based Access
LLM Orchestration

Deploy in Minutes

Production-ready chatbot with citation support, session memory & token cost controls

🚀
02 — Login / Sign Up
app.example.com/landing-auth
🤖

RAGFlow AI

Sign in to your AI workspace

✉️
🔒
🛡️

Enterprise SSO available

Contact sales for SAML / OIDC setup

01 — AI Operations Overview
app.example.com/main-dashboard
MENU
AI Operations Dashboard — Today, Apr 14 2026

24,318

Total Queries Today

↑ +18%

187ms

Avg Response Latency

↓ -23ms

94.2%

RAG Retrieval Accuracy

↑ +1.4%

$12.47

LLM Token Cost Today

↓ -8%

Query Volume — Last 7 Days

Mon
Tue
Wed
Thu
Fri
Sat
Sun

2:14

PM

RAG Pipeline Ingestion Completed

47 documents processed · 12,480 chunks · Pinecone namespace: prod-docs

1:05

PM

LLM Fallback Triggered

OpenAI rate limit hit · Auto-switched to Anthropic Claude 3.5 Sonnet

11:30

AM

New Knowledge Base Version Deployed

Product Docs v3.2 · 234 sources indexed · 48,320 total chunks

02 — Pipeline & Infrastructure Health
app.example.com/main-dashboard
MENU
System Health — Vector DB, Embeddings & API
97%

Vector DB Health

Pinecone · 2.4M vectors

89%

Embedding Coverage

18,420 / 20,680 chunks

99%

API Uptime

Last 30 days

Token Usage by Model (This Week)

GPT-4o
Claude 3.5
GPT-3.5
Embeddings

2.4M

Vectors Indexed

↑ +340K this week

99.7%

API Uptime (30d)

↑ stable

All Systems Operational
01 — Active Chat Session
app.example.com/rag-chatbot-interface
MENU
Active Session
History
Shared Threads
Playground
Chat Session · Context: Product Documentation v3.2

User

How does the RAG pipeline handle multi-document queries?

Session ID: sess_8f3k2 · Model: GPT-4o · Temp: 0.3 · Turn 4

AI

The pipeline uses a hybrid retrieval strategy combining dense vector search with metadata filters...

Sources: [1] Architecture Guide p.12 [2] API Docs §4.3 [3] Setup Guide p.8 · Tokens used: 847

User

What chunking strategies are supported?

Follow-up · Conversation turn 5 · Memory window: 10 turns

847

Tokens This Turn

↑ within budget

3

Sources Retrieved

↑ Top-k: 5

💬
02 — Citation & Source Panel
app.example.com/rag-chatbot-interface
MENU
Retrieved Sources · Similarity Scores & Chunk References
92%

Source Relevance

Architecture Guide p.12

87%

Source Relevance

API Reference §4.3

79%

Source Relevance

Setup Guide p.8

[1]

Architecture Guide — Page 12

Chunk ID: chk_4821 · Similarity: 0.923 · Namespace: prod-docs · RecursiveCharacterTextSplitter

[2]

API Reference — Section 4.3

Chunk ID: chk_2204 · Similarity: 0.874 · Namespace: prod-docs · PDF loader

[3]

Setup & Integration Guide — Page 8

Chunk ID: chk_9031 · Similarity: 0.791 · Namespace: prod-docs · Markdown loader

Session Memory Active: 10 turns · Chain: ConversationalRetrievalChain + ConversationBufferWindowMemory
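A minimal sketch of how ConversationBufferWindowMemory behaves with a 10-turn window: only the most recent k user/AI exchanges are replayed into the prompt, so token usage stays bounded no matter how long the session runs. The class below is an illustration of the windowing logic, not LangChain's actual implementation.

```typescript
type Turn = { role: "user" | "ai"; content: string };

// Illustrative buffer-window memory: keep only the last k exchanges.
class BufferWindowMemory {
  constructor(private k: number, private turns: Turn[] = []) {}

  add(user: string, ai: string): void {
    this.turns.push(
      { role: "user", content: user },
      { role: "ai", content: ai }
    );
  }

  // Last k exchanges = last 2k messages (one user + one AI per exchange).
  window(): Turn[] {
    return this.turns.slice(-2 * this.k);
  }
}
```

With k = 10, turn 11 silently evicts turn 1 from the context sent to the LLM, which is why older session details can stop influencing answers.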
01 — Document Library
app.example.com/knowledge-base
MENU
All Documents
Processing Queue
Namespaces
Loaders
Knowledge Base — Product Documentation Workspace

234

Indexed Documents

↑ +12 this week

48,320

Total Chunks

↑ avg 206 per doc

1536

Embedding Dimensions

↑ text-embedding-3-large (native 3072, reduced to 1536 via the dimensions parameter)

PDF

Product Architecture Guide v3.2.pdf

Status: Indexed · 347 chunks · Uploaded Apr 14 2026 · 2.4 MB

MD

API Reference Documentation (GitHub Sync)

Status: Indexed · 1,204 chunks · Auto-synced · Last updated 2h ago

PDF

User Onboarding Guide v4.pdf

Status: Processing · 0 / 89 chunks · Queued · 1.1 MB

CSV

Product FAQ Dataset v2.csv

Status: Indexed · 542 chunks · 856 rows · Apr 12 2026

02 — Ingestion Pipeline Config
app.example.com/knowledge-base
MENU
RAG Ingestion Pipeline Configuration

Chunks per Document (Top Sources)

API Ref
FAQ
Arch Guide
Onboarding
Changelog
✂️

Chunking Strategy

RecursiveCharacterTextSplitter

1000 chars / 200 overlap
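A simplified illustration of what the 1000-char / 200-overlap setting means. LangChain's RecursiveCharacterTextSplitter additionally tries to break on paragraph and sentence separators before falling back to hard cuts; this sketch only shows the size/overlap arithmetic.

```typescript
// Fixed-size overlapping chunker (illustrative, not LangChain's splitter).
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be < chunkSize");
  const chunks: string[] = [];
  const step = chunkSize - overlap; // each chunk starts 800 chars after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```

The 200-char overlap means a sentence straddling a chunk boundary appears whole in at least one chunk, which improves retrieval recall at the cost of ~20% more stored vectors.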
🧮

Embedding Model

OpenAI text-embedding-3-large

1536 dims (reduced from native 3072 via the dimensions parameter)
🗄️

Vector Database

Pinecone · us-east-1 region

prod-docs namespace
🔗

LangChain Document Loaders

PyPDFLoader + UnstructuredMarkdownLoader + CSVLoader

Active
Auto Re-index on Document Update
Metadata Enrichment (page numbers, section titles)
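A sketch of what the Metadata Enrichment toggle attaches to each chunk: the source document, page number, and section title travel with the vector, which is what makes the per-chunk citations elsewhere in the app ("Architecture Guide p.12", "§4.3") possible. The shape below is illustrative; field names are assumptions, not the platform's exact schema.

```typescript
type EnrichedChunk = {
  text: string;
  metadata: { source: string; page: number; section?: string };
};

// Attach provenance metadata to each chunk before embedding/indexing.
function enrichChunks(
  source: string,
  pages: { page: number; section?: string; text: string }[]
): EnrichedChunk[] {
  return pages.map((p) => ({
    text: p.text,
    metadata: { source, page: p.page, section: p.section },
  }));
}
```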
01 — Chatbot Performance Analytics
app.example.com/analytics-monitoring
MENU
Performance
Cost & Tokens
User Feedback
Error Logs
Chatbot Performance — Last 30 Days

186,420

Total Queries (30d)

↑ +24%

94.7%

Successful Responses

↑ +2.1%

204ms

P95 Latency

↓ -31ms

4.6 / 5

Avg User Rating

↑ +0.3 vs last month

Daily Query Volume — April 2026

Apr 1
Apr 7
Apr 10
Apr 14
Apr 21
Apr 28

RAG Quality Score by Source Type

PDF
Markdown
CSV
Web Scrape
02 — Cost & Token Monitoring
app.example.com/analytics-monitoring
MENU
LLM Cost & Token Usage Analytics

$342.18

Total LLM Cost (30d)

↓ -12% vs last month

48.2M

Total Tokens (30d)

↑ input + output combined

$0.0018

Cost per Query

↑ within $0.002 target

68%

Monthly Budget Used

$342 of $500 budget
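The derived figures on these cards follow directly from the raw totals, shown here as a worked calculation:

```typescript
// Derive cost-per-query and budget usage from the dashboard's raw totals.
const totalCost30d = 342.18; // USD, total LLM cost (30d)
const totalQueries30d = 186_420; // total queries (30d)
const monthlyBudget = 500; // USD

const costPerQuery = totalCost30d / totalQueries30d;
const budgetUsedPct = Math.round((totalCost30d / monthlyBudget) * 100);

console.log(costPerQuery.toFixed(4)); // "0.0018" — under the $0.002 target
console.log(budgetUsedPct); // 68 — % of monthly budget used
```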

Token Cost by Model — April 2026

GPT-4o
Claude 3.5
GPT-3.5-turbo
Embeddings

Alert

Budget usage projected to reach the 80% threshold in ~4 days

Recommendation: Route simple queries to GPT-3.5-turbo to reduce spend

Auto-route simple queries to GPT-3.5-turbo
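A hypothetical routing heuristic for the auto-route toggle: short, low-complexity queries go to the cheaper model, while long or multi-part questions stay on GPT-4o. The word-count threshold and keyword patterns below are illustrative assumptions, not the product's actual classifier.

```typescript
type ModelId = "gpt-4o" | "gpt-3.5-turbo";

// Illustrative cost router: cheap model for simple queries, GPT-4o otherwise.
function routeModel(query: string): ModelId {
  const wordCount = query.trim().split(/\s+/).length;
  const looksComplex =
    wordCount > 40 || /\bcompare\b|step[- ]by[- ]step|\bwhy\b/i.test(query);
  return looksComplex ? "gpt-4o" : "gpt-3.5-turbo";
}
```

A production router would more likely use an embedding- or classifier-based complexity score, but the cost lever is the same: every query diverted from GPT-4o to GPT-3.5-turbo cuts its generation cost by an order of magnitude.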
01 — API Keys & Team Roles
app.example.com/api-access-management
MENU
API Keys
Team Members
Roles & Permissions
Audit Log
API Keys & Role-Based Access Control

8

Active API Keys

↓ 2 expiring in 7 days

5

Team Members

↑ 1 pending invite

PROD

sk-ragflow-prod-••••••••3f2a

Role: Admin · Created Mar 1 2026 · Last used: 2 min ago · Rate limit: 1000 req/min

DEV

sk-ragflow-dev-••••••••8b1c

Role: Developer · Created Apr 1 2026 · Rate limit: 100 req/min · Sandbox only

READ

sk-ragflow-read-••••••••4d9e

Role: Read-Only · Created Apr 10 2026 · Chatbot embed widget access only
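The three key tiers above map to a permission allowlist checked in API middleware before any handler runs. A minimal sketch of that mapping, with illustrative permission names (the real platform's permission strings are not specified here):

```typescript
type Role = "admin" | "developer" | "read-only";
type Permission = "chat:query" | "docs:ingest" | "keys:manage";

// Each role's allowlist; checked before the request handler executes.
const rolePermissions: Record<Role, Permission[]> = {
  admin: ["chat:query", "docs:ingest", "keys:manage"],
  developer: ["chat:query", "docs:ingest"], // sandbox only, no key management
  "read-only": ["chat:query"], // chatbot embed widget access only
};

function can(role: Role, perm: Permission): boolean {
  return rolePermissions[role].includes(perm);
}
```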

👤

sarah.chen@company.com

Admin · Last login: 10 min ago

Active
👤

james.okafor@company.com

Developer · Last login: 1 hour ago

Active
02 — Rate Limits & Security Settings
app.example.com/api-access-management
MENU
Rate Limiting, Security & Compliance

API Request Distribution by Key (Today)

Prod
Dev
Read
Test

Global Rate Limit — Production Key

Enforced via Redis sliding window

1000 req/min
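An in-memory sketch of the sliding-window limiter's logic. In production the same idea runs against Redis (commonly a sorted set of request timestamps per key, trimmed by score) so that all serverless instances share one counter; the class below is a single-process illustration only.

```typescript
// Sliding-window rate limiter: allow at most `limit` requests per
// `windowMs` milliseconds for each API key.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Keep only timestamps that still fall inside the current window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the per-key limit for this window
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

Unlike a fixed-minute bucket, the window slides continuously, so a burst straddling a minute boundary cannot double the effective limit.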
🔑

JWT Token Expiry

Session authentication tokens

24 hours
🌐

CORS Allowed Origins

Configured domain allowlist

3 domains
🔒

IP Allowlist

Production environment firewall

Enabled
Enforce HTTPS Only on All API Routes
Request Logging (GDPR-compliant anonymization)
SOC 2 Type II Compliant
01 — LLM & RAG Pipeline Settings
app.example.com/settings-configuration
MENU
LLM Settings
RAG Pipeline
Prompt Templates
Integrations
LLM Model & RAG Pipeline Configuration
🤖

Primary LLM

Response generation model

GPT-4o (gpt-4o-2024-08-06)
🔄

Fallback LLM

On rate limit or API error

Claude 3.5 Sonnet
🌡️

Temperature

Response creativity vs. determinism

0.3
📏

Max Tokens per Response

Cost and latency control ceiling

2048 tokens
🔍

Retrieval Top-K

Chunks returned from vector search

5 chunks
📊

Similarity Threshold

Minimum relevance score cutoff

0.75
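How Top-K = 5 and the 0.75 threshold interact at query time: the vector store returns its k nearest chunks, then anything scoring below the similarity cutoff is dropped before the prompt is assembled. A minimal sketch of that selection step:

```typescript
type Match = { chunkId: string; score: number };

// Take the top-k matches by similarity, then enforce the relevance cutoff.
function selectContext(matches: Match[], topK = 5, threshold = 0.75): Match[] {
  return [...matches]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .filter((m) => m.score >= threshold);
}
```

With the citation panel's scores (0.923, 0.874, 0.791) all three chunks survive the cutoff; a fourth chunk at, say, 0.62 would be retrieved by the vector search but excluded from the prompt.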
Enable Streaming Responses (SSE)
Show Citation Sources to End Users
Session Memory — ConversationBufferWindowMemory (10 turns)
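A minimal sketch of the Server-Sent Events wire format the streaming toggle enables: each model token is flushed as a `data:` event and the client reassembles the answer incrementally. In the real Next.js route the events would be written chunk-by-chunk to a ReadableStream; they are concatenated here so the framing is easy to inspect, and the `[DONE]` sentinel is a common convention rather than a documented platform detail.

```typescript
// Frame one token as an SSE event.
function toSseEvent(token: string): string {
  return `data: ${JSON.stringify({ token })}\n\n`;
}

// Frame a full token stream, terminated by a sentinel event.
function streamAnswer(tokens: string[]): string {
  return tokens.map(toSseEvent).join("") + "data: [DONE]\n\n";
}
```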
02 — Prompt Templates & Chain Config
app.example.com/settings-configuration
MENU
LangChain Prompt Templates & Chain Orchestration

SYS

System Prompt Template v2.1

You are a helpful AI assistant. Answer only from the provided {context}. If unsure, say so clearly.

RAG

ConversationalRetrievalChain Config

combine_docs_chain: StuffDocumentsChain · Compression: ContextualCompressionRetriever · Memory: 10 turns

META

Pre-Retrieval Metadata Filter Template

Filters by: namespace, doc_type, date_range, department — applied before vector similarity search
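An illustrative shape for the pre-retrieval filter: the template's variables are bound into a Pinecone-style metadata filter that narrows the candidate set before any vector similarity is computed. Field names mirror the template above; the `$eq`/`$gte` operator syntax follows Pinecone's filter language, and the helper itself is an assumption, not the platform's code.

```typescript
// Build a metadata filter from bound template variables. Namespace is
// passed separately to the vector query, so it is not part of the filter.
function buildMetadataFilter(opts: {
  namespace: string;
  docType?: string;
  after?: string; // ISO date lower bound for date_range
  department?: string;
}): Record<string, unknown> {
  const filter: Record<string, unknown> = {};
  if (opts.docType) filter["doc_type"] = { $eq: opts.docType };
  if (opts.after) filter["updated_at"] = { $gte: opts.after };
  if (opts.department) filter["department"] = { $eq: opts.department };
  return filter;
}
```

Filtering before similarity search both sharpens relevance (no cross-department leakage) and cuts latency, since fewer vectors compete for the top-k slots.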

3

Active Chain Templates

↑ v2.1 deployed

12

Prompt Variables Mapped

↑ all bound

Feature Stack & Deliverables

Complete overview of confirmed features, deliverable items, and technical architecture for RAGFlow AI.

🏗️

Tech Stack

Next.js 14 · LangChain · Pinecone · OpenAI API · TypeScript · Vercel

Core Technologies

Next.js 14 — App Router, streaming API routes, SSE for real-time chat responses
🦜
LangChain — ConversationalRetrievalChain, memory management, multi-LLM orchestration
🌲
Pinecone — Managed vector database for embedding storage, namespaces and metadata filtering
🤖
OpenAI API — GPT-4o for generation, text-embedding-3-large (reduced to 1536 dims) for document embeddings
🔷
TypeScript — End-to-end type safety across frontend, API routes and LangChain integrations
Vercel — Serverless and edge deployment with CI/CD pipeline and environment secrets
📦

V1 Deliverables Checklist

  • Production-ready RAG chatbot UI in Next.js 14 with streaming responses via Server-Sent Events
  • LangChain ConversationalRetrievalChain with session memory and configurable context windows
  • Document ingestion pipeline: PDF, Markdown and CSV loaders with chunking, embedding and Pinecone indexing
  • Citation and source display with per-chunk similarity scores rendered inline in chat responses
  • Secure Next.js API routes with JWT authentication and role-based access control (Admin, Developer, Read-Only)
  • LLM cost controls: token budgets, auto model routing, per-key rate limiting via Redis sliding window
  • Analytics dashboard covering query volume, P95 latency, user ratings, cost-per-query and token breakdown
  • LangSmith observability integration for chain tracing, prompt debugging and response quality scoring
  • Multi-model fallback routing (OpenAI GPT-4o to Anthropic Claude 3.5 Sonnet) for reliability
  • Cloud deployment to Vercel with environment configuration, CI/CD pipeline and production release support
🔧

Architecture Layers

Frontend
Next.js 14 + React + TypeScript
App Router pages, streaming chat UI with SSE, citation source panels, analytics dashboards, shadcn/ui components, Tailwind CSS
AI Orchestration
LangChain + LangSmith
ConversationalRetrievalChain, ConversationBufferWindowMemory, ContextualCompressionRetriever, multi-LLM prompt routing, chain tracing and observability
RAG Pipeline
Pinecone + OpenAI Embeddings
PyPDFLoader, UnstructuredMarkdownLoader, CSVLoader, RecursiveCharacterTextSplitter, text-embedding-3-large, namespace and metadata management
Backend API
Next.js API Routes + NextAuth.js
Secure /api/chat (streaming), /api/ingest, /api/analytics endpoints, JWT middleware, RBAC, rate limiting, retry logic and token budget enforcement
Infrastructure
Vercel + PostgreSQL + Redis
Serverless edge deployment, PostgreSQL via PlanetScale for user and session data, Redis for rate limiting and response caching, environment secrets via Vercel