Ask questions about any codebase using AI-powered semantic search. This tool uses Retrieval-Augmented Generation (RAG) to provide accurate, context-aware answers about your code.
- 🔍 Semantic Code Search - Find relevant code using natural language
- 🤖 AI-Powered Answers - Get explanations using CodeLlama
- 📦 100% Local - No cloud calls, your code stays private
- 🚀 Fast Indexing - FAISS vector search for instant retrieval
- 💻 Clean UI - Beautiful Streamlit interface
- 🔌 GitHub Integration - Index any public repository
- "How does authentication work in this codebase?"
- "Explain the database connection logic"
- "What does the UserService class do?"
- "Show me all API endpoints"
- "Where is error handling implemented?"
- Python 3.8+
- Ollama (download from https://ollama.com/download)
- Clone the repository:

```bash
git clone https://github.com/harshitak4/codebase-rag.git
cd codebase-rag
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Install and start Ollama:

```bash
# Download Ollama from https://ollama.com/download
# Then pull the CodeLlama model:
ollama pull codellama:7b
```

- Start the Ollama server:

```bash
ollama serve
```

Then index a repository (from GitHub or a local path) and launch the web UI:

```bash
python -m app.build_index --github https://github.com/pallets/flask
python -m app.build_index --local /path/to/your/repo
streamlit run ui/streamlit_app.py
```

Open http://localhost:8501 in your browser.
From GitHub:

```bash
python -m app.build_index --github https://github.com/tiangolo/fastapi
```

From Local Path:

```bash
python -m app.build_index --local ./my-project
```

Test from the command line:

```bash
python test_rag.py
```

- Start Streamlit:

```bash
streamlit run ui/streamlit_app.py
```

- Enter your question
- Click "Ask"
- View AI-generated answer and source code
┌─────────────┐
│ GitHub │
│ Repository │
└──────┬──────┘
│
▼
┌─────────────────┐
│ Code Ingestion │
│ (AST Parser) │
└──────┬──────────┘
│
▼
┌──────────────────┐
│ Code Chunks │
│ (Functions/ │
│ Classes) │
└──────┬───────────┘
│
▼
┌──────────────────┐
│ Embeddings │
│ (SentenceTrans- │
│ former) │
└──────┬───────────┘
│
▼
┌──────────────────┐
│ FAISS Index │
│ (Vector Store) │
└──────┬───────────┘
│
▼
┌──────────────────┐ ┌─────────────┐
│ User Question │─────▶│ Search │
└──────────────────┘ └──────┬──────┘
│
▼
┌──────────────┐
│ Retrieved │
│ Code │
└──────┬───────┘
│
▼
┌──────────────┐
│ Ollama │
│ (CodeLlama) │
└──────┬───────┘
│
▼
┌──────────────┐
│ Answer │
└──────────────┘
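The indexing half of this diagram maps to a few library calls. Below is an illustrative sketch using sentence-transformers and FAISS directly, not the project's actual build_index code; the model name all-MiniLM-L6-v2 and the sample chunks are assumptions (the README only states that embeddings are 384-dimensional):

```python
from sentence_transformers import SentenceTransformer
import faiss

# Assumed embedding model; any 384-dim sentence-transformers model fits the description above
model = SentenceTransformer("all-MiniLM-L6-v2")

# In the real pipeline these chunks come from the AST-based ingestion step
chunks = [
    "def connect(url):\n    '''Open a database connection.'''",
    "class UserService:\n    '''Handles user lookup and creation.'''",
]

# Encode chunks into dense vectors and store them in a FAISS index
embeddings = model.encode(chunks).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Embed a question and retrieve the most similar chunk
query = model.encode(["How does the database connection work?"]).astype("float32")
distances, ids = index.search(query, 1)
print(chunks[ids[0][0]])
```

IndexFlatL2 performs exact nearest-neighbour search, which is more than fast enough at the scale of a single repository.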
codebase-rag/
├── app/
│ ├── __init__.py
│ ├── vector_store.py # FAISS vector store
│ ├── ingest_code.py # Code extraction (AST)
│ ├── ingest_github_repo.py # GitHub cloning
│ ├── build_index.py # Index building pipeline
│ └── rag_answer.py # RAG system with Ollama
├── ui/
│ └── streamlit_app.py # Web interface
├── data/
│ ├── repos/ # Cloned repositories
│ └── code_index/ # FAISS index + metadata
├── test_rag.py # CLI test script
├── requirements.txt # Dependencies
└── README.md # This file
Edit app/rag_answer.py:
```python
# Default: codellama:7b
# Other options: codellama:13b, deepseek-coder, starcoder
rag = RAGAnswerer(model="deepseek-coder")
```

In the Streamlit UI sidebar, use the slider to change the number of code chunks retrieved (default: 5).
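For scripted use outside the UI, the answerer can also be driven directly from Python. A minimal sketch, assuming RAGAnswerer exposes something like an answer(question, k=...) method; that method name and signature are assumptions, so check app/rag_answer.py for the actual API:

```python
from app.rag_answer import RAGAnswerer

# The model= parameter is shown above; answer() and k= are assumed names.
rag = RAGAnswerer(model="deepseek-coder")
print(rag.answer("How does authentication work in this codebase?", k=5))
```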
Edit app/ingest_code.py:
```python
MAX_FILE_SIZE_KB = 500        # Skip files larger than this
MAX_CHARS_PER_FILE = 100000   # Character limit per file
```
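These limits are applied while files are read during ingestion. The sketch below shows how AST-based chunking with such size filters might look; only the two constants above come from the project, everything else (function name, chunk format) is illustrative:

```python
import ast
import os

MAX_FILE_SIZE_KB = 500
MAX_CHARS_PER_FILE = 100000

def extract_chunks(path):
    """Return function/class source snippets from one Python file, respecting size limits."""
    if os.path.getsize(path) > MAX_FILE_SIZE_KB * 1024:
        return []  # skip oversized files entirely
    with open(path, encoding="utf-8", errors="ignore") as f:
        source = f.read()[:MAX_CHARS_PER_FILE]
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return []  # skip files that fail to parse (e.g. truncated by the char limit)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            snippet = ast.get_source_segment(source, node)
            if snippet:
                chunks.append({"file": path, "name": node.name, "code": snippet})
    return chunks
```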
Run the test script:

```bash
python test_rag.py
```

This will:
- Load the index
- Ask 3 test questions
- Show answers and retrieved code
- No index found: build an index first using python -m app.build_index
- Ollama not responding: start Ollama with ollama serve, then pull the model with ollama pull codellama:7b
- No code chunks indexed: the repository might not contain any Python files, or they are all being filtered out; check the ignore lists in app/ingest_code.py
- Slow answers or out-of-memory errors:
  - Use a smaller model: codellama:7b instead of codellama:13b
  - Reduce the retrieval count (k parameter)
  - Index fewer files
- Code Ingestion: Python files are parsed using AST to extract functions and classes
- Embedding Generation: Each code chunk is converted to a 384-dim vector using sentence-transformers
- Vector Indexing: Vectors are stored in a FAISS index for fast similarity search
- Query Processing: User questions are embedded and searched against the index
- Context Retrieval: Top-k most similar code chunks are retrieved
- Answer Generation: Retrieved code + question are sent to Ollama (CodeLlama)
- Response: AI generates a contextual answer based on actual code
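A minimal sketch of the query-time half of this flow (Query Processing through Response), calling the FAISS index and Ollama's local HTTP API directly; the index and chunk handling are simplified and the prompt wording is illustrative, not taken from the project's rag_answer.py:

```python
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed 384-dim model

def answer_question(question, index, chunks, k=5, model="codellama:7b"):
    # Embed the question and retrieve the top-k most similar code chunks
    query_vec = embedder.encode([question]).astype("float32")
    _, ids = index.search(query_vec, k)
    context = "\n\n".join(chunks[i] for i in ids[0])

    # Send the retrieved code plus the question to the local Ollama server
    prompt = f"Answer the question using only this code:\n{context}\n\nQuestion: {question}"
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]
```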
- Indexing Speed: ~100 files/minute (depends on file size)
- Search Latency: <100ms for retrieval
- Answer Generation: 5-15 seconds (depends on model and hardware)
- Memory Usage: ~2GB RAM for small repos, ~5GB for large repos
Contributions welcome! Areas for improvement:
- Support for more languages (JavaScript, Java, etc.)
- Better code chunking strategies
- Web-based index building UI
- Multi-repo indexing
- Code similarity visualization
MIT License - feel free to use this for any purpose.
- FAISS by Meta AI
- sentence-transformers by UKPLab
- Ollama for easy local LLM deployment
- Streamlit for the awesome UI framework
Questions? Open an issue on GitHub!
Built with ❤️ for developers who want to understand codebases faster