Podcast RAG System with Vector Search

RAG, Pinecone, Vector Database, Streamlit

I built a working RAG system that automatically downloaded podcast episodes, transcribed audio using Whisper, created vector embeddings, and stored them in Pinecone for semantic search. Users could ask questions in natural language and receive contextually relevant answers grounded in actual podcast content.
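The semantic search step comes down to ranking stored embeddings by similarity to the embedded question. The sketch below is a minimal, hedged illustration of that idea, assuming cosine similarity and a plain in-memory dict as a hypothetical stand-in for a vector database. It does not use Pinecone's client API, and the toy 3-dimensional vectors stand in for embeddings from a model the project description does not name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for all-zero vectors
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query.

    `index` is a hypothetical in-memory stand-in for a vector database:
    a dict mapping chunk IDs to embedding vectors (brute-force search).
    """
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; in a real pipeline these come from an embedding model.
index = {
    "ep1-chunk0": [0.9, 0.1, 0.0],
    "ep1-chunk1": [0.1, 0.9, 0.0],
    "ep2-chunk0": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))
```

With this toy data, `ep1-chunk0` ranks first and `ep2-chunk0` second. A managed vector database replaces the brute-force loop with an approximate nearest-neighbour index so the same query stays fast across many episodes.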

I implemented the complete RAG pipeline, including audio processing, transcript chunking, embedding generation, vector indexing, retrieval logic, and prompt engineering for accurate responses. I also built a Streamlit interface that displayed indexed episodes and enabled conversational interaction with the knowledge base.
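Two of those pipeline stages, transcript chunking and prompt construction, can be sketched in plain Python. This is a hedged illustration rather than the project's actual code. It assumes word-based windows with overlap (so that a sentence cut at a boundary still appears whole in some chunk), and the `chunk_size`/`overlap` defaults and the prompt wording are hypothetical choices not stated in the project description.

```python
def chunk_transcript(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split a transcript into overlapping word-window chunks.

    chunk_size and overlap are counted in words (hypothetical defaults);
    production systems often count tokens instead.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the transcript
    return chunks


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from retrieved chunks."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved))
    return (
        "Answer the question using only the podcast excerpts below. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


transcript = " ".join(f"w{i}" for i in range(10))
chunks = chunk_transcript(transcript, chunk_size=4, overlap=2)
print(chunks)  # ['w0 w1 w2 w3', 'w2 w3 w4 w5', 'w4 w5 w6 w7', 'w6 w7 w8 w9']
print(build_prompt("What was discussed?", chunks[:2]))
```

Larger chunks carry more context per retrieval hit but dilute the embedding, while more overlap improves recall at the cost of storing more vectors. This is one concrete form of the cost-performance tradeoff in RAG systems.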

This project gave me a deep understanding of vector database architecture, embedding strategies, semantic search tradeoffs, and cost-performance considerations in RAG systems.

Background

Srinivas skipped presentations and built real AI products.

Srinivas E was part of the September 2025 cohort at Curious PM, alongside 13 other talented participants.