Native Audio & Video RAG. Now in Ragie.
Ragie supports ingest and retrieval from spoken and visual content. Upload recordings, ask questions, and get timestamped answers with streamable playback—no pipelines or transcription tooling required.
0:00 – 0:30
This video shows a sea turtle swimming smoothly through clear, sunlit water above a rocky, sandy seabed. The audio is very calm, featuring the gentle sounds of water movement and subtle ambient underwater noises, creating an immersive aquatic scene.
Ragie’s audio and video support is more than a feature. It’s infrastructure built for developers shipping real products.
Native format support
Process a wide range of audio formats, including MP3, WAV, M4A, OGG, AAC, and FLAC.
Handle a wide range of video formats, including MP4, WebM, MOV, AVI, FLV, MKV, MPEG, MPEGS, MPG, WMV, and 3GPP.
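As a rough illustration of how an application might sort files before upload, the sketch below classifies a filename as audio or video using the format lists above. It assumes each format name maps directly to a lowercase file extension, which may not hold in every case (3GPP files, for example, commonly use .3gp). The helper name media_kind is hypothetical and not part of any Ragie SDK.

```python
from pathlib import Path
from typing import Optional

# Format names copied as-is from the lists above and lowercased.
# Treating them as file extensions is an assumption made for this sketch.
AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "ogg", "aac", "flac"}
VIDEO_EXTENSIONS = {
    "mp4", "webm", "mov", "avi", "flv", "mkv",
    "mpeg", "mpegs", "mpg", "wmv", "3gpp",
}


def media_kind(filename: str) -> Optional[str]:
    """Return "audio", "video", or None based on the file extension."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    if extension in VIDEO_EXTENSIONS:
        return "video"
    return None
```

A caller could use the result to skip unsupported files before sending anything over the network.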
Multilingual transcription
Automatic, high-quality transcription of audio in 100 languages, including English, Chinese, Korean, Thai, Japanese, Arabic, Hindi, Russian, Spanish, French, German, Greek, and many more. No model selection or config required.
Smart chunking for audio and video
We don’t just split on time or paragraphs. Ragie uses audio- and video-aware chunking that respects scene breaks, speaker changes, and topic shifts, so each retrieved chunk is meaningful on its own.
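To make the idea concrete, here is a minimal sketch of boundary-aware chunking over transcript segments. It is not Ragie's implementation: the Segment type, the chunk_segments function, and the 60-second cap are all hypothetical, and the sketch only handles speaker changes and a duration limit, not scene breaks or topic shifts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcribed span of speech (hypothetical shape for this sketch)."""
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    speaker: str
    text: str


def chunk_segments(segments: List[Segment], max_seconds: float = 60.0) -> List[List[Segment]]:
    """Group consecutive segments, closing a chunk when the speaker changes
    or when adding the next segment would push the chunk past max_seconds."""
    chunks: List[List[Segment]] = []
    current: List[Segment] = []
    for segment in segments:
        if current:
            speaker_changed = segment.speaker != current[-1].speaker
            too_long = segment.end - current[0].start > max_seconds
            if speaker_changed or too_long:
                chunks.append(current)
                current = []
        current.append(segment)
    if current:
        chunks.append(current)
    return chunks
```

Compared with fixed-length windows, this keeps each speaker's turn together where possible, so a retrieved chunk is less likely to start or end mid-thought.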
Visual parsing for video
Video files are enhanced with frame-level sampling and visual description. This helps return better answers in contexts with visual references such as slides, gestures, or product demos.
Fast, scalable indexing
Whether you're processing one video or a full content library, Ragie indexes your content quickly and efficiently. From upload to searchable in minutes.
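For a content library, one possible approach is to loop over a directory and reuse the same documents.create call shown in the Python sample on this page. The sketch below assumes client is a Ragie client constructed as in that sample; ingest_directory and MEDIA_SUFFIXES are hypothetical names, and the suffix set is only a small subset chosen for illustration.

```python
from pathlib import Path
from typing import Any, List

# Illustrative subset of media suffixes; extend as needed (assumption).
MEDIA_SUFFIXES = {".mp3", ".wav", ".mp4", ".mov"}


def ingest_directory(client: Any, directory: str) -> List[Any]:
    """Upload every matching media file in `directory`, one request per file.

    `client` is assumed to expose documents.create(request=...) as in the
    Python sample on this page. Responses are returned in filename order.
    """
    responses: List[Any] = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() not in MEDIA_SUFFIXES:
            continue
        with path.open("rb") as f:
            response = client.documents.create(request={
                "file": {"file_name": path.name, "content": f},
                "metadata": {},
                "mode": {"video": "audio_video", "audio": True},
            })
        responses.append(response)
    return responses
```

The sketch uploads sequentially and does no retrying or rate limiting; a production job would likely want both.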
Timestamped, streamable results
Every result links back to the exact moment something was said or seen. Responses include streamable URLs for direct playback—no extra infra or processing required.
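When showing a result to users, an application usually needs to turn raw offsets into a readable label such as the 0:00 – 0:30 range shown near the top of this page. The sketch below assumes start and end offsets are available in seconds; that shape is an assumption for illustration rather than a documented response format, and both function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format an offset in seconds as M:SS, or H:MM:SS past one hour."""
    total = int(seconds)  # drop fractional seconds
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def format_range(start: float, end: float) -> str:
    """Format a start/end pair as a single label, e.g. for a result card."""
    return f"{format_timestamp(start)} – {format_timestamp(end)}"


print(format_range(0, 30))  # prints "0:00 – 0:30"
```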
Built for developers, not just demos
Ragie’s audio and video pipeline is production-ready and fully modular, so you can move from prototype to production without switching tools.
// TypeScript (Node.js)
import { readFile } from "node:fs/promises";
import { Ragie } from "ragie";

// Path to a media file
const filePath = "path/to/demo.mp4";

const ragie = new Ragie({
  auth: process.env.RAGIE_API_KEY
});

// Read the file into memory and wrap it in a Blob for upload
const fileContent = await readFile(filePath);
const blob = new Blob([fileContent]);

const response = await ragie.documents.create({
  file: blob,
  name: "demo.mp4",
  metadata: {},
  mode: { "video": "audio_video", "audio": true }
});
console.log(response);
# Python
import os
from ragie import Ragie

# Path to a media file
file_path = "path/to/demo.mp4"

ragie = Ragie(auth=os.environ.get("RAGIE_API_KEY"))

# Keep the file open while the upload request is made
with open(file_path, "rb") as f:
    response = ragie.documents.create(request={
        "file": {
            "file_name": "demo.mp4",
            "content": f,
        },
        "metadata": {},
        "mode": {"video": "audio_video", "audio": True}
    })
print(response)
ragie import files --video=audio_video video-file.mp4
#!/bin/bash

curl -X POST "https://api.ragie.ai/documents" \
  -H "Authorization: Bearer <YOUR API KEY>" \
  -F "file=@path/to/file/demo.mp4;type=video/mp4" \
  -F 'metadata={}' \
  -F 'mode={"video":"audio_video","audio":true}'
Drop-in SDKs & API
Easily index and retrieve from audio/video using a few lines of code. No need to manage transcription, chunking, or storage.
Stream-ready outputs
Get timestamped references and URLs that stream clips from the exact moment something was said or seen. Perfect for agents and apps.
Unified with text & docs
Handle audio, video, and text with the same retrieval interface. No branching logic, just clean context across modalities.
Enabling applications for a variety of use cases and industries
Ragie’s A/V capabilities unlock new dimensions for applications across industries.
Training & e-Learning
Turn long videos into instantly searchable knowledge bases. Help learners skip to the exact answer they need without rewatching hours of content.
Media & Entertainment
Analyze archives, automate tagging, and retrieve moments across large video libraries without manual review.
Healthcare & Research
Surface key insights from patient interviews or research footage with simple semantic queries.
Legal & Compliance
Search through hearings, depositions, and recorded interviews using natural language. Skip to the exact phrase being referenced. No scrubbing required.
Customer Support
Process call center audio, grade conversations, and identify upsell opportunities directly from recorded customer interactions.


Chat with your audio & video files
Base Chat now supports audio and video. Drop in your team’s recordings: meetings, interviews, training videos, or your entire content library, and instantly chat with them. Ask questions, get timestamped answers, and stream the exact moment something was said or seen.
No more scrubbing through hours of audio or video footage. Just ask, and Base Chat finds it for you.
Unlock the future of multimedia applications today
Get blazing-fast ingest, transcription, retrieval, and streaming, all at developer-friendly pricing.
Skip the complexity and scale with confidence.