Generate Subtitles from Audio for Free (Browser-Based AI Tool)
Creating subtitles used to require expensive software, manual transcription services, or clunky desktop applications that uploaded your files to remote servers. In 2026, IntlPull AI has revolutionized subtitle generation—and now it runs entirely in your browser.
This guide shows you how to generate professional-quality subtitles from any audio or video file using IntlPull's free subtitle generator. Unlike other subtitle tools, this browser-based solution requires no uploads. No privacy concerns. No costs. Just drag, drop, and wait.
The Subtitle Generation Revolution
For the most accurate results from an automatic, Whisper-based subtitle workflow, start with clean audio, generate a source-language SRT file, then manually review names, acronyms, timestamps, and sentence breaks before translating or publishing.
What Changed?
Three technological breakthroughs converged to make browser-based subtitle generation possible:
- OpenAI Whisper (2022-2024): State-of-the-art speech recognition models (the engine behind IntlPull AI)
- WebAssembly (WASM) + WebGPU: Browsers can now run AI models at near-native speeds
- Transformers.js (2023-2026): JavaScript library that packages AI models for browser inference
The result: You can now generate subtitles from a 2-hour video without uploading a single byte to a server.
Why This Matters
Privacy: Medical interviews, corporate training, confidential content—no third party ever sees your files.
Cost: No per-minute pricing. Generate subtitles for 1,000 hours of content for free.
Speed: No upload/download latency. On modern hardware (M1 Mac, recent GPUs), generation runs faster than real-time.
Accessibility: Works offline after initial model download. Perfect for restricted networks.
How Browser-Based IntlPull AI Works
Here's the high-level architecture:
1. User selects an audio/video file (read locally, not sent to a server)
↓
2. FFmpeg.wasm extracts audio track (if video)
↓
3. Audio converted to 16kHz mono WAV (Whisper's input format)
↓
4. IntlPull AI processes audio in chunks
↓
5. Model outputs transcription with timestamps
↓
6. JavaScript formats output as SRT or VTT
↓
7. User downloads subtitle file
Everything happens in your browser's memory. The audio file never leaves your device.
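Step 6 of the pipeline (formatting timestamped output as SRT) can be sketched in a few lines. This is an illustrative Python version, not the tool's actual JavaScript; the `(start, end, text)` segment shape and the function names are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text)
Segment = Tuple[float, float, str]


def format_srt_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Number each segment from 1 and join the cues with blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(blocks)
```

Calling `segments_to_srt([(1.0, 3.5, "Welcome to the show!")])` yields a single numbered cue with the timing line `00:00:01,000 --> 00:00:03,500`.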
Model Selection
IntlPull's AI subtitle generator tool offers two IntlPull AI models:
| Model | Size | Languages | Speed (M1 Mac) | Accuracy |
|---|---|---|---|---|
| whisper-tiny.en | 77 MB | English only | 10x real-time | ~85% |
| whisper-small | 490 MB | 99 languages | 3x real-time | ~90% |
Accuracy here is word accuracy, which is 100% minus WER (Word Error Rate). For WER, lower is better: ~90% accuracy corresponds to about 10% WER, or roughly 9 out of 10 words correct.
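If you want to measure WER on your own material, it is usually computed as the word-level edit distance (substitutions, insertions, deletions) divided by the number of words in a hand-checked reference transcript. The sketch below is a minimal version under that assumption; it lowercases and splits on whitespace only, whereas published benchmarks typically apply heavier text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

One wrong word in a ten-word reference gives a WER of 0.1, which is the "about 90% accurate" case.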
Recommendation:
- English content, speed priority → whisper-tiny.en
- Multilingual content, quality priority → whisper-small
Example (1-hour video, M1 MacBook Pro):
- Tiny model: ~6 minutes generation time
- Small model: ~20 minutes generation time
Step-by-Step: Generate Subtitle for Video Files
Step 1: Access the Tool
Navigate to intlpull.com/tools/subtitles/generate
No account or sign-up required.
Step 2: Check Browser Compatibility
Recommended browsers:
- ✅ Chrome/Edge 113+ (best WebGPU support)
- ✅ Firefox 121+ (WebGPU enabled in config)
- ⚠️ Safari 17+ (WebGPU experimental, slower)
Hardware acceleration:
- WebGPU available: Uses your GPU for 5-10x faster processing
- Fallback to WASM SIMD: Slower but still functional on any modern device
The tool auto-detects your browser capabilities and selects the fastest execution method.
Step 3: Upload Your File
Drag and drop or click to upload:
- Audio formats: MP3, WAV, FLAC, AAC, OGG, M4A
- Video formats: MP4, MKV, AVI, MOV, WEBM
- File size limit: 2GB (approximately 10 hours of video)
Video files: The tool extracts the audio track automatically using FFmpeg.wasm. Original video is never loaded into memory (too large).
Step 4: Configure Generation Settings
Language Selection
If you know your audio language, select it from the dropdown:
- English (default)
- Spanish (Español)
- French (Français)
- German (Deutsch)
- Mandarin Chinese (中文)
- Japanese (日本語)
- Korean (한국어)
- And 90+ more languages
Why specify language? Whisper performs better when the model knows the expected language. Auto-detect works but is slightly less accurate.
Model Selection
- IntlPull AI Tiny (English only, faster)
- IntlPull AI Small (multilingual, better accuracy)
First-time users: The model downloads once and caches in your browser. Subsequent uses are instant.
Output Format
- SRT (SubRip): Most compatible format, works on YouTube/Vimeo/VLC
- VTT (WebVTT): HTML5 video players, better accessibility features
See our format comparison guide for details.
Advanced Options
Timestamp Granularity:
- Word-level (default): Uses word-level timestamps to build one subtitle per short phrase (2-5 words)
- Sentence-level: One subtitle per sentence (better readability but longer display time)
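To make the two granularity modes concrete, here is a rough sketch of how word-level timestamps could be grouped into cues. The `(start, end, word)` tuple shape, the `max_words` default, and the rule of breaking on sentence-ending punctuation are assumptions for illustration; the tool's real grouping logic may differ.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]  # hypothetical (start_s, end_s, word)
Cue = Tuple[float, float, str]   # (start_s, end_s, text)


def _to_cue(group: List[Word]) -> Cue:
    """A cue spans from the first word's start to the last word's end."""
    return (group[0][0], group[-1][1], " ".join(w[2] for w in group))


def group_words(words: List[Word], max_words: int = 5,
                by_sentence: bool = False) -> List[Cue]:
    """Phrase mode closes a cue at max_words or at sentence-ending
    punctuation; sentence mode closes only at sentence-ending punctuation."""
    cues: List[Cue] = []
    current: List[Word] = []
    for word in words:
        current.append(word)
        ends_sentence = word[2].endswith((".", "?", "!"))
        is_full = len(current) >= max_words
        if ends_sentence or (not by_sentence and is_full):
            cues.append(_to_cue(current))
            current = []
    if current:  # flush trailing words that never hit a break
        cues.append(_to_cue(current))
    return cues
```

Sentence mode produces fewer, longer cues, which matches the trade-off described above: easier to read, but each subtitle stays on screen longer.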
Punctuation:
- Auto-punctuation (recommended): AI adds commas, periods, question marks
- Raw transcription: No punctuation (useful for technical transcription)
Speaker Diarization (experimental):
- Enabled: Attempts to identify different speakers and label them
- Disabled: All transcription treated as single speaker
Note: Speaker diarization adds ~20% processing time and requires the small model.
Step 5: Generate Subtitles
Click "Generate Subtitles".
What happens next:
- Model loading (first time only): Downloads and caches AI model (30-90 seconds)
- Audio extraction (video files only): FFmpeg extracts audio track (5-15 seconds)
- Audio preprocessing: Converts to 16kHz mono WAV (1-5 seconds)
- Transcription: IntlPull AI processes audio in 30-second chunks with progress bar
- Post-processing: Formats timestamps, applies punctuation, validates SRT/VTT structure
Progress indicator: Real-time progress bar shows:
- Current chunk being processed
- Estimated time remaining
- Processing speed (real-time ratio)
Performance tip: Close other browser tabs during processing to maximize available RAM and GPU resources.
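The chunked processing and the "estimated time remaining" figure can be approximated with simple arithmetic. The sketch below assumes non-overlapping 30-second chunks and a constant per-chunk processing time; both are simplifications (real pipelines often overlap chunks slightly), and the function names are made up for this example.

```python
from typing import List, Optional, Tuple


def chunk_bounds(duration_s: float, chunk_s: float = 30.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive chunks of at most chunk_s seconds."""
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def eta_seconds(chunks_done: int, chunks_total: int, elapsed_s: float) -> Optional[float]:
    """Estimate remaining time from the average time per finished chunk."""
    if chunks_done == 0:
        return None  # nothing measured yet, so no estimate
    per_chunk = elapsed_s / chunks_done
    return per_chunk * (chunks_total - chunks_done)
```

For a 75-second clip this gives three chunks (30 s, 30 s, 15 s); if the first two took 10 seconds in total, the estimate for the last one is 5 seconds.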
Step 6: Review and Edit
Once generation completes, the tool displays:
- Side-by-side preview: Audio waveform + generated subtitles
- Inline editor: Click any subtitle to edit text or adjust timing
- Playback sync: Click a subtitle to jump to that timestamp in audio
Common edits needed:
- Proper nouns: AI may misspell names, brands, technical terms
  - Example: "open AI" → "OpenAI"
- Homophones: Words that sound alike but have different meanings
  - Example: "their" vs "there" vs "they're"
- Punctuation: Occasionally misses or adds incorrect punctuation
- Line breaks: Adjust for readability (max 2 lines per subtitle)
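If you prefer to check line breaks in a script rather than by eye, a small helper can wrap each cue and flag the ones that need more than two lines. The 42-character line width is a common subtitling convention, not something this tool specifies, so treat it (and the function name) as an assumption.

```python
import textwrap
from typing import List, Tuple


def wrap_cue(text: str, max_chars: int = 42, max_lines: int = 2) -> Tuple[List[str], bool]:
    """Wrap cue text to max_chars per line and report whether it fits max_lines.

    A cue that does not fit should be split into two cues by hand.
    """
    normalized = " ".join(text.split())  # collapse existing line breaks and extra spaces
    lines = textwrap.wrap(normalized, width=max_chars,
                          break_long_words=False, break_on_hyphens=False)
    return lines, len(lines) <= max_lines
```

A short cue comes back unchanged with `True`; a cue that wraps to three or more lines comes back with `False` so you know to split it.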
Editing shortcuts:
- Tab: Move to next subtitle
- Shift+Tab: Move to previous subtitle
- Ctrl+S: Save changes
- Space: Play/pause audio
Step 7: Download Your Subtitles
Click "Download SRT" or "Download VTT" to save the file.
Filename convention: Automatically appends language code:
- Original: video.mp4
- Generated: video.en.srt
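The same naming pattern is easy to reproduce if you batch-rename subtitle files yourself. This is a small sketch assuming the pattern is simply "source name without extension, then language code, then format"; the function name is hypothetical.

```python
from pathlib import PurePath


def subtitle_filename(source_name: str, language: str = "en", fmt: str = "srt") -> str:
    """Replace the media extension with '<language>.<fmt>'."""
    stem = PurePath(source_name).stem  # drops only the final extension
    return f"{stem}.{language}.{fmt}"
```

For example, `subtitle_filename("video.mp4")` returns `video.en.srt`.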
Test your subtitles:
- Open your video in VLC Media Player
- Drag the SRT/VTT file onto VLC
- Subtitles should auto-sync and display
- Verify accuracy for first 2-3 minutes
If timing is off, use IntlPull's subtitle sync tool to adjust globally.
Tips for Better Subtitle Generation Results
1. Audio Quality is Everything
Optimal audio:
- Clear speaker voice
- Minimal background noise
- Consistent volume levels
- No overlapping speakers
Problem audio:
- Heavy music/sound effects
- Echo or reverb
- Multiple simultaneous speakers
- Low bit-rate compression artifacts
Preprocessing tip: If your audio is noisy, run it through a noise reduction filter first (Audacity's "Noise Reduction" is free).
2. Handle Background Music
Whisper sometimes transcribes background music lyrics as speech. Solutions:
- Music-only sections: Manually delete subtitles during intro/outro music
- Audio editing: Use an audio editor to duck (lower) music during speech
- Post-generation cleanup: Use find/replace to remove common music transcription errors
3. Multi-Speaker Content
For interviews, panels, or conversations:
- Enable speaker diarization if available
- Manual labeling: After generation, manually add speaker labels:
```srt
1
00:00:01,000 --> 00:00:03,500
- Host: Welcome to the show!

2
00:00:03,600 --> 00:00:06,000
- Guest: Thanks for having me.
```
4. Technical Terminology and Jargon
Whisper's training data includes technical content, but it may struggle with:
- Domain-specific acronyms (e.g., "CI/CD" → "CICD" or "C I C D")
- Product names (e.g., "PostgreSQL" → "Post Gress Q L")
- Non-English technical terms
Solution: After generation, use find/replace to fix recurring misrecognitions:
- Find: "post gress Q L" → Replace: "PostgreSQL"
- Find: "cube control" → Replace: "kubectl"
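For files with many recurring misrecognitions, the find/replace pass can be scripted. The sketch below is one possible approach, assuming a standard SRT layout; the correction table is only an example and should be replaced with the terms from your own content. It skips cue numbers and timing lines so that only subtitle text is touched.

```python
import re
from typing import Dict

# Example corrections only; extend with your own recurring misrecognitions.
CORRECTIONS: Dict[str, str] = {
    "post gress q l": "PostgreSQL",
    "cube control": "kubectl",
    "open ai": "OpenAI",
}


def fix_terms(srt_text: str, corrections: Dict[str, str] = CORRECTIONS) -> str:
    """Apply case-insensitive, whole-phrase replacements to subtitle text lines."""
    fixed = []
    for line in srt_text.splitlines(keepends=True):
        is_timing = "-->" in line
        is_index = line.strip().isdigit()
        if not is_timing and not is_index:
            for wrong, right in corrections.items():
                pattern = r"\b" + re.escape(wrong) + r"\b"
                # A function replacement avoids backslash handling in `right`.
                line = re.sub(pattern, lambda _m, r=right: r, line, flags=re.IGNORECASE)
        fixed.append(line)
    return "".join(fixed)
```

Review the result afterwards: a blanket replacement can also change a phrase that was transcribed correctly in a different sense.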
5. Accents and Non-Native Speakers
Whisper handles accents reasonably well but accuracy drops with:
- Heavy regional accents
- Non-native speakers with strong accents
- Code-switching (mixing languages mid-sentence)
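Mitigation:
- Select the language actually spoken in the recording (for example, English for accented English), not the speaker's native language; a mismatched setting pushes Whisper toward output in the wrong language
- Use the larger whisper-small model for better accuracy
- Budget extra time for manual corrections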
6. Long-Form Content (2+ Hours)
Browser memory limits can become an issue with very long videos:
Workaround:
- Split video into 30-60 minute chunks
- Generate subtitles for each chunk separately
- Merge SRT files using a text editor or IntlPull's subtitle merger
Merging SRT files:
```text
# Chunk 1 ends at 00:30:00
# Chunk 2 starts at 00:30:00
# Adjust chunk 2 sequence numbers to continue from chunk 1's last number
# Adjust chunk 2 timecodes to offset by +00:30:00
```
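Doing the renumbering and offsetting by hand gets tedious, so here is a minimal script-style sketch of the same merge. It assumes both inputs are well-formed SRT with `\n` line endings and that you know the offset of the second chunk; the function names are made up for this example.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match: "re.Match[str]", offset_ms: int) -> str:
    """Add offset_ms to one HH:MM:SS,mmm timestamp (clamped at zero)."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = max(((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms, 0)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def merge_srt(first: str, second: str, offset_ms: int) -> str:
    """Append `second` to `first`, continuing the numbering and shifting its times."""
    first_blocks = [b for b in re.split(r"\n\s*\n", first.strip()) if b]
    second_blocks = [b for b in re.split(r"\n\s*\n", second.strip()) if b]
    merged = list(first_blocks)
    for number, block in enumerate(second_blocks, start=len(first_blocks) + 1):
        lines = block.split("\n")
        lines[0] = str(number)  # continue the sequence from chunk 1
        lines[1] = TIMESTAMP.sub(lambda m: shift_timestamp(m, offset_ms), lines[1])
        merged.append("\n".join(lines))
    return "\n\n".join(merged) + "\n"
```

For two 30-minute chunks, call `merge_srt(chunk1_text, chunk2_text, 30 * 60 * 1000)`.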
Browser Requirements and Performance
Minimum Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 8 GB+ |
| CPU | 2017+ Intel/AMD | Apple Silicon / Ryzen 5000+ |
| GPU | Integrated graphics | Discrete GPU (RTX 3060+) |
| Browser | Chrome 100+ | Chrome 120+ with WebGPU |
| Storage | 500 MB free | 1 GB free (for model cache) |
Performance Benchmarks
Generating subtitles for a 10-minute video:
| Device | Model | Time | Real-Time Ratio |
|---|---|---|---|
| M1 MacBook Pro | tiny.en | 60 seconds | 10x |
| M1 MacBook Pro | small | 200 seconds | 3x |
| Intel i7-12700 + RTX 3060 | tiny.en | 90 seconds | 6.6x |
| Intel i7-12700 + RTX 3060 | small | 250 seconds | 2.4x |
| Intel i5-10400 (no GPU) | tiny.en | 300 seconds | 2x |
| Intel i5-10400 (no GPU) | small | 800 seconds | 0.75x |
Real-time ratio: Higher is better. 10x = generates subtitles 10 times faster than video duration.
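The ratio is just media duration divided by processing time, so you can use it in either direction: to check a benchmark row or to estimate how long your own file will take. A short sketch of that arithmetic:

```python
def real_time_ratio(media_seconds: float, processing_seconds: float) -> float:
    """How many seconds of media are processed per second of wall-clock time."""
    return media_seconds / processing_seconds


def estimated_processing_seconds(media_seconds: float, ratio: float) -> float:
    """Invert the ratio to estimate generation time for a given file."""
    return media_seconds / ratio
```

A 10-minute video (600 seconds) processed in 800 seconds gives 0.75x, meaning slower than real time; a 1-hour video at 3x should take about 1,200 seconds (20 minutes).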
WebGPU vs WASM Performance
| Backend | Speed | Compatibility |
|---|---|---|
| WebGPU | 5-10x faster | Chrome 113+, Edge 113+, Firefox 121+ (flag) |
| WASM SIMD | Baseline | All modern browsers |
Checking your backend: The tool displays "WebGPU acceleration enabled" or "Running on WASM fallback" in the status bar.
Mobile Device Support
⚠️ Mobile devices are not recommended for subtitle generation due to:
- Limited RAM (tab crashes on large files)
- Slower processors (generation takes 10x+ longer)
- Battery drain
Workaround: Use a desktop/laptop browser, or IntlPull's cloud API for mobile subtitle generation.
Privacy: Your Audio Never Leaves Your Device
How It Works
Traditional subtitle services (Rev, Otter.ai, YouTube) upload your audio to their servers:
Your device → Server transcription → Download result
IntlPull's browser-based tool:
Your device → (everything happens locally) → Download result
What This Means
- ✅ No upload: Audio file stays in browser memory, never transmitted
- ✅ No storage: Files are never written to disk (except model cache)
- ✅ No logging: No record of what you transcribe
- ✅ Offline capable: Works without internet after model download
Verifying Privacy
Check network tab (Chrome DevTools):
- Open DevTools (F12)
- Go to "Network" tab
- Start subtitle generation
- You'll only see:
- Model download (first time only, from HuggingFace CDN)
- No other network requests
Open-source code: IntlPull's subtitle generator is open-source. Review the code at github.com/intlpull/web/apps/web/src/app/tools/subtitles/generate.
Supported Languages (99 Total)
Whisper supports 99 languages with varying accuracy:
Tier 1 (Excellent Accuracy)
English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Turkish, Russian, Korean, Japanese, Mandarin Chinese, Cantonese, Indonesian, Malay, Vietnamese, Thai, Hindi
Tier 2 (Good Accuracy)
Arabic, Hebrew, Greek, Czech, Slovak, Romanian, Hungarian, Finnish, Swedish, Norwegian, Danish, Ukrainian, Bulgarian, Croatian, Serbian, Catalan, Filipino
Tier 3 (Moderate Accuracy)
Persian, Urdu, Bengali, Tamil, Telugu, Marathi, Gujarati, Swahili, Amharic, Yoruba, Zulu, Afrikaans, Icelandic, Estonian, Latvian, Lithuanian, Slovenian, Albanian, Macedonian, Bosnian, Welsh, Basque
Language-specific notes:
- Mandarin: Specify "Chinese (Simplified)" for mainland China or "Chinese (Traditional)" for Taiwan
- Arabic: Struggles with dialects; Modern Standard Arabic works best
- Code-switching: Mixed languages (e.g., Spanglish) reduce accuracy significantly
What to Do After Generating Subtitles
1. Translate to Other Languages
Use IntlPull's subtitle translator to create multilingual versions:
English.srt → Spanish.srt
→ French.srt
→ German.srt
See our subtitle translation guide.
2. Upload to Video Platforms
YouTube:
- YouTube Studio → Subtitles
- Select video → Add language → English
- Upload file → Select your .srt or .vtt file
Vimeo:
- Video settings → Distribution → Subtitles
- Add subtitles → Upload file
Wistia:
- Customize → Captions
- Upload captions → Choose file
3. Embed in Website
For HTML5 video players:
```html
<video controls>
  <source src="video.mp4" type="video/mp4">
  <track kind="subtitles" src="video.en.vtt" srclang="en" label="English" default>
  <track kind="subtitles" src="video.es.vtt" srclang="es" label="Español">
</video>
```
Note: Use VTT format for <track> element (not SRT).
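If you only have an SRT file, converting it is mostly mechanical: WebVTT needs a `WEBVTT` header line and uses a period instead of a comma before the milliseconds. The sketch below covers that basic case only (no styling or positioning); cue numbers are kept because WebVTT permits them as cue identifiers. The function name is an assumption for this example.

```python
import re

SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert basic SRT text to WebVTT."""
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    converted = []
    for line in body.split("\n"):
        if "-->" in line:
            # Only timing lines change: 00:00:01,000 becomes 00:00:01.000
            line = SRT_TIME.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"
```

Commas inside the subtitle text are left alone because only lines containing `-->` are rewritten.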
4. Create Burned-In Subtitles
"Hardcode" subtitles directly into video:
```bash
ffmpeg -i video.mp4 -vf subtitles=video.srt output.mp4
```
When to burn in:
- Social media (Instagram, TikTok) where separate subtitle tracks aren't supported
- Presentations where you can't guarantee subtitle file availability
- Legacy video players without subtitle support
Troubleshooting Common Issues
Issue 1: Model Download Fails
Symptoms: "Failed to load model" error
Solutions:
- Check internet connection: Models are 77-490 MB
- Clear browser cache: Old model versions may be corrupted
- Try different browser: Safari sometimes has CORS issues
- Disable VPN: Some VPNs block HuggingFace CDN
Issue 2: Browser Tab Crashes
Symptoms: Tab crashes during processing, especially on large files
Solutions:
- Close other tabs: Free up RAM
- Use smaller model: Switch to whisper-tiny.en
- Split video: Process in chunks if file is > 1 hour
- Increase browser memory limit: Try the Chrome flags chrome://flags/#max-tiles-for-interest-area and chrome://flags/#force-memory-limit (flag availability varies by Chrome version)
Issue 3: Subtitles Out of Sync
Symptoms: Subtitles appear too early or too late
Solutions:
- Variable frame rate video: Convert to constant frame rate first:
  ```bash
  ffmpeg -i input.mp4 -r 30 -c:v libx264 output.mp4
  ```
- Audio delay in source: Use subtitle sync tool to offset all timecodes
- Regenerate: Sometimes a one-off glitch, try generating again
Issue 4: Poor Transcription Accuracy
Symptoms: Many incorrect words, nonsensical phrases
Solutions:
- Specify language: Auto-detect is less accurate
- Use larger model: Switch to whisper-small
- Improve audio quality: Reduce background noise, boost speech volume
- Check language support: Some languages are Tier 3 (moderate accuracy)
Issue 5: WebGPU Not Available
Symptoms: "WebGPU not supported" message, slow processing
Solutions:
- Update browser: Chrome 113+, Edge 113+, Firefox 121+
- Enable WebGPU in Firefox: about:config → dom.webgpu.enabled → true
- Check GPU drivers: Update graphics drivers to latest version
- Fallback to WASM: Still works, just slower
API for Developers
For automated workflows, IntlPull offers a cloud API:
```bash
curl -X POST https://api.intlpull.com/v1/subtitles/generate \
  -H "X-API-Key: ip_live_..." \
  -F "file=@video.mp4" \
  -F "language=en" \
  -F "model=whisper-small" \
  -F "format=srt"
```
Response:
```json
{
  "job_id": "job_abc123",
  "status": "processing",
  "eta_seconds": 120
}
```
Pricing: Free tier (100 minutes/month), paid plans from $0.006/minute.
See API documentation for details.
Comparison: Browser Tool vs Cloud Services
| Feature | IntlPull Browser | Rev.com | Otter.ai | YouTube Auto |
|---|---|---|---|---|
| Cost | Free | $1.50/min | $10/mo | Free |
| Privacy | 100% local | Upload | Upload | Upload |
| Speed | 2-10x realtime | 12-24 hours | 1x realtime | 5 mins |
| Accuracy (English) | ~90% | ~99% | ~85% | ~80% |
| Languages | 99 | English only | English only | 100+ |
| Editing | Built-in | Manual | Built-in | YouTube Studio |
| Output formats | SRT, VTT | SRT, VTT, TXT | SRT, TXT | SRT (via YouTube) |
| Best for | Privacy, cost, speed | Maximum accuracy | Meetings | YouTube videos |
Our take:
- Confidential content: IntlPull browser tool (privacy)
- Critical accuracy (legal, medical): Rev.com (human transcription)
- YouTube content: YouTube Auto + manual cleanup (free, integrated)
- Scale (1000+ hours): IntlPull API (automation)
Conclusion
Browser-based subtitle generation using IntlPull AI has made professional-quality transcription accessible to everyone. No costs, no uploads, no privacy concerns—just drag, drop, and download.
Try it now: Generate Subtitles from Audio
Once you've generated your subtitles:
For teams managing video content at scale, explore IntlPull's TMS platform with team collaboration, translation memory, and automated subtitle workflows.
Related Tools:
- Free Subtitle Generator - Generate from audio/video
- Free Subtitle Translator - Translate to 100+ languages
- Free Subtitle Format Converter - Convert SRT/VTT/SBV/ASS
