
Free AI Subtitle Generator from Audio & Video (SRT/VTT)

Generate SRT and VTT subtitles from audio or video in your browser. Private, free, no upload required, with export-ready subtitle files.

IntlPull Team
Feb 12, 2026

Generate Subtitles from Audio for Free (Browser-Based AI Tool)

Creating subtitles used to require expensive software, manual transcription services, or clunky desktop applications that uploaded your files to remote servers. With IntlPull AI, subtitle generation now runs entirely in your browser.

This guide shows you how to generate professional-quality subtitles from any audio or video file using IntlPull's free subtitle generator. Unlike other subtitle tools, this browser-based solution requires no uploads. No privacy concerns. No costs. Just drag, drop, and wait.

The Subtitle Generation Revolution

For the most accurate results from a Whisper-based automatic subtitle workflow, start with clean audio, generate a source-language SRT file, then manually review names, acronyms, timestamps, and sentence breaks before translating or publishing.

What Changed?

Three technological breakthroughs converged to make browser-based subtitle generation possible:

  1. OpenAI Whisper (2022-2024): State-of-the-art speech recognition models (the engine behind IntlPull AI)
  2. WebAssembly (WASM) + WebGPU: Browsers can now run AI models at near-native speeds
  3. Transformers.js (2023-2026): JavaScript library that packages AI models for browser inference

The result: You can now generate subtitles from a 2-hour video without uploading a single byte to a server.

Why This Matters

Privacy: Medical interviews, corporate training, confidential content—no third party ever sees your files.

Cost: No per-minute pricing. Generate subtitles for 1,000 hours of content for free.

Speed: No upload/download latency. On modern hardware (M1 Mac, recent GPUs), generation runs faster than real-time.

Accessibility: Works offline after initial model download. Perfect for restricted networks.


How Browser-Based IntlPull AI Works

Here's the high-level architecture:

1. User selects an audio/video file (read locally, never sent to a server)
   ↓
2. FFmpeg.wasm extracts audio track (if video)
   ↓
3. Audio converted to 16kHz mono WAV (Whisper's input format)
   ↓
4. IntlPull AI processes audio in chunks
   ↓
5. Model outputs transcription with timestamps
   ↓
6. JavaScript formats output as SRT or VTT
   ↓
7. User downloads subtitle file

Everything happens in your browser's memory. The audio file never leaves your device.
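
Steps 5 and 6 of the pipeline come down to turning timed segments into cue text. The tool does this in JavaScript; the following is only a minimal Python sketch of the same formatting step, assuming segments arrive as (start_seconds, end_seconds, text) tuples. The names `format_timestamp` and `to_srt` are illustrative and not part of IntlPull's code.

```python
def format_timestamp(seconds: float, separator: str = ",") -> str:
    """Render seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        cues.append(f"{index}\n{timing}\n{text.strip()}\n")
    # Each cue already ends with a newline, so joining with "\n"
    # leaves exactly one blank line between cues.
    return "\n".join(cues)
```

Passing "." as the separator to `format_timestamp` yields WebVTT-style timestamps instead.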

Model Selection

IntlPull's AI subtitle generator tool offers two IntlPull AI models:

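| Model | Size | Languages | Speed (M1 Mac) | Word accuracy |
|---|---|---|---|---|
| whisper-tiny.en | 77 MB | English only | 10x real-time | ~85% (~15% WER) |
| whisper-small | 490 MB | 99 languages | 3x real-time | ~90% (~10% WER) |

WER (Word Error Rate) is the share of words transcribed incorrectly, so lower is better. The accuracy figures above are 1 - WER: ~90% accuracy means a WER of about 10%, or roughly 9 out of 10 words correct.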
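
To make the metric concrete, here is a minimal Python sketch of word error rate computed as word-level edit distance divided by the number of words in the reference transcript. It is an illustration of the standard definition, not the scoring code behind the figures in this guide, and `word_error_rate` is a hypothetical helper name. It lowercases both strings and splits on whitespace, which is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level edit distance, computed one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)
```

One wrong word in a ten-word reference gives a WER of 0.1, which is the same as 90% word accuracy.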

Recommendation:

  • English content, speed priority → whisper-tiny.en
  • Multilingual content, quality priority → whisper-small
  • 1-hour video, M1 MacBook Pro:
    • Tiny model: ~6 minutes generation time
    • Small model: ~20 minutes generation time

Step-by-Step: Generate Subtitles for Video Files

Step 1: Access the Tool

Navigate to intlpull.com/tools/subtitles/generate

No account or sign-up required.

Step 2: Check Browser Compatibility

Recommended browsers:

  • Chrome/Edge 113+ (best WebGPU support)
  • Firefox 121+ (WebGPU enabled in config)
  • ⚠️ Safari 17+ (WebGPU experimental, slower)

Hardware acceleration:

  • WebGPU available: Uses your GPU for 5-10x faster processing
  • Fallback to WASM SIMD: Slower but still functional on any modern device

The tool auto-detects your browser capabilities and selects the fastest execution method.

Step 3: Upload Your File

Drag and drop or click to upload:

  • Audio formats: MP3, WAV, FLAC, AAC, OGG, M4A
  • Video formats: MP4, MKV, AVI, MOV, WEBM
  • File size limit: 2GB (approximately 10 hours of video)

Video files: The tool extracts the audio track automatically using FFmpeg.wasm. Original video is never loaded into memory (too large).

Step 4: Configure Generation Settings

Language Selection

If you know your audio language, select it from the dropdown:

  • English (default)
  • Spanish (Español)
  • French (Français)
  • German (Deutsch)
  • Mandarin Chinese (中文)
  • Japanese (日本語)
  • Korean (한국어)
  • And 90+ more languages

Why specify language? Whisper performs better when the model knows the expected language. Auto-detect works but is slightly less accurate.

Model Selection

  • IntlPull AI Tiny (English only, faster)
  • IntlPull AI Small (multilingual, better accuracy)

First-time users: The model downloads once and caches in your browser. Subsequent uses are instant.

Output Format

  • SRT (SubRip): Most compatible format, works on YouTube/Vimeo/VLC
  • VTT (WebVTT): HTML5 video players, better accessibility features

See our format comparison guide for details.

Advanced Options

Timestamp Granularity:

  • Word-level (default): One subtitle per phrase (2-5 words)
  • Sentence-level: One subtitle per sentence (better readability but longer display time)

Punctuation:

  • Auto-punctuation (recommended): AI adds commas, periods, question marks
  • Raw transcription: No punctuation (useful for technical transcription)

Speaker Diarization (experimental):

  • Enabled: Attempts to identify different speakers and label them
  • Disabled: All transcription treated as single speaker

Note: Speaker diarization adds ~20% processing time and requires the small model.

Step 5: Generate Subtitles

Click "Generate Subtitles".

What happens next:

  1. Model loading (first time only): Downloads and caches AI model (30-90 seconds)
  2. Audio extraction (video files only): FFmpeg extracts audio track (5-15 seconds)
  3. Audio preprocessing: Converts to 16kHz mono WAV (1-5 seconds)
  4. Transcription: IntlPull AI processes audio in 30-second chunks with progress bar
  5. Post-processing: Formats timestamps, applies punctuation, validates SRT/VTT structure
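
As a rough illustration of the chunked transcription in step 4, 16 kHz audio splits into 30-second windows of 480,000 samples each. This Python sketch only computes the chunk boundaries. It assumes simple non-overlapping chunks; whether the tool overlaps adjacent windows is not stated here, and `chunk_bounds` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 16_000  # sample rate used in the preprocessing step
CHUNK_SECONDS = 30       # chunk length used in the transcription step


def chunk_bounds(total_samples: int) -> list[tuple[int, int]]:
    """Return (start, end) sample indices for consecutive 30-second chunks."""
    chunk_size = SAMPLE_RATE_HZ * CHUNK_SECONDS
    return [
        (start, min(start + chunk_size, total_samples))
        for start in range(0, total_samples, chunk_size)
    ]
```

A 75-second clip (1,200,000 samples) therefore yields two full chunks and one shorter final chunk.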

Progress indicator: Real-time progress bar shows:

  • Current chunk being processed
  • Estimated time remaining
  • Processing speed (real-time ratio)

Performance tip: Close other browser tabs during processing to maximize available RAM and GPU resources.

Step 6: Review and Edit

Once generation completes, the tool displays:

  • Side-by-side preview: Audio waveform + generated subtitles
  • Inline editor: Click any subtitle to edit text or adjust timing
  • Playback sync: Click a subtitle to jump to that timestamp in audio

Common edits needed:

  1. Proper nouns: AI may misspell names, brands, technical terms
    • Example: "open AI" → "OpenAI"
  2. Homophones: Words that sound alike but have different meanings
    • Example: "their" vs "there" vs "they're"
  3. Punctuation: Occasionally misses or adds incorrect punctuation
  4. Line breaks: Adjust for readability (max 2 lines per subtitle)

Editing shortcuts:

  • Tab: Move to next subtitle
  • Shift+Tab: Move to previous subtitle
  • Ctrl+S: Save changes
  • Space: Play/pause audio

Step 7: Download Your Subtitles

Click "Download SRT" or "Download VTT" to save the file.

Filename convention: Automatically appends language code:

  • Original: video.mp4
  • Generated: video.en.srt
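
If you batch-process files and want your own scripts to follow the same naming rule, it can be written as a one-line helper. This is a Python sketch of the convention shown above, with `subtitle_filename` as a hypothetical name; the tool's own implementation may differ.

```python
from pathlib import Path


def subtitle_filename(media_name: str, language: str, fmt: str = "srt") -> str:
    """Swap the media extension for '<language>.<fmt>', e.g. video.mp4 -> video.en.srt."""
    return f"{Path(media_name).stem}.{language}.{fmt}"
```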

Test your subtitles:

  1. Open your video in VLC Media Player
  2. Drag the SRT/VTT file onto VLC
  3. Subtitles should auto-sync and display
  4. Verify accuracy for first 2-3 minutes

If timing is off, use IntlPull's subtitle sync tool to adjust globally.


Tips for Better Subtitle Generation Results

1. Audio Quality is Everything

Optimal audio:

  • Clear speaker voice
  • Minimal background noise
  • Consistent volume levels
  • No overlapping speakers

Problem audio:

  • Heavy music/sound effects
  • Echo or reverb
  • Multiple simultaneous speakers
  • Low bit-rate compression artifacts

Preprocessing tip: If your audio is noisy, run it through a noise reduction filter first (Audacity's "Noise Reduction" is free).

2. Handle Background Music

Whisper sometimes transcribes background music lyrics as speech. Solutions:

  • Music-only sections: Manually delete subtitles during intro/outro music
  • Audio editing: Use an audio editor to duck (lower) music during speech
  • Post-generation cleanup: Use find/replace to remove common music transcription errors

3. Multi-Speaker Content

For interviews, panels, or conversations:

  • Enable speaker diarization if available
  • Manual labeling: After generation, manually add speaker labels:
    SRT
    1
    00:00:01,000 --> 00:00:03,500
    - Host: Welcome to the show!

    2
    00:00:03,600 --> 00:00:06,000
    - Guest: Thanks for having me.

4. Technical Terminology and Jargon

Whisper's training data includes technical content, but it may struggle with:

  • Domain-specific acronyms (e.g., "CI/CD" → "CICD" or "C I C D")
  • Product names (e.g., "PostgreSQL" → "Post Gress Q L")
  • Non-English technical terms

Solution: After generation, use find/replace to fix recurring misrecognitions:

  • Find: "post gress Q L" → Replace: "PostgreSQL"
  • Find: "cube control" → Replace: "kubectl"
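
If you prefer to script these fixes rather than use an editor's find/replace, a small Python sketch along these lines can apply a correction table to a subtitle file's text. The table entries mirror the examples in this guide; `CORRECTIONS` and `apply_corrections` are illustrative names, and matching case-insensitively on whole words is an assumption you may want to tighten for your own content.

```python
import re

# Hypothetical correction table: misrecognition -> intended spelling.
CORRECTIONS = {
    "post gress q l": "PostgreSQL",
    "cube control": "kubectl",
    "open ai": "OpenAI",
}


def apply_corrections(text: str, corrections: dict[str, str] = CORRECTIONS) -> str:
    """Replace each misrecognised phrase, ignoring case and matching whole words."""
    for wrong, right in corrections.items():
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", flags=re.IGNORECASE)
        # A function replacement inserts `right` literally, with no escape handling.
        text = pattern.sub(lambda _match, r=right: r, text)
    return text
```

Timecode lines are left alone because none of the phrases match digits or arrows.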

5. Accents and Non-Native Speakers

Whisper handles accents reasonably well but accuracy drops with:

  • Heavy regional accents
  • Non-native speakers with strong accents
  • Code-switching (mixing languages mid-sentence)

Mitigation:

  • Select the language actually being spoken (not the speaker's native language), since Whisper transcribes in the language you specify
  • Use the larger whisper-small model for better accuracy
  • Budget extra time for manual corrections

6. Long-Form Content (2+ Hours)

Browser memory limits can become an issue with very long videos:

Workaround:

  1. Split video into 30-60 minute chunks
  2. Generate subtitles for each chunk separately
  3. Merge SRT files using a text editor or IntlPull's subtitle merger

Merging SRT files:

SRT
# Chunk 1 ends at 00:30:00
# Chunk 2 starts at 00:30:00
# Adjust chunk 2 sequence numbers to continue from chunk 1's last number
# Adjust chunk 2 timecodes to offset by +00:30:00
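
These adjustments can be automated. The following Python sketch, which is not IntlPull's subtitle merger, appends a second SRT chunk to the first, shifting its timecodes by a fixed offset and continuing the sequence numbers. It assumes well-formed SRT with `\n` line endings, cues separated by one blank line, and a non-negative offset; `merge_srt` and `shift_timecode` are hypothetical names.

```python
import re
from datetime import timedelta

TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timecode(match: re.Match, offset: timedelta) -> str:
    """Add `offset` to one HH:MM:SS,mmm timecode and re-render it."""
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    shifted = timedelta(hours=hours, minutes=minutes,
                        seconds=seconds, milliseconds=millis) + offset
    total_ms = shifted // timedelta(milliseconds=1)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def merge_srt(first: str, second: str, offset: timedelta) -> str:
    """Append `second` to `first`, shifting its timecodes and renumbering cues."""
    first_cues = [c for c in first.strip().split("\n\n") if c.strip()]
    second_cues = [c for c in second.strip().split("\n\n") if c.strip()]
    merged = list(first_cues)
    for number, cue in enumerate(second_cues, start=len(first_cues) + 1):
        _old_index, timing, *text = cue.split("\n")
        timing = TIMECODE.sub(lambda m: shift_timecode(m, offset), timing)
        merged.append("\n".join([str(number), timing, *text]))
    return "\n\n".join(merged) + "\n"
```

For the 30-minute example above, call `merge_srt(chunk1, chunk2, timedelta(minutes=30))`. The sketch continues numbering from the count of cues in the first file, so it assumes that file is numbered 1 through N.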

Browser Requirements and Performance

Minimum Requirements

| Component | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 8 GB+ |
| CPU | 2017+ Intel/AMD | Apple Silicon / Ryzen 5000+ |
| GPU | Integrated graphics | Discrete GPU (RTX 3060+) |
| Browser | Chrome 100+ | Chrome 120+ with WebGPU |
| Storage | 500 MB free | 1 GB free (for model cache) |

Performance Benchmarks

Generating subtitles for a 10-minute video:

| Device | Model | Time | Real-Time Ratio |
|---|---|---|---|
| M1 MacBook Pro | tiny.en | 60 seconds | 10x |
| M1 MacBook Pro | small | 200 seconds | 3x |
| Intel i7-12700 + RTX 3060 | tiny.en | 90 seconds | 6.7x |
| Intel i7-12700 + RTX 3060 | small | 250 seconds | 2.4x |
| Intel i5-10400 (no GPU) | tiny.en | 300 seconds | 2x |
| Intel i5-10400 (no GPU) | small | 800 seconds | 0.75x |

Real-time ratio: Higher is better. 10x = generates subtitles 10 times faster than video duration.
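
The ratio is simply media duration divided by processing time, and it can be inverted to estimate how long a job will take. A minimal Python sketch of that arithmetic follows; the function names are illustrative.

```python
def real_time_ratio(media_seconds: float, processing_seconds: float) -> float:
    """How many seconds of media are processed per second of wall-clock time."""
    return media_seconds / processing_seconds


def estimated_minutes(media_minutes: float, ratio: float) -> float:
    """Expected processing time for media of the given length at a known ratio."""
    return media_minutes / ratio
```

For a 10-minute (600-second) video processed in 60 seconds the ratio is 10, and a 1-hour video at 3x should take about 20 minutes. A ratio below 1, such as 0.75x, means generation takes longer than the media itself.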

WebGPU vs WASM Performance

| Backend | Speed | Compatibility |
|---|---|---|
| WebGPU | 5-10x faster | Chrome 113+, Edge 113+, Firefox 121+ (flag) |
| WASM SIMD | Baseline | All modern browsers |

Checking your backend: The tool displays "WebGPU acceleration enabled" or "Running on WASM fallback" in the status bar.

Mobile Device Support

⚠️ Mobile devices are not recommended for subtitle generation due to:

  • Limited RAM (tab crashes on large files)
  • Slower processors (generation takes 10x+ longer)
  • Battery drain

Workaround: Use a desktop/laptop browser, or IntlPull's cloud API for mobile subtitle generation.


Privacy: Your Audio Never Leaves Your Device

How It Works

Traditional subtitle services (Rev, Otter.ai, YouTube) upload your audio to their servers:

Your device → Server transcription → Download result

IntlPull's browser-based tool:

Your device → (everything happens locally) → Download result

What This Means

  • No upload: Audio file stays in browser memory, never transmitted
  • No storage: Files are never written to disk (except model cache)
  • No logging: No record of what you transcribe
  • Offline capable: Works without internet after model download

Verifying Privacy

Check network tab (Chrome DevTools):

  1. Open DevTools (F12)
  2. Go to "Network" tab
  3. Start subtitle generation
  4. You'll only see:
    • Model download (first time only, from HuggingFace CDN)
    • No other network requests

Open-source code: IntlPull's subtitle generator is open-source. Review the code at github.com/intlpull/web/apps/web/src/app/tools/subtitles/generate.


Supported Languages (99 Total)

Whisper supports 99 languages with varying accuracy:

Tier 1 (Excellent Accuracy)

English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Turkish, Russian, Korean, Japanese, Mandarin Chinese, Cantonese, Indonesian, Malay, Vietnamese, Thai, Hindi

Tier 2 (Good Accuracy)

Arabic, Hebrew, Greek, Czech, Slovak, Romanian, Hungarian, Finnish, Swedish, Norwegian, Danish, Ukrainian, Bulgarian, Croatian, Serbian, Catalan, Filipino

Tier 3 (Moderate Accuracy)

Persian, Urdu, Bengali, Tamil, Telugu, Marathi, Gujarati, Swahili, Amharic, Yoruba, Zulu, Afrikaans, Icelandic, Estonian, Latvian, Lithuanian, Slovenian, Albanian, Macedonian, Bosnian, Welsh, Basque

Language-specific notes:

  • Mandarin: Specify "Chinese (Simplified)" for mainland China or "Chinese (Traditional)" for Taiwan
  • Arabic: Struggles with dialects; Modern Standard Arabic works best
  • Code-switching: Mixed languages (e.g., Spanglish) reduce accuracy significantly

What to Do After Generating Subtitles

1. Translate to Other Languages

Use IntlPull's subtitle translator to create multilingual versions:

English.srt → Spanish.srt
            → French.srt
            → German.srt

See our subtitle translation guide.

2. Upload to Video Platforms

YouTube:

  1. YouTube Studio → Subtitles
  2. Select video → Add language → English
  3. Upload file → Select your .srt or .vtt file

Vimeo:

  1. Video settings → Distribution → Subtitles
  2. Add subtitles → Upload file

Wistia:

  1. Customize → Captions
  2. Upload captions → Choose file

3. Embed in Website

For HTML5 video players:

HTML
<video controls>
  <source src="video.mp4" type="video/mp4">
  <track kind="subtitles" src="video.en.vtt" srclang="en" label="English" default>
  <track kind="subtitles" src="video.es.vtt" srclang="es" label="Español">
</video>

Note: Use VTT format for <track> element (not SRT).
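
If you only have an SRT file, converting it for <track> is mostly mechanical: add a WEBVTT header and switch the millisecond separator from a comma to a period. The Python sketch below shows that minimal conversion, assuming well-formed SRT input. It does not handle styling or positioning, and `srt_to_vtt` is an illustrative name rather than an IntlPull function.

```python
import re

# Matches the millisecond comma in an SRT timecode such as 00:00:01,000.
SRT_MILLIS = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header and use '.' before milliseconds."""
    body = srt_text.replace("\r\n", "\n").strip()
    converted_lines = []
    for line in body.split("\n"):
        # Only touch timing lines so commas in dialogue are preserved.
        if "-->" in line:
            line = SRT_MILLIS.sub(r"\1.\2", line)
        converted_lines.append(line)
    return "WEBVTT\n\n" + "\n".join(converted_lines) + "\n"
```

The SRT sequence numbers are kept; WebVTT treats them as optional cue identifiers.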

4. Create Burned-In Subtitles

"Hardcode" subtitles directly into video:

Terminal
ffmpeg -i video.mp4 -vf subtitles=video.srt output.mp4

When to burn in:

  • Social media (Instagram, TikTok) where separate subtitle tracks aren't supported
  • Presentations where you can't guarantee subtitle file availability
  • Legacy video players without subtitle support

Troubleshooting Common Issues

Issue 1: Model Download Fails

Symptoms: "Failed to load model" error

Solutions:

  1. Check internet connection: Models are 77-490 MB
  2. Clear browser cache: Old model versions may be corrupted
  3. Try different browser: Safari sometimes has CORS issues
  4. Disable VPN: Some VPNs block HuggingFace CDN

Issue 2: Browser Tab Crashes

Symptoms: Tab crashes during processing, especially on large files

Solutions:

  1. Close other tabs: Free up RAM
  2. Use smaller model: Switch to whisper-tiny.en
  3. Split video: Process in chunks if file is > 1 hour
  4. Increase browser memory limit: Chrome flags (availability varies by Chrome version, so confirm each flag exists in your build):
    chrome://flags/#max-tiles-for-interest-area
    chrome://flags/#force-memory-limit
    

Issue 3: Subtitles Out of Sync

Symptoms: Subtitles appear too early or too late

Solutions:

  1. Variable frame rate video: Convert to constant frame rate first:
    Terminal
    ffmpeg -i input.mp4 -r 30 -c:v libx264 output.mp4
  2. Audio delay in source: Use subtitle sync tool to offset all timecodes
  3. Regenerate: Sometimes a one-off glitch, try generating again

Issue 4: Poor Transcription Accuracy

Symptoms: Many incorrect words, nonsensical phrases

Solutions:

  1. Specify language: Auto-detect is less accurate
  2. Use larger model: Switch to whisper-small
  3. Improve audio quality: Reduce background noise, boost speech volume
  4. Check language support: Some languages are Tier 3 (moderate accuracy)

Issue 5: WebGPU Not Available

Symptoms: "WebGPU not supported" message, slow processing

Solutions:

  1. Update browser: Chrome 113+, Edge 113+, Firefox 121+
  2. Enable WebGPU in Firefox:
    about:config → dom.webgpu.enabled → true
    
  3. Check GPU drivers: Update graphics drivers to latest version
  4. Fallback to WASM: Still works, just slower

API for Developers

For automated workflows, IntlPull offers a cloud API:

Terminal
curl -X POST https://api.intlpull.com/v1/subtitles/generate \
  -H "X-API-Key: ip_live_..." \
  -F "file=@video.mp4" \
  -F "language=en" \
  -F "model=whisper-small" \
  -F "format=srt"

Response:

JSON
{
  "job_id": "job_abc123",
  "status": "processing",
  "eta_seconds": 120
}

Pricing: Free tier (100 minutes/month), paid plans from $0.006/minute.

See API documentation for details.


Comparison: Browser Tool vs Cloud Services

| Feature | IntlPull Browser | Rev.com | Otter.ai | YouTube Auto |
|---|---|---|---|---|
| Cost | Free | $1.50/min | $10/mo | Free |
| Privacy | 100% local | Upload | Upload | Upload |
| Speed | 2-10x realtime | 12-24 hours | 1x realtime | 5 mins |
| Accuracy (English) | ~90% | ~99% | ~85% | ~80% |
| Languages | 99 | English only | English only | 100+ |
| Editing | Built-in | Manual | Built-in | YouTube Studio |
| Output formats | SRT, VTT | SRT, VTT, TXT | SRT, TXT | SRT (via YouTube) |
| Best for | Privacy, cost, speed | Maximum accuracy | Meetings | YouTube videos |

Our take:

  • Confidential content: IntlPull browser tool (privacy)
  • Critical accuracy (legal, medical): Rev.com (human transcription)
  • YouTube content: YouTube Auto + manual cleanup (free, integrated)
  • Scale (1000+ hours): IntlPull API (automation)

Conclusion

Browser-based subtitle generation using IntlPull AI has made professional-quality transcription accessible to everyone. No costs, no uploads, no privacy concerns—just drag, drop, and download.

Try it now: Generate Subtitles from Audio

Once you've generated your subtitles:

  1. Translate to other languages
  2. Convert between formats
  3. Edit and sync with video

For teams managing video content at scale, explore IntlPull's TMS platform with team collaboration, translation memory, and automated subtitle workflows.

