(March 2012)

Offline playback of Youtube videos and their annotations

So you just watched a nifty video on Youtube... it's really cool, so you proceed to download it (via youtube-dl) and keep it in your offline video library. But... the video is also annotated; i.e. it displays these "subtitle-like" text balloons while it plays, and these are a big part of why it is actually worth watching...

How can you save these, too? Resort to drastic measures, like recording your desktop while watching...? Surely there must be a better way - after all, these annotations are plain text that is "splashed on" during playback... Isn't there a way to keep both "sources" (i.e. the video and its annotations data) and play them back "in tandem"?

Different scenario: you are careless, like me:

It's my fault, of course - I assumed that if Vimeo can handle it, so would Youtube - but I was wrong. How can I salvage the effort I spent on my video annotations, and use it with my high-res video (which I still have on my hard drive)?

Open source tools to the rescue...

Getting the annotations

Googling a bit, I soon find out about this:

$ wget -O annotations.xml \

...where VIDEO_ID is the video identification part from Youtube videos, i.e. the part in red in the link below:

Update, October 2013: It seems this service has moved now to...

$ wget -O annotations.xml \

So I try it; and review my video's annotation data...

<?xml version="1.0" encoding="utf-8"?>
<document latest_timestamp="1330014529506131" polling_interval="30">
  <requestHeader video_id="o0BgAp11C9s" />
    <annotation author="ttsiodras" id="annotation_150202" 
        style="anchored" type="text">
      <TEXT>The most important parts for C/C++ coding...</TEXT>
        <movingRegion type="anchored">
          <anchoredRegion d="0" h="47.77" sx="17.68" sy="54.72" 
            t="0:02:42.1" w="96.54" x="2.64" y="4.72" />
          <anchoredRegion d="0" h="47.77" sx="17.68" sy="54.72" 
            t="0:02:51.7" w="96.54" x="2.64" y="4.72" />

This file contains all the annotation data I need:

Ingredient 1 for the soup: annotation data - check.

(If my annotations were "subtitle-like" - i.e. small one-liners that can just be placed at the bottom of the screen - then this data would be enough: I would code a simple xml2srt filter that creates an .srt subtitle file for my video, since MPlayer can use .srt files during playback. Alas, my annotations are more complex: each one is to be displayed in a different rectangular area of the video, so more work is needed...)
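
For the simpler, subtitle-like case, such an xml2srt filter could look roughly like the Python sketch below. It rests on two assumptions that the truncated excerpt above suggests but does not prove: that each annotation carries its text in a TEXT element, and that the smallest and largest t attributes of its anchoredRegion elements are its start and end times. The closing tags in SAMPLE and the function names (parse_time, srt_timestamp, annotations_to_srt) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-completed sample in the shape of the excerpt above; the closing
# tags are an assumption, since the excerpt is cut short.
SAMPLE = """<document>
  <requestHeader video_id="o0BgAp11C9s" />
  <annotation author="ttsiodras" id="annotation_150202" style="anchored" type="text">
    <TEXT>The most important parts for C/C++ coding...</TEXT>
    <movingRegion type="anchored">
      <anchoredRegion d="0" h="47.77" sx="17.68" sy="54.72" t="0:02:42.1" w="96.54" x="2.64" y="4.72" />
      <anchoredRegion d="0" h="47.77" sx="17.68" sy="54.72" t="0:02:51.7" w="96.54" x="2.64" y="4.72" />
    </movingRegion>
  </annotation>
</document>"""


def parse_time(t):
    """Convert a 'H:MM:SS.s' string (e.g. t="0:02:42.1") to seconds."""
    seconds = 0.0
    for part in t.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used in .srt files."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3600000)
    minutes, millis = divmod(millis, 60000)
    secs, millis = divmod(millis, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, millis)


def annotations_to_srt(xml_text):
    """Turn annotation XML into .srt text, one subtitle per annotation."""
    root = ET.fromstring(xml_text)
    entries = []
    for ann in root.iter("annotation"):
        text = ann.findtext("TEXT")
        times = [parse_time(r.get("t")) for r in ann.iter("anchoredRegion")]
        if text is None or len(times) < 2:
            continue  # nothing to show, or no start/end pair
        entries.append((min(times), max(times), text.strip()))
    entries.sort()
    blocks = []
    for index, (start, end, text) in enumerate(entries, 1):
        blocks.append("%d\n%s --> %s\n%s\n"
                      % (index, srt_timestamp(start), srt_timestamp(end), text))
    return "\n".join(blocks)
```

Calling annotations_to_srt(SAMPLE) gives one subtitle block for the sample annotation; the position attributes (x, y, w, h) are simply ignored here, which is exactly why this approach is not enough for annotations that must appear in specific areas of the frame.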

Displaying text during playback

Opening up the manpage of the awesome MPlayer, I review the list of video filters, looking for one that allows me to display stuff during playback - and I notice bmovl:

bmovl: The bitmap overlay filter reads bitmaps from a FIFO and displays
       them on top of the movie...

Spot on! Time to try this out: I set up a FIFO, and try sending MPlayer a box of RGB noise, just to see it while playing:

$ cp /path/to/snapshot.png . # this is a 1024x768 snapshot
$ mkfifo bmovlFIFO
$ # Tell MPlayer to play the snapshot over and over (phony video)
$ # and to read bmovl overlay bitmap data from the bmovlFIFO
$ mplayer -vf bmovl=0:0:bmovlFIFO mf://snapshot.png -loop 0 >/dev/null 2>&1 & 
$ sleep 3 # wait a bit for MPlayer to start
$ # Now prepare and send a 517x58 rectangle of noise: 517x58x4 = 119944 bytes
$ # (times 4 for RGBA: one byte for each component)
$ dd if=/dev/urandom of=box.rgba  bs=1 count=119944 >/dev/null 2>&1
$ echo 'RGBA32 517 58 100 100 0 1' > bmovlFIFO
$ cat box.rgba > bmovlFIFO
$ # At this point, I should be seeing a rectangle
$ sleep 10

Instead, I see this distorted image in the MPlayer window:

bmovl distortion in current MPlayer (2012/03)

This looks like an off-by-one error - each new scanline moves one pixel to the left, so the box ends up "tilted". Surely this is a bug, so I report it to the MPlayer folks and open a Bugzilla ticket.

A day later, a gentleman replies - he indicates that a related patch is in the pipeline for inclusion in MPlayer...

I check out the latest MPlayer from the official repos, apply the patch on it...

It works! A perfect rectangle is shown during playback!

Ingredient 2 for the soup: showing bitmaps inside MPlayer's window during playback: check.
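
The same FIFO conversation can be driven from Python instead of echo and cat. The sketch below builds the two kinds of bmovl messages used in this article: an RGBA32 header line followed by width x height x 4 raw bytes, and a CLEAR line that blanks an area again. The helper names (bmovl_rgba32, bmovl_clear, send) are made up for illustration; the header fields follow the 'RGBA32 517 58 100 100 0 1' line from the shell experiment above.

```python
import os


def bmovl_rgba32(width, height, x, y, pixels, alpha=0, clear=1):
    """Build one bmovl RGBA32 message: a text header line plus raw pixels."""
    expected = width * height * 4  # one byte each for R, G, B and A
    if len(pixels) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(pixels)))
    header = "RGBA32 %d %d %d %d %d %d\n" % (width, height, x, y, alpha, clear)
    return header.encode("ascii") + pixels


def bmovl_clear(width, height, x, y):
    """Build a bmovl CLEAR message for the given rectangle."""
    return ("CLEAR %d %d %d %d\n" % (width, height, x, y)).encode("ascii")


def send(fifo_path, message):
    """Write one message to the FIFO that MPlayer is reading from."""
    # Opening a FIFO for writing blocks until some process has it open
    # for reading, so MPlayer must already be running.
    with open(fifo_path, "wb") as fifo:
        fifo.write(message)


def noise_box(width, height):
    """Random RGBA bytes, like the /dev/urandom box in the shell test."""
    return os.urandom(width * height * 4)
```

With MPlayer started as in the shell session above, send("bmovlFIFO", bmovl_rgba32(517, 58, 100, 100, noise_box(517, 58))) should reproduce the noise rectangle, and send("bmovlFIFO", bmovl_clear(517, 58, 100, 100)) should remove it.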

Text to image

There's only one final ingredient missing: I need to convert my TEXT into nice bitmap rectangles, that will be sent to MPlayer over the bmovl FIFO... Well, I use ImageMagick for most of my image processing - it has never failed me before...

$ cat > sampleText
But that's not all - it is also vital to be able to navigate across 
macro, variable, type, function and class declarations - by using ctags, 
I can instantly jump to the relevant declarations by placing the cursor 
on something and hitting "Ctrl-]" (going to the declaration of 
CallTypeLevel1). I use it again to go deeper (to the declaration of BUF_TYPE) 
and go back to where I was, by "Ctrl-t"
$ convert -trim -size 517x358 -pointsize 19 -depth 8 \
    -fill black -background orange caption:@sampleText box.png

And as ever, it works its magic - giving me this:

TEXT converted to bitmap

I improve the output a bit, using some ImageMagick-foo:

$ convert -bordercolor orange -border 15 box.png annotation.png
$ convert annotation.png -fill gray50 -colorize '100%' -raise 8 \
    -normalize -blur 0x4 light.png
$ convert annotation.png light.png -compose hardlight \
    -composite finalAnnotation.png

Now it gives me this:

TEXT converted nicely to bitmap

Final ingredient for the soup: text to bitmap: check.
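
Since everything will be glued together in Python, the convert calls can be wrapped as argument lists and run with subprocess. The sketch below mirrors the first convert command above, and adds two steps that the text does not show and that are therefore assumptions: dumping the PNG as headerless RGBA bytes (ImageMagick's rgba: output prefix), which is the layout the bmovl FIFO expects, and asking identify for the final image size, since -trim may shrink the image. All function names are made up for illustration.

```python
import subprocess


def caption_command(text_file, width, height, out_png, pointsize=19):
    """Mirror the first convert call above as an argument list."""
    return ["convert", "-trim", "-size", "%dx%d" % (width, height),
            "-pointsize", str(pointsize), "-depth", "8",
            "-fill", "black", "-background", "orange",
            "caption:@" + text_file, out_png]


def raw_rgba_command(in_png, out_rgba):
    """Dump an image as headerless 8-bit RGBA bytes (assumed extra step)."""
    return ["convert", in_png, "-depth", "8", "rgba:" + out_rgba]


def size_command(image):
    """Ask ImageMagick for the width and height of an image."""
    return ["identify", "-format", "%w %h", image]


def render(text_file, width, height, stem="annotation"):
    """Run the three commands; return (raw file name, width, height)."""
    png, raw = stem + ".png", stem + ".rgba"
    subprocess.check_call(caption_command(text_file, width, height, png))
    subprocess.check_call(raw_rgba_command(png, raw))
    w, h = subprocess.check_output(size_command(png)).split()
    return raw, int(w), int(h)
```

Passing lists instead of one shell string avoids quoting problems with file names; render("sampleText", 517, 358) needs ImageMagick installed and the sampleText file from above.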

Time to enter the Python kitchen... :‑)

Putting it all together in a Python script

The end-user steps:

  1. The user downloads the video from Youtube (via youtube-dl or any other Youtube downloader):

    $ youtube-dl -o vimPower.flv ''

  2. The video's annotation data are next:

    $ wget -O annotations.xml ''

  3. Then he runs my tiny Python script:

    $ annotations.xml vimPower.flv

The script then...

This is the core of my script's main() function:

width, height, fps = DetectVideoSizeAndLength(sys.argv[2])
childMPlayer = CreateFifoAndSpawnMplayer()
annotations = parseAnnotations(sys.argv[1])
startTime = time.time()
fifoToMplayer = open("bmovl", "w")
for bt in sorted(annotations.keys()):
    annotation = annotations[bt]
    # Render the bitmap ahead of time, then wait until the annotation is due
    nextTimeInSeconds = getTime(annotation._t0)
    CreateAnnotationImage(annotation, width, height)
    currentTime = time.time()
    if not SleepAndCheckMplayer(childMPlayer, startTime+nextTimeInSeconds-currentTime):
        break  # assumed: stop once MPlayer is no longer running
    renderArea = SendAnnotationImageToFIFO(annotation, width, height, fifoToMplayer)
    # Keep it on screen until its end time, then clear the area it occupied
    nextTimeInSeconds = getTime(annotation._t1)
    currentTime = time.time()
    if not SleepAndCheckMplayer(childMPlayer, startTime+nextTimeInSeconds-currentTime):
        break  # assumed: stop once MPlayer is no longer running
    SendClearBufferToFIFO(fifoToMplayer, renderArea)
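
The loop leans on SleepAndCheckMplayer to wait until the next annotation is due while noticing if MPlayer has quit in the meantime. Its body is not shown here, so the following is only a guess at how such a helper could work: sleep in small steps and poll the child process between them. The name sleep_and_check and the step parameter are hypothetical; child is anything with a subprocess.Popen-style poll() method.

```python
import time


def sleep_and_check(child, seconds, step=0.05):
    """Sleep for `seconds`; return False early if `child` has exited.

    A negative `seconds` (the annotation is already overdue) returns
    immediately, so playback catches up instead of stalling.
    """
    deadline = time.time() + max(seconds, 0.0)
    while True:
        if child.poll() is not None:  # the player has exited
            return False
        remaining = deadline - time.time()
        if remaining <= 0:
            return True
        time.sleep(min(step, remaining))
```

Polling in short steps means the script stops within a fraction of a second of the player window being closed, rather than sleeping through to the next annotation.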

Here's the script: it works for my VIM video, and I have also tested it on a few other Youtube videos. You can see the results in a Vimeo video with my VIM advocacy. If you do decide to use this script, remember that you must also patch your MPlayer, since the bmovl filter is currently (2012/03) broken.


Updated: Sat Oct 8 11:41:25 2022