Grab BBC News Videos with a Bash Script

A colleague asked me the other day if I could download a video clip from the BBC News website so that she could use it in her training course (to save having to load up the website and play it “live” from there!)

I said anything is possible 😉 It took me on a little journey, but I finally found a way without having to resort to any browser plugins or switching to Windows to download a video grabber program. I decided to write a bash script to pull all the elements together too :)

In simple terms it was a matter of finding the “pid” or externalId for the video in the page source, loading this into another URL to generate an XML file that revealed a partial URL to the video file, then taking this and tacking it onto a download host to make up the download link. A bit of googling on efforts by others helped me out with the parts I needed.
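As a rough illustration of those three steps, here is an offline walk-through that needs no network access. The pid, caption and file path are made-up placeholders rather than real BBC values, and it assumes a grep that supports -P (GNU grep does).

```bash
#!/bin/bash
# Offline walk-through of the three steps, using made-up sample data
# (the pid, caption and file path are placeholders, not real BBC values).

# Step 1: pull the externalId (pid) out of a fragment of page source.
page='{"externalId":"p0000000","caption":"Sample clip"}'
pid=$(printf '%s\n' "$page" | grep -Po '(?<=externalId":").+?(?=","caption)' | head -n 1)

# Step 2: the pid slots into the mediaselector URL that returns the XML.
xml_url="http://www.bbc.co.uk/mediaselector/4/mtis/stream/$pid"

# Step 3: take the partial path from the XML and prepend the download host.
xml='<connection identifier="mp4:public/news/sample_1500k.mp4" kind="akamai"/>'
path=$(printf '%s\n' "$xml" | grep -Po '(?<=identifier="mp4:public/).+?(?=" kind=)' | head -n 1)
download_url="http://news.downloads.bbc.co.uk.edgesuite.net/$path"

printf '%s\n%s\n%s\n' "$pid" "$xml_url" "$download_url"
```

Nothing is downloaded here; it only shows how the pieces of text get stitched together.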

The BBC are constantly changing how they do things, so this solution may not last long or work in all situations, but in testing it worked for me from a variety of locations where there was just the one video on a page. The bash script means this can all be handled from the CLI with a single command, just by feeding it the web page URL.

Here is the script: [EDIT: this has stopped working now due to changes on the BBC site; see the comments below for progress on a fix]
#!/bin/bash
# feed this script a BBC News page URL with a video on it to download the video at 1500K
# will download the first video

# downloads the source for the web page with the video on it
content=$(wget "$1" -q -O -)

# finds the pid for the first video
extId=$(echo "$content" | grep -Po '(?<=externalId":").+?(?=","caption)' | head -1)

# grabs the xml file with the download links
geturl=$(wget "http://www.bbc.co.uk/mediaselector/4/mtis/stream/$extId" -q -O -)

# trims the xml up to the first "384" so the grep starts at the 1500K entry
find1500=${geturl#*384}

# finds the link for the 1500K file
videourl=$(echo "$find1500" | grep -Po '(?<=identifier="mp4:public/).+?(?=" kind=)' | head -1)

# downloads video
wget "http://news.downloads.bbc.co.uk.edgesuite.net/$videourl"

Usage:

Make sure the script file is executable (chmod +x videoscript.sh), then run it from the directory the script is in:

./videoscript.sh "bbcnewsurlhere"

 

Revised script may need further fettling but worked when tested


15 thoughts on “Grab BBC News Videos with a Bash Script”

    • Thanks for the heads up. Auntie Beeb seems to have changed things around again. I am halfway to a fix!

      Replace the grep command in the second part of the script:

      extId=$(echo "$content" | grep -Po '(?<=externalId":").+?(?=","caption)')

      with

      extId=$(echo "$content" | grep -Po '(?<=(vpid":")).*(?=","live)')

      That grabs the pid OK

      But the downloads server on the last line is not responding (even though it is there!). It is either blocked or having some time off, so I'll have to do some more research or wait a while.

    • Well, after a few hours of googling and a lot of brute force / trial and error on the command line 😉 I found a solution / workaround :) You’ll need to have rtmpdump installed to get it to work. I had to use a bit more information from the mediaselector XML file, including the auth code (which I believe is peculiar to your IP address). As a test I used this video:

      http://www.bbc.co.uk/news/world-asia-china-34608746

      With the pid p035xkhf

      which generated this (partial and edited to make it show up on the blog!) info in the mediaselector XML:

      bitrate="512" encoding="h264" expires="2099-01-01T00:00:00+00:00" height="224" kind="video" media_file_size="12992143" service="journalism_ukpublic_stream_h264_flv_400" type="video/mp4" width="400"
      connection application="ondemand" authExpires="2015-10-25T01:03:21+00:00" authString="auth=daEcedcb_b2bzdMaZcWaWdjcFaibHdpa_bz-bwla8T-bWG-FqnEDoAoKEvEoyK&aifp=v001&slist=public/mps_h264_400/public/news/world/1202000/1202251_h264_512k.mp4;public/mps_h264_hi/public/news/world/1202000/1202251_h264_1500k.mp4;public/mps_h264_med/public/news/world/1202000/1202251_h264_800k.mp4;public/mps_h264_200/public/news/world/1202000/1202251_h264_176k.mp4" identifier="mp4:public/mps_h264_400/public/news/world/1202000/1202251_h264_512k.mp4" kind="akamai" priority="5" protocol="rtmp" server="cp45413.edgefcs.net" supplier="akamai"

      from that I created this command:

      rtmpdump -r "rtmp://cp45413.edgefcs.net:1935/ondemand?auth=daEcedcb_b2bzdMaZcWaWdjcFaibHdpa_bz-bwla8T-bWG-FqnEDoAoKEvEoyK&aifp=v001&slist=public/mps_h264_hi/public/news/world/1202000/1202251_h264_1500k.mp4" -o test.mp4

      which downloaded the video for me to a file called test.mp4 :)

      Give it a try at your end?

      In the meantime I will work up changes to the script to accommodate all of this!
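For anyone wanting to see how that command is put together, here is a small sketch that rebuilds the rtmpdump URL from the attributes quoted above. The sample line is cut down from the XML in this comment (slist shortened to two entries), the variable names are my own, and it only prints the command rather than running it. The port 1935, the "ondemand" application and aifp=v001 are taken from the working command above; whether they hold for other videos is an assumption.

```bash
#!/bin/bash
# Rebuild the rtmpdump URL from one <connection> entry.
# conn is a cut-down copy of the XML sample quoted in this comment.
conn='authString="auth=daEcedcb_b2bzdMaZcWaWdjcFaibHdpa_bz-bwla8T-bWG-FqnEDoAoKEvEoyK&aifp=v001&slist=public/mps_h264_400/public/news/world/1202000/1202251_h264_512k.mp4;public/mps_h264_hi/public/news/world/1202000/1202251_h264_1500k.mp4" server="cp45413.edgefcs.net" supplier="akamai"'

# Server name: everything inside server="..."
server=$(printf '%s\n' "$conn" | grep -Po '(?<=server=")[^"]+' | head -n 1)

# Auth token: everything after "auth= up to the next & or quote
auth=$(printf '%s\n' "$conn" | grep -Po '(?<="auth=)[^&"]+' | head -n 1)

# 1500k path: split the slist on ';' and keep the entry ending in _1500k.mp4
path=$(printf '%s\n' "$conn" | grep -Po '(?<=slist=)[^"]+' | tr ';' '\n' | grep '_1500k\.mp4$' | head -n 1)

url="rtmp://$server:1935/ondemand?auth=$auth&aifp=v001&slist=$path"

# Print the command instead of running it
printf 'rtmpdump -r "%s" -o test.mp4\n' "$url"
```

Picking the path out of slist by its _1500k.mp4 suffix is just one way to choose the quality; the other entries in the list could be selected the same way.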

  1. Here is the revised script. I had to update it on review to show the &quot; and &amp; properly; you may need to edit these out here, or use the Crayon code above.

    #!/bin/bash
    # feed this script a BBC News page URL with a video on it to download the video at 1500K
    # will download the first video

    # downloads the source for the web page with the video on it
    content=$(wget "$1" -q -O -)

    # finds the pid for the first video (entity-encoded form first, then plain quotes)
    pid=$(echo "$content" | grep -Po '(?<=(vpid&quot;:&quot;)).*(?=&quot;,&quot;live)')
    if [ -z "$pid" ]; then
        pid=$(echo "$content" | grep -Po '(?<=(vpid":")).*(?=","live)')
    fi

    # grabs the xml file with the download links
    geturl=$(wget "http://www.bbc.co.uk/mediaselector/4/mtis/stream/$pid" -q -O -)

    # trims the xml up to the first "384" so the greps start at the 1500K entry
    find1500=${geturl#*384}

    # fetches the auth code from the xml (everything between auth= and &amp;aifp)
    authcode=$(echo "$find1500" | grep -Po '(?<="auth=).+?(?=&amp;aifp=v001&amp;slist)' | head -1)

    # fetches the server from the xml
    server=$(echo "$find1500" | grep -Po '(?<=server=").+?(?=" supplier=")' | head -1)

    # finds the link for the 1500K file
    videourl=$(echo "$find1500" | grep -Po '(?<=identifier="mp4:).+?(?=" kind=)' | head -1)

    # downloads video
    rtmpdump -r "rtmp://$server:1935/ondemand?auth=$authcode&aifp=v001&slist=$videourl" -o "$pid.mp4"
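The pid lookup with its fallback can be tried on its own without touching the BBC site. This is only a sketch: find_vpid is a hypothetical helper name, the two sample inputs are made up, and it swaps the `","live` lookahead for a character class that stops at the first quote or ampersand, which is a slightly different technique from the script above.

```bash
#!/bin/bash
# find_vpid is a hypothetical helper: it tries the entity-encoded form first,
# then falls back to plain quotes, and prints the first vpid it finds.
find_vpid() {
    # $1: page source as a string
    found=$(printf '%s\n' "$1" | grep -Po '(?<=vpid&quot;:&quot;)[^&"]+' | head -n 1)
    if [ -z "$found" ]; then
        found=$(printf '%s\n' "$1" | grep -Po '(?<=vpid":")[^"]+' | head -n 1)
    fi
    printf '%s\n' "$found"
}

# Made-up samples of the two forms the page source might take
encoded='data-playable="{&quot;vpid&quot;:&quot;p0000001&quot;,&quot;live&quot;:false}"'
plain='{"vpid":"p0000002","live":false}'

from_encoded=$(find_vpid "$encoded")
from_plain=$(find_vpid "$plain")
printf '%s\n%s\n' "$from_encoded" "$from_plain"
```

Stopping at the first delimiter avoids the greedy `.*` running on to a later "live" on the same line, at the cost of assuming the pid itself never contains a quote or ampersand.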

  2. I got:

    /Users/user/Desktop/sc/script.sh: line 30: unexpected EOF while looking for matching `"'
    /Users/user/Desktop/sc/script.sh: line 31: syntax error: unexpected end of file

    I did replace all the '&quot' with '"' and '&amp' with '&' – was that correct?

  3. Tried your script for some old Archived BBC stuff, which I wanted to download:
    http://www.bbc.co.uk/archive/great_egg_race/

    Old flash files I think, but just got:
    : No such file or directoryw.bbc.co.uk/archive/great_egg_race/10803.shtml
    ./BBC.sh: line 31: unexpected EOF while looking for matching `"'
    ./BBC.sh: line 32: syntax error: unexpected end of file

    :(
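Both error reports look like what can happen when a script is pasted from a web page: typographic quotes in place of straight ones, a stray space inside the `(?<=` lookbehind, and possibly Windows line endings (the overwritten "No such file or directory" line hints at a carriage return). That is a guess rather than a confirmed diagnosis. A minimal sketch for cleaning up a pasted copy, where clean_paste is a hypothetical helper name:

```bash
#!/bin/bash
# clean_paste is a hypothetical helper: it reads a pasted script on stdin and
# writes a cleaned copy to stdout. It does not touch &quot; or &amp;.
clean_paste() {
    # strip carriage returns, straighten curly quotes, close up "(?< ="
    tr -d '\r' | sed \
        -e 's/“/"/g' -e 's/”/"/g' \
        -e "s/‘/'/g" -e "s/’/'/g" \
        -e 's/(?< =/(?<=/g'
}

# Made-up sample line showing all three problems
cleaned=$(printf 'echo “hi” | grep -Po ‘(?< =a)b’\r\n' | clean_paste)
printf '%s\n' "$cleaned"
```

To use it on a file, something like `clean_paste < pasted.sh > fixed.sh` would do, with both file names being placeholders.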
