One of the major advancements we’ve made for Tech Field Day in 2011 was streaming and posting video of all presentations. For all events in 2012, we will be posting video to both YouTube and Vimeo, and we’ve had great viewership numbers at both sites. I know the viewership numbers because I have developed a custom Google spreadsheet that automatically imports video statistics and produces a slick graph.
It isn’t easy automatically and correctly to grab statistics from many sites, but both YouTube and Vimeo expose this data. The challenge is in correctly parsing and formatting this data for use. Here’s how I do it, and I hope that this helps you with your video statistics collection!
As I have explained previously, Google spreadsheets (as well as Windows versions of Microsoft Excel) have some handy live data import functions. My favorites are importHTML and importXML, both of which will parse and expose the content of webpages. In this example, I am using importXML to parse both real XML and HTML, since Vimeo has an XML API and YouTube does not (that I am aware of).
Collecting Vimeo Statistics with the Simple API
Vimeo has two API’s, but the Simple API provides all the information we need without authentication and other troublesome work. It’s actually very simple if you have collected a set of videos in an album. All you need to do is reference a special URL with the importXML command and you will get all the statistics and information you need.
For example, my latest Networking Field Day videos are all collected in the following album: http://vimeo.com/album/1888172
Every album on Vimeo is exposed as XML data through the simple API. Just substitute the name of your album for mine in a URL that looks like this:
If you open that in your browser, you can see that it includes nearly everything you might want, from video titles and numbers to URLs and play counts. Alas, it does not include loads, finishes, or geographic data. But it’s pretty good.
The Google spreadsheets importXML command has no trouble parsing misinformation. Just create a new spreadsheet with the following content in cell A2:
This will fill columns A through W with information about the videos in this album. The reason I put this in cell A2 is that I like to have usable headings in row 1! Here’s the official names for all the columns of output:
id title description url upload_date mobile_url thumbnail_small thumbnail_medium thumbnail_large user_name user_url user_portrait_small user_portrait_medium user_portrait_large user_portrait_huge stats_number_of_likes Number of Plays stats_number_of_comments duration width height tags embed_privacy
One more thing: Vimeo only includes 20 results using the simple API. If you have more videos in this album, as I do, just use the “page” parameter with another importXML command. For example, cell A22 on my spreadsheet contains the following formula:
Extracting YouTube Statistics from a Playlist
I am not aware of an equivalent simple API for YouTube, though I imagine there is a complex API available for developers. Instead, I’m going to use the importXML command to read the content of an HTML page.
Once again, we are using a container to organize our videos. On YouTube, I used a playlist to collect all of the Networking Field Day videos. The following URL will show sufficient information on videos for the purposes of this exercise:
The importXML function allows us to parse HTML as XML. We are going to key off the “span” tag with the class of “video-info”. ImportXML will output the title and “views” number in 2 columns on our spreadsheet. Here is the command for my playlist:
Once again, I placed this command in cell A2 and put my headings in row one.
One major issue with collecting data in this way is the “corruption” of longer titles with ellipses and “by StephenFoskett”. To handle this, I am creating a “short title” for each video that truncates them just before the elipsis. I’m using the following function in column C to do just that:
The “17” refers to the length of the phrase, “by StephenFoskett”, and I’m truncating after 50 characters to be nice and even.
I also need to remove the word, “views” and convert column B into a number. This is accomplished with the following simple function:
Converging and Correlating YouTube and Vimeo Data in Google Spreadsheets
In both cases, Google spreadsheets will automatically update the YouTube and Vimeo data sheets every time the workbook is opened (though not immediately if it has just done so). But I wanted to correlate both so that I can create a nice graph showing statistics from both sites. To do this, I created another spreadsheet that correlates the two sources of information.
Column A of my spreadsheet contains the full video title, inserted as simple text. It is important that Vimeo and YouTube have exactly the same title for every video! I am using text keys and the vlookup function so that I can sort the output to make a prettier graph.
Column E of my spreadsheet generates the correct short title with a simple “left” function. This allows me to match the full Vimeo name with the shortened YouTube name. My function is as follows:
Column F extract the raw Vimeo numbers using the full video name as the key:
I then pass this output through an “iserror” function to catch any undefined values and placed this information in column B for use in my graph:
Column G matches the short title to the view count from my YouTube spreadsheet:
As is the case for Vimeo, I use the “iserror” function to move the final YouTube number into column C:
Column D contains a simple sum of the Vimeo and YouTube data. I sort this “Z to A” so that my graph will have the most watched video at the top.
I can now create a beautiful graph using columns A, B and C. I’m using a stacked bar chart to show the output. Google spreadsheets has remarkably limited graphing flexibility, but I feel that this particular chart is acceptable for presentation. I do wish they would improve in this area, however.
I hope this helps you to see how you can extract data from Vimeo and YouTube and use that data in your own Google Spreadsheets. Leave me a comment if you have any suggestions or corrections, or even just to share your own experiences.