Clark Rasmussen -

Video Encoding in the Cloud with ElasticTranscoder

I’ve had a “multimedia” section on DetroitHockey.Net since about the third day the site existed but I haven’t always done a good job keeping said multimedia in a usable format. For a while all of the videos were in QuickTime format, then I jumped over to Windows Media.  There were whole seasons of hockey where I didn’t bother adding any new highlights to the site because first I couldn’t figure out what the best format to use would be, then I didn’t want to take the time to burn through the backlog of highlights I needed to edit, encode, upload, etc.

Dumping all of the videos onto YouTube wouldn’t be an option for me because I try to own the entire end-to-end experience on DH.N.  I don’t want to send people off to a third party to get something that I am supposedly providing.

About 18 months ago I finally sat down and took on the challenge of bringing the multimedia system up to date.  I pulled up the raw capture files for every video I had, including my backlog.  I re-edited everything, re-watermarked it, and re-encoded it all in HTML5-friendly formats.  I did it all by hand because I wanted to keep an eye on things as they went along but the entire time I was thinking, “I need to automate this going forward.”

After the updated multimedia section launched, the idea of setting up FFmpeg on my server and using it to do the encoding rolled around in my head for a while.  The idea was that I’d upload the edited video and have automated processes encode it to the right formats, add the watermark, and copy everything to the right place.  I never got around to that before Amazon Web Services put out their ElasticTranscoder service.

Put simply, ETC does what I wanted to use FFmpeg for.  Here’s a bit on how I’m now using it.  All code shown is PHP (as should be evident) using the AWS SDK for PHP 2.

There are three important concepts to using ETC.  The “job” defines what file is to be encoded and what preset(s) to use for the encoding.  The “preset” is the set of encoding parameters, including codec options. The “pipeline” defines where your files come from and what gets notified upon various job-related events, including completion and error.

There’s one caveat to that: watermarking.  Watermarks are defined in the job but the preset must be set up to allow for watermarks.  I Tweeted that I think it should only be on the job level or should be a fully-defined fourth concept (along with the pipeline, preset and job) that gets attached to a job, rather than splitting the definition across the preset and the job.  That said, the way I ended up implementing things it doesn’t matter.

The pipeline is the one constant for every DH.N encoding job.  I dump the edited video file into a specified S3 bucket (all input files for ETC must be in an S3 bucket) and the job puts the completed files back in that bucket.  Upon completion or error, ETC sends a message to a topic in the AWS Simple Notification Service.  I have an HTTP endpoint subscribed to that topic, where code runs to shuffle the completed files into their final locations. I should probably be using an SQS subscription instead of an HTTP one to reduce the possibility of data loss but I’m not right now.

So dump the file to S3 and kick off the job, right?  I’ve skipped setting up the preset and here’s why:

Presets in ETC can’t be edited.  Once you create one, that’s it.  So I could create a set of presets that work for what I need now and save all of their IDs for reference by my scripts but if I ever needed to change that preset I would have to create a whole new preset and then update my script with the new ID.  Not hard but it felt like an easy point to make a mistake.

Instead, since I’m programmatically firing off the job anyway via the AWS API, I fire off the command to create a new set of presets for each job first.  This limits me because you can only have 50 custom presets but I clean up after each job so it really just means I can only have a certain number of jobs active at a time.

Enough of my babbling, on to the code:

As the comment says, the watermark and thumbnail-creation data is the same for all output file formats so I define that first.  You’ll notice that the actual watermark file isn’t defined there, that’s defined at the job level, which is one of the things I think is weird about how watermarks are handled.

Then I create my MP4 preset, defining all of the codec options and other variables.  The big thing here is that I’m using a higher bitrate for HD video than SD, so I define that on the fly.  I’m sure I could do more fine-tuning but I’ve forgotten more about video codecs than Brian Winn at MSU would like for me to admit.

With the preset definition built, I fire off the createPreset command, which spits back a ton of data including the newly-created presetID.  I save that ID for later.
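A minimal sketch of how such a preset definition might be assembled before it is handed to createPreset. The field names follow the Elastic Transcoder preset structure; the helper name build_mp4_preset, the two bitrates, and every other value are illustrative assumptions, not the settings used on DH.N. The createPreset call itself is left out so the example runs without AWS credentials.

```php
<?php
// Hypothetical helper: builds the array that would be passed to createPreset.
// Every value here is an assumption chosen for illustration.
function build_mp4_preset(string $name, bool $is_hd): array
{
    // Watermark and thumbnail settings are shared by all output formats.
    // Note the watermark *file* is not named here; only its placement is.
    $watermarks = [[
        'Id' => 'corner',
        'MaxWidth' => '10%',
        'MaxHeight' => '10%',
        'SizingPolicy' => 'ShrinkToFit',
        'HorizontalAlign' => 'Right',
        'HorizontalOffset' => '10px',
        'VerticalAlign' => 'Bottom',
        'VerticalOffset' => '10px',
        'Opacity' => '100',
        'Target' => 'Content',
    ]];
    $thumbnails = [
        'Format' => 'png',
        'Interval' => '60',
        'MaxWidth' => 'auto',
        'MaxHeight' => 'auto',
        'SizingPolicy' => 'ShrinkToFit',
        'PaddingPolicy' => 'NoPad',
    ];

    // Higher bitrate for HD than SD, decided on the fly (values assumed).
    $bit_rate = $is_hd ? '2400' : '1200';

    return [
        'Name' => $name,
        'Description' => 'Temporary per-job MP4 preset',
        'Container' => 'mp4',
        'Video' => [
            'Codec' => 'H.264',
            'CodecOptions' => [
                'Profile' => 'main',
                'Level' => '3.1',
                'MaxReferenceFrames' => '3',
            ],
            'KeyframesMaxDist' => '90',
            'FixedGOP' => 'false',
            'BitRate' => $bit_rate,
            'FrameRate' => 'auto',
            'MaxWidth' => 'auto',
            'MaxHeight' => 'auto',
            'SizingPolicy' => 'ShrinkToFit',
            'PaddingPolicy' => 'NoPad',
            'DisplayAspectRatio' => 'auto',
            'Watermarks' => $watermarks,
        ],
        'Audio' => [
            'Codec' => 'AAC',
            'SampleRate' => '44100',
            'BitRate' => '128',
            'Channels' => '2',
        ],
        'Thumbnails' => $thumbnails,
    ];
}
```

The returned array would then go to the SDK's createPreset command, and the preset ID in the response would be saved for the job definition.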

I also create a WebM preset but I’ll save space and not include that here since it looks almost the same.

With the presets defined, it’s time to fire off the job.

The pipeline ID is saved off in a configuration file, since that’s used for every job.  The source file and the output folder are defined in code outside this.  We loop through each of the previously-defined presets to say how that preset will be used (for example, I don’t use thumbnails on any of my jobs, even though they were defined at the preset level).  If I’m using a watermark, the watermark file is added to the job definition.  With the job defined, the createJob command is fired.
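The job definition described above could be sketched roughly as follows. The helper name build_job, the file names, and the 'corner' watermark ID are assumptions for illustration; the array keys follow the Elastic Transcoder job structure, and the createJob call is omitted so the example runs on its own.

```php
<?php
// Hypothetical helper: assembles the array that would be passed to createJob.
// $presets maps a file extension to a preset ID created earlier.
function build_job(
    string $pipeline_id,
    string $input_key,
    string $output_prefix,
    array $presets,
    ?string $watermark_key = null
): array {
    $base_name = pathinfo($input_key, PATHINFO_FILENAME);

    $outputs = [];
    foreach ($presets as $extension => $preset_id) {
        $output = [
            'Key' => $base_name . '.' . $extension,
            'PresetId' => $preset_id,
            // An empty pattern means no thumbnails are generated,
            // even though the preset defines thumbnail settings.
            'ThumbnailPattern' => '',
            'Rotate' => 'auto',
        ];
        if ($watermark_key !== null) {
            // The watermark file is named here, at the job level.
            $output['Watermarks'] = [[
                'InputKey' => $watermark_key,
                'PresetWatermarkId' => 'corner',
            ]];
        }
        $outputs[] = $output;
    }

    return [
        'PipelineId' => $pipeline_id,
        'Input' => [
            'Key' => $input_key,
            'FrameRate' => 'auto',
            'Resolution' => 'auto',
            'AspectRatio' => 'auto',
            'Interlaced' => 'auto',
            'Container' => 'auto',
        ],
        'OutputKeyPrefix' => $output_prefix,
        'Outputs' => $outputs,
    ];
}
```

The PresetWatermarkId has to match an Id defined in the preset's watermark settings, which is the split between preset and job described earlier.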

The job runs along in the background and I don’t care about the status, because I’ll know when it ends because my HTTP endpoint will be hit.  The endpoint looks like this:

We start by determining whether or not the job is complete. Because we’re only notified upon completion or error, we know that anything that isn’t completion is a failure.

On completion we loop through the data provided about the completed job to determine what presets were used and what files were created.  We move the new files to their final locations (and make them publicly readable) and remove the outputted files and the original input file.  On error we just determine what presets were used (we don’t delete any files in case they can be re-used).  In both cases, we then remove the presets that were used so that we don’t hit that 50-preset limit.
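As a rough sketch of the first half of that endpoint, the function below reads an SNS notification body and extracts what the cleanup code needs. SNS delivers a JSON envelope whose Message field is itself a JSON string. The field names used inside it (state, input, outputKeyPrefix, outputs, presetId, key) are how I understand the Elastic Transcoder notification format and should be checked against a real message. The function name summarize_notification is hypothetical, and subscription confirmation, signature verification, and the S3 file moves are all left out.

```php
<?php
// Hypothetical helper: turns the raw SNS POST body into a small summary.
function summarize_notification(string $raw_body): array
{
    $envelope = json_decode($raw_body, true);

    // SNS wraps the transcoder's own JSON document in the "Message" string.
    $message = json_decode($envelope['Message'] ?? '', true);
    if (!is_array($message)) {
        throw new InvalidArgumentException('Notification has no usable Message');
    }

    $preset_ids = [];
    $files = [];
    foreach ($message['outputs'] ?? [] as $output) {
        $preset_ids[] = $output['presetId'];
        $files[] = ($message['outputKeyPrefix'] ?? '') . $output['key'];
    }

    return [
        // Only completion and error are subscribed, so anything that is
        // not COMPLETED is treated as a failure.
        'completed' => ($message['state'] ?? '') === 'COMPLETED',
        'input_key' => $message['input']['key'] ?? null,
        // Presets to delete afterwards, in both the success and error cases.
        'preset_ids' => array_values(array_unique($preset_ids)),
        // Output files to move to their final locations on success.
        'files' => $files,
    ];
}
```

In a live endpoint the raw body would typically come from file_get_contents('php://input').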

There are some other pieces that manage metadata and connections to other parts of DH.N but these are the interactions with ElasticTranscoder.

PHP and the Trello API

Note: This post was updated on March 4, 2021, to replace references to outdated code.

A couple weeks ago I wrote up a bit about a PHP wrapper object I’d written for the Twitter API v1.1.  Since then I’ve been playing with the Trello API a bit, so I figured I’d write that up as well.

The code for the Trello API class is going to look really familiar because I actually wrote my Trello API wrapper object first, so the Twitter one is based on it.

Say you wanted to use this to get the name of your board’s red label. That would look like this:

We use the request method to make a GET call to /1/boards/xxxxxx, where xxxxxx is the ID of the board we want the data for. We include the optional fields=labelNames query because the names of the labels are all the data we want to get back in this case. The request gives us back an object where each label’s name is available.

For the record, I kind of hate that Trello uses the color of each label as an identifier. I’m sure they can back the decision up but it reminds me of the old CSS “rule” about not naming your classes after what they look like, because what happens if you change what they look like?  If you have a class that converts everything to uppercase and bolds it, and you call it uppercase_bold, that works great until you change it to small caps and italics.  That’s a whole other post, I suppose.

Now that we have the text for the red label on our board, say we want to change it. That looks like this:

This time it’s a PUT request to /1/boards/xxxxxx/labelNames/red (where, once again, xxxxxx is the board’s ID).  We use the optional $args argument of the request method to pass in a value of $text, where $text is the new label text.

What if you want to use that newly-renamed red label and apply it to a new card? That’s just a POST request to /1/cards.

We use the optional $args argument again to pass in our array of arguments. We set the name to whatever the value of $card_name is,  idList is set to $list_id (the ID of the list the card will be added to), and the optional labels specifies that we’re applying just the red label to this new card.
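The three calls described above can be sketched as plain request descriptions. This is not the wrapper class from the post, whose code is not shown here. The function name trello_request_spec, its signature, and the sample IDs are assumptions, and the endpoints are the ones named in the text, which may since have changed. It only builds the method, URL, and body, so it runs without network access or real credentials.

```php
<?php
// Hypothetical helper: describes a Trello API request without sending it.
function trello_request_spec(
    string $method,
    string $path,
    array $args,
    string $key,
    string $token
): array {
    $method = strtoupper($method);
    $auth = ['key' => $key, 'token' => $token];
    $url = 'https://api.trello.com/' . ltrim($path, '/');

    if ($method === 'GET') {
        // GET: arguments and credentials all travel in the query string.
        $url .= '?' . http_build_query($args + $auth, '', '&');
        $body = '';
    } else {
        // PUT/POST: credentials in the query string, arguments in the body.
        $url .= '?' . http_build_query($auth, '', '&');
        $body = http_build_query($args, '', '&');
    }

    return ['method' => $method, 'url' => $url, 'body' => $body];
}

// Read the label names for a board.
$get = trello_request_spec('GET', '/1/boards/xxxxxx', ['fields' => 'labelNames'], 'KEY', 'TOKEN');

// Rename the red label.
$put = trello_request_spec('PUT', '/1/boards/xxxxxx/labelNames/red', ['value' => 'Urgent'], 'KEY', 'TOKEN');

// Create a card on a list with the red label applied.
$post = trello_request_spec('POST', '/1/cards', [
    'name' => 'New card',
    'idList' => 'yyyyyy',
    'labels' => 'red',
], 'KEY', 'TOKEN');
```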

Again, I could complain about using the color as an identifier but whatever.

Like my work with the Twitter API, this is hardly groundbreaking stuff.  Just another thing I thought might be useful for people other than myself so I wrote it up.

Video Games and Charity

I’ve written about one of my favorite charity events on DetroitHockey.Net before but with Mario Marathon on its sixth iteration this weekend, I thought I’d say something here.

Mario Marathon features a group of guys playing their way through the core Super Mario video games, streaming their efforts online in a telethon-like fundraiser for Child’s Play Charity.  They’re about thirty hours into this year’s event and they’ve raised over $30,000.  That puts them at about $378,000 raised over the last six years, with most of this year left to come.

Child’s Play raises money to buy toys, books and video games for children’s hospitals across the world in an effort to provide entertainment for the kids that have to stay there.

In the last fifteen months, my family has spent more time in women and children’s hospitals than we’d ever wish upon anyone and we know we didn’t have it nearly as bad as many do.

The staff at these places are phenomenal but there’s only so much they can do.  No one wants to spend time there but as adults we can justify it.  We know the hospital is the best place to get the treatment we or our loved ones need.

Children don’t always know this, though. They only know that they have to be in a place that can be scary at the best of times and most certainly is not home.

Child’s Play brings a bit of home to the hospital and gives kids a chance to relax as they go through their treatment.  It’s a noble cause and the Mario Marathon guys do a great job in support of it.

PHP and the Twitter API v1.1

Note: This post was updated on March 4, 2021, to replace references to outdated code.

I’d been using the relatively-awesome TwitterOAuth library by Abraham Williams for quite some time to handle interactions between my sites and Twitter’s REST API v1.  With Twitter having eliminated v1 of the API, I started looking into other options.

It’s true that TwitterOAuth can be updated easily, changing a hardcoded 1 to 1.1 but Twitter introduced a new wrinkle with the move to v1.1 that v1 didn’t have: All requests must be authenticated.

This makes sense for actions such as posting a new Tweet, as you can’t very well do so without having a user to Tweet on the behalf of.  For that reason, you had to be authenticated to make that request in v1, so it’s nothing new for v1.1.  But what about if you just want to get the timeline of a specific user, or data on a specific Tweet?  Those are actions you might want to do through an automated process, in which case there would be no logged-in user to act on the behalf of.

Well, that’s what bearer tokens are for.  And TwitterOAuth doesn’t handle them.  So rather than use TwitterOAuth for one set of requests and something else for others, I wrote a new class that can handle Basic, OAuth, and Bearer auth types.

You’ll never hear me say that this is some kind of end-all, be-all solution.  I’m not even sure it’s all that good.  It just appears to solve the problem I was trying to handle and since I didn’t see a lot of code that did, I figured it might be useful to post.

Some more details on how this works…

This is built to handle the kind of Basic auth requests you would need to make in order to get an OAuth or Bearer token to continue making requests.  After you get your bearer token, you can switch to using that.  Then you could switch to OAuth or back to Basic if you needed to.

Here’s an example:

We start by initiating the object using our application’s consumer key and consumer secret (no getting around that).  Because that’s all we have at that point, we use them with Basic auth to make the request for a bearer token.  That request is a POST to oauth2/token with a body of grant_type=client_credentials.  The third parameter of my request function is for any arguments for the API call and we have none for this one, so it’s set to null.

That request spits back an object that includes a bearer token, so we save that as $bearer_token for future use.
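For reference, the bearer-token request described above boils down to one header and one body string. The sketch below is not the class from this post. The function name bearer_token_request is hypothetical, and it only assembles the request (no HTTP call), following Twitter's application-only authentication flow: URL-encode the consumer key and secret, join them with a colon, and Base64-encode the result for a Basic Authorization header.

```php
<?php
// Hypothetical helper: describes the POST to oauth2/token without sending it.
function bearer_token_request(string $consumer_key, string $consumer_secret): array
{
    // Credentials are URL-encoded, joined with a colon, then Base64-encoded.
    $credentials = rawurlencode($consumer_key) . ':' . rawurlencode($consumer_secret);

    return [
        'method' => 'POST',
        'url' => 'https://api.twitter.com/oauth2/token',
        'headers' => [
            'Authorization: Basic ' . base64_encode($credentials),
            'Content-Type: application/x-www-form-urlencoded;charset=UTF-8',
        ],
        'body' => 'grant_type=client_credentials',
    ];
}
```

The JSON response to that request carries the token, which is then sent as an "Authorization: Bearer ..." header on later calls.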

Our next request is for data on a specific Tweet.  We need OAuth or Bearer auth for that so we use the auth function to feed in the bearer token we just got.  That function will also switch us over to using Bearer auth for all of our subsequent requests.  With that out of the way, we use request again, this time hitting 1.1/statuses/show.json with a GET request.  Unlike in our previous call, we have optional parameters to use (but no body, so it can be ignored).  Our parameters will be passed in as an array, with the Tweet ID defined and include_entities set to true.

That request will return the data for the Tweet we specified.  Since we’re not doing anything with it in this example, we just spit it back out on the screen.

Since this example is done, we close it out by invalidating the bearer token we just created.  You probably would actually want to save that token to reuse it within your application but we destroy it for example’s sake.  To do that, we use set_auth_type to switch back to Basic auth, then we POST to oauth2/invalidate_token with a body of access_token=XXXXXXX (where XXXXXXX is the bearer token we got earlier).

For the record, had we wanted to make a request that required a user’s OAuth authorization, it would have looked like this:

Where $text is the text to be Tweeted, of course.

As I said, this isn’t any kind of end-all, be-all.  It doesn’t have any kind of error handling, I’ve only tested it on the things I was already using the Twitter API for, and I’ve only tested it on my own machines.  It works for me, though, so I figured I’d throw it out to the world in case it might work for someone else.

S/T: There’s a great answer on StackOverflow about manually building the OAuth headers that really helped me out in this.

The Rebellion Strikes Back?

I’m a big alternate history geek and I recently stumbled onto a thread asking “What if Mark Hamill had been killed in the car accident that occurred between the filming of Star Wars and The Empire Strikes Back?”

“Gone The New Hope” covers more than just the possible development of the Star Wars saga, branching out to feature the business of Hollywood and politics.

The reason I bring it up on the anniversary of the release of The Empire Strikes Back is that the author recently reached May 4, 1980, in his timeline – the release date of his alternate Episode V, titled The Rebellion Strikes Back.

The author had previously noted that he had “accidentally” ended up writing an entire treatment for the second film of the series.  He posted the complete first act on Sunday.

In this world, Luke Skywalker sacrificed himself to destroy the Death Star while Obi-Wan Kenobi was able to escape with his life.  Han Solo is the main character going forward, with edits made to A New Hope to give him more character development.

Personally, I think it’s too awesome not to share.  It brings in a lot of unused ideas from previous drafts of what became our Star Wars movies, as well as ideas that would later be used in the sci-fi of our timeline.


Output From Snagit to AWS S3 via S3 Browser

This is something I put together and never wrote up, so I might as well do so now.

Snagit by TechSmith is one of my favorite programs; I love being able to share what I’m seeing on my screen with collaborators and clients for easy comparisons with what they’re seeing.

With Snagit, there are several built-in output methods. Until recently, I mostly exported my screenshots to TechSmith’s service or, via the built-in FTP functions, to an online filebox I had built for myself. When I updated my filebox to use the Amazon Web Services S3 service, though, I lost the ability to upload files there straight from Snagit.

There is no Snagit to S3 output available but I found a way to do it via a Snagit program output and the S3 Browser program.

Snagit allows for outputs via the command line and S3 Browser has a command line mode, so the two can combine to allow you to upload files.

Let’s say you have an account in S3 Browser called myaccount and under that, you had a bucket called For added complexity you have a folder in that bucket called snagit where you want your outputted screenshots to live (though this added step can be skipped if you want to save straight to the bucket root). In that case, you can set up a Snagit program output with the following settings:


When you share using the S3 Browser Console Uploader output, your screenshot goes straight into your S3 bucket and you can share the URL as necessary.

That’s useful but I think it’s missing a step. When you share to or via FTP, the file’s new URL is copied to your clipboard, so you can just paste the new URL into whatever communication medium you’re using. That doesn’t happen here, you have to manually type out the new URL, but it can with another intermediary step.

I wrote a batch script that acts as a go-between for Snagit and S3 Browser. It takes in an additional parameter and uses that to assemble the new file’s URL, then copies that to the clipboard. The batch script looks as follows:

Disclaimer: I hadn’t written a batch script in ages prior to this, so this can probably be cleaned up a bit.

I save that script as snagit-s3browser.bat inside the S3 Browser root folder. My updated Snagit program output to take advantage of it looks as follows:


Since it’s truncated in the screen capture, the full Parameters line is as follows:

myaccount "<CaptureFilename>"

This is similar to the first one except we’re pointing at the batch file rather than the S3 Browser command line utility directly. We’ve also added a fourth parameter that contains the base URL for your capture location, which maps to the bucket/folder combination used in the third parameter.

There’s no visual confirmation with this output but now when your share is complete, the file will have been uploaded and the new URL will be on your clipboard.

There might be a better way to do this but it’s been working pretty well for me. I’d love to hear if there are others solving the same problem differently.

Friday Morning Irony

A friend was just installing Ad Block Plus on a new machine and sent the following screenshot to me:

AdBlock Plea Screenshot

There’s something I find highly ironic about the maker of software that limits publishers’ ability to make money asking for donations to continue doing so.

There are a lot of sites out there that use horrible pop-up ads or ads that autoplay with sound, I’m not denying that.  There are also a lot of sites that use simple, unobtrusive ads, providing a revenue stream for the site that – in the worst-case scenario – provides no use to the user but also does not do any harm.  The latter allows for a bit of a symbiotic relationship between the user and the publisher; the publisher provides content that the user wants and the user may or may not click on ads while viewing that content, providing income back to the publisher.

As I said, there are some aggressive publishers out there, but it’s funny to me that someone who makes it possible to take away income from well-meaning publishers is begging for money to do it.

Wednesday SQL Fun

Just had an interesting SQL question dropped in my lap, figured I’d share the results since it was something I assumed was possible but actually ended up working exactly as I expected, for once.

A friend has a system that he uses to track what baseball uniforms were worn in each game.  Each game has an ID and (among other things) the game date, the home score, the road score, the home uniform ID and the road uniform ID.  From that data, he wants to be able to pull up the record of a team in a given uniform over a period of time.  One particular caveat is that a jersey could be worn by either the home team or the road team (in the case of alternates that can be worn in either location).

This is how I ended up doing it:

[sql]SELECT (
SELECT COUNT(*)
FROM game_log
WHERE (home_jersey_id = 84)
AND (home_score > road_score)
AND game_date BETWEEN '2010-01-01'
AND '2013-01-01'
) + (
SELECT COUNT(*)
FROM game_log
WHERE (road_jersey_id = 84)
AND (road_score > home_score)
AND game_date BETWEEN '2010-01-01'
AND '2013-01-01'
) AS wins,
(
SELECT COUNT(*)
FROM game_log
WHERE (home_jersey_id = 84)
AND (home_score < road_score)
AND game_date BETWEEN '2010-01-01'
AND '2013-01-01'
) + (
SELECT COUNT(*)
FROM game_log
WHERE (road_jersey_id = 84)
AND (road_score < home_score)
AND game_date BETWEEN '2010-01-01'
AND '2013-01-01'
) AS losses[/sql]

For some reason, I’d never thought about using subqueries in that way before but it just clicked to do so in this case.  I kind of wonder now where else I could do that.

Need Moar Blogs

I’ve been blogging about hockey since before “blog” was a word but I figured it was time for me to have a place to write about other stuff.  I don’t know if this will be a common occurrence or anything but it’d be nice to have somewhere to babble about development, if nothing else.

Speaking of…  Kicking off this blog is part of a larger redevelopment of the site that’s been a real interesting learning experience.  I rolled my own blog system for DetroitHockey.Net so I hadn’t played with WordPress a whole lot before this.  I’ll try to write up what I learned once it’s done.