Rob Conery put out a challenge yesterday to some of the REST fans he knows to help show him what a REST API for his video learning site Tekpub would look like. I enjoy API design so I thought I’d take him up on the challenge.
I’m going to take a little different approach though. I’m not setting out to design the best REST API. My goal is to design the best Tekpub HTTP API. There may be some things borrowed from REST principles, but I’m not interested in designing any API to conform to REST just for REST’s sake. Also, before you get all HATEOAS on me, read the post through then read the section on Hypermedia at the end.
With that out of the way, if you’re looking for a by-the-book REST or Hypermedia API design course, you’ll have to wait for some of the others to chime in.
For reference, here are the use cases Rob lays out for his API:
- Logging In. Customer comes to the app and logs in with email and password. A token is returned by the server upon successful authentication and a message is also received (like “thanks for logging in”).
- Productions. Joe User is logged in and wants to see what he can watch. He chooses to browse all productions and can see on the app which ones he is allowed to watch and which ones he isn’t. He then chooses to narrow his selection by category: Microsoft, Ruby, JavaScript, Mobile. Once a production is selected, a list of Episodes is displayed with summary information. Joe wants to view Episode 2 of Real World ASP.NET MVC3 – so he selects it. The video starts.
- Episodes. Kelly User watches our stuff on her way to work every day, and when she gets on the train will check and see if we’ve pushed any new episodes recently. A list of 5 episodes comes up – she chooses one, and watches it on her commute.
A lot of these ideas may look familiar if you know where I work. Consider that may be why I work there :)
Authentication
The API will support HTTP basic authentication. Users will provide their Tekpub credentials to authenticate requests. HTTPS is required to secure transmission of the credentials over the wire. We don’t need any tokens for future requests. Every request is authenticated with the same credentials.
Basic authentication is extremely easy to configure in almost any HTTP library (cough .NET). Allowing apps to store credentials is relatively low risk, especially since the initial API is read-only.
Why not OAuth? For a read-only API I think OAuth would be overkill. Once you want to let apps act on behalf of a user to take some action, OAuth starts to make more sense. It’s also a lot of work to implement for very little payoff. From the provider side there are some definite benefits like tracking app access, being able to shut off rogue apps, etc. but these are likely not big concerns early on.
It is useful to provide a method to determine if credentials are valid. With Basic auth, though, we don’t need a separate endpoint just for authentication. We can check the credentials by making a request to the API root:
GET /api
Authorization: Basic (base64 encoded credentials here)
If successful the API will return some basic info about the user. We could add more information to the root response if needed. For now I’m focusing just on the authentication part.
HTTP/1.1 200 OK
Content-Type: application/json
{
"user" : {
"name" : "John Sheehan",
"id" : 123,
"subscription_expires" : 1356976799
}
}
I’m using Unix timestamps in UTC for the dates. JSON date formatting is a touchy topic I don’t want to get too deep into, but Unix timestamps are easy to consume in most languages and frameworks (cough .NET).
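As a quick illustration of how little work the client has to do, here is a minimal Python sketch that turns the subscription_expires value from the response above into a timezone-aware UTC datetime. The helper name from_unix is made up for this example.

```python
from datetime import datetime, timezone


def from_unix(ts: int) -> datetime:
    """Convert a Unix timestamp (seconds since the epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


# Value taken from the sample root response.
expires = from_unix(1356976799)
print(expires.isoformat())
```

Passing tz=timezone.utc matters here: without it, Python converts to the machine's local time zone, which is exactly the kind of ambiguity a UTC timestamp is meant to avoid.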
If the credentials are invalid, the API will return an error. This response will be returned on any resource that is accessed with invalid credentials.
HTTP/1.1 401 Unauthorized
Content-Type: application/json
{
"error" : {
"code" : 401,
"message" : "The credentials provided are invalid. Please authenticate using your Tekpub user information.",
"more_info_url" : "http://tekpub.com/help/errors/401"
}
}
You may be asking, why not just put the error information at the top level of the JSON returned? If you’re using a dynamic language to parse your JSON it’s a little less typing to have the error data at the top level. If you are using something strongly-typed you’ll want to have responses from a single endpoint that don’t conflict with each other. This is a lesson I’ve learned from RestSharp. If you can’t map all of the responses from a single endpoint to a single class definition (even if you’re only partially populating it), you’re in for a world of hurt trying to wrangle around it.
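To make the "single class definition" point concrete, here is a rough sketch in Python (standing in for a strongly-typed client). The class names RootResponse, User, and ApiError and the parse_root helper are hypothetical; only the field names come from the sample responses. Both the success and the error body map onto the same type, with whichever part is absent left empty.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str
    id: int
    subscription_expires: int


@dataclass
class ApiError:
    code: int
    message: str
    more_info_url: str


@dataclass
class RootResponse:
    # Exactly one of these is expected to be populated for GET /api.
    user: Optional[User] = None
    error: Optional[ApiError] = None


def parse_root(body: str) -> RootResponse:
    """Map either response body from GET /api onto one class."""
    data = json.loads(body)
    user = User(**data["user"]) if "user" in data else None
    error = ApiError(**data["error"]) if "error" in data else None
    return RootResponse(user=user, error=error)


ok = parse_root('{"user": {"name": "John Sheehan", "id": 123, "subscription_expires": 1356976799}}')
print(ok.user.name, ok.error)
```

If the error fields lived at the top level instead, the one class would need name, id, code, and message side by side, and the client would have to guess which set was meaningful.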
With a simple request to /api we can now validate credentials and get other important information about the user to display in our app.
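For anyone who hasn’t built a Basic auth header by hand, here is a small Python sketch using only the standard library. The host name api.tekpub.example and the function names are placeholders for illustration; the request object is built but never sent.

```python
import base64
import urllib.request


def basic_auth_header(email: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic auth."""
    raw = f"{email}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_root_request(email: str, password: str) -> urllib.request.Request:
    # Hypothetical host; HTTPS because the credentials travel with every request.
    return urllib.request.Request(
        "https://api.tekpub.example/api",
        headers={"Authorization": basic_auth_header(email, password)},
    )


req = build_root_request("joe@example.com", "secret")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Base64 is an encoding, not encryption, which is why HTTPS is a hard requirement rather than a nice-to-have.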
Content
There are a couple different types of content on the site:
- Channels (groups of productions by category or topic)
- Productions (groups of episodes around a single topic)
- Episodes
Productions can belong to many channels. Episodes only belong to one production.
Let’s start with a list of channels and the data in the response. Assume for the rest of the post all requests are authenticated properly.
// Request
GET /api/channels
// Response
Content-Type: application/json
{
"item_count" : 20,
"items" : [
{
"id" : "CHMS",
"name" : "Microsoft",
"productions" : [
{
"id" : "PRCS",
"name" : "Mastering C#",
"author" : "Jon Skeet"
},
{
"id" : "PRRD",
"name" : "RavenDB",
"author" : "Oren Eini"
}
]
},
... truncated ...
]
}
There’s the potential for a lot of data to be sent back here. Channels and productions don’t change that often though so with some standard HTTP caching we can eliminate a lot of overhead involved with making this request.
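One way "standard HTTP caching" could look is a conditional GET with an ETag. The sketch below is an assumption on my part, not something the design above specifies: the server hashes the response body into an ETag, and if the client sends the same value back in If-None-Match it gets an empty 304 instead of the full list. The function names and the max-age value are illustrative.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), skipping the body when the client copy is current."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Cache-Control": "max-age=3600"}, body


channels = b'{"item_count": 20, "items": []}'
status, headers, _ = respond(channels)
print(status, headers["ETag"])
print(respond(channels, if_none_match=headers["ETag"])[0])
```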
I’ve named the root element for the data ‘items’, again for the benefit of strongly-typed languages. Particularly in C#, this in combination with generics makes deserialization cleaner.
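Here is roughly what that buys you, sketched with Python’s typing generics rather than C#. ListResponse, Channel, ProductionSummary, and parse_list are names invented for this example; the fields mirror the channel list above. Because every list uses the same item_count/items envelope, one generic wrapper covers all of them.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class ListResponse(Generic[T]):
    """The shared envelope used by every list resource."""
    item_count: int
    items: List[T]


@dataclass
class ProductionSummary:
    id: str
    name: str
    author: str


@dataclass
class Channel:
    id: str
    name: str
    productions: List[ProductionSummary]


def parse_channel(data: dict) -> Channel:
    productions = [ProductionSummary(**p) for p in data["productions"]]
    return Channel(id=data["id"], name=data["name"], productions=productions)


def parse_list(data: dict, make_item: Callable[[dict], T]) -> ListResponse[T]:
    """Deserialize any list response given a function that builds one item."""
    return ListResponse(
        item_count=data["item_count"],
        items=[make_item(raw) for raw in data["items"]],
    )


payload = {
    "item_count": 20,
    "items": [
        {
            "id": "CHMS",
            "name": "Microsoft",
            "productions": [{"id": "PRCS", "name": "Mastering C#", "author": "Jon Skeet"}],
        }
    ],
}
channels = parse_list(payload, parse_channel)
print(channels.items[0].productions[0].name)
```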
Moving on to productions. Let’s get a list of all productions.
// Request
GET /api/productions
// Response
Content-Type: application/json
{
"item_count" : 20,
"items" : [
{
"id" : "PRCS",
"name" : "Mastering C#",
"author" : "Jon Skeet",
"last_updated" : 1356976799,
"episode_count" : 10,
"channels" : [ "Microsoft", "Languages" ]
},
{
"id" : "PRRD",
"name" : "RavenDB",
"author" : "Oren Eini",
"last_updated" : 1356976799,
"episode_count" : 5,
"channels" : [ "Databases" ]
}
]
}
Pretty straightforward. Each item in the list is a superset of the data returned in the Channel list. We’ll discuss how we might filter this list later.
Moving on to the details for an individual production:
// Request
GET /api/productions/PRRD
// Response
Content-Type: application/json
{
"id" : "PRRD",
"name" : "RavenDB",
"author" : "Oren Eini",
"last_updated" : 1356976799,
"episode_count" : 5,
"channels" : [ "Databases" ],
"episodes" : [
{
"id" : "EP100",
"title" : "Why You Should Care About RavenDB",
"duration" : 1200,
"release_date" : 1356976799
},
{
"id" : "EP101",
"title" : "Understanding Document Names",
"duration" : 300,
"release_date" : 1356976799
},
... truncated ...
]
}
You can see that the deeper into detail we go, the more information we provide. Note that episodes are returned in chronological order, the order they’ll most likely be viewed in when looking at the entirety of a production. If the order needs to be reversed, we’ll defer to the application.
Episodes
Episodes present a couple new problems to be solved. We’ve already covered a per-production list in our production detail resources. But we’re looking for a way to list out just the most recent ones. It may be useful to have a list of all episodes for searching as well.
Episode detail needs a home too. We could “bury” it down some sort of logical tree like /api/productions/PR123/episodes/EP100. It’s easy to argue for that type of structure. Episodes belong to a single production and this likely reflects the hierarchy of the underlying data. It’s important to consider the context though. What makes sense in your data layer does not necessarily make sense at the API layer.
First let’s deal with Rob’s third requirement: a list of recent episodes. I think this one is simple:
// Request
GET /api/episodes
// Response
Content-Type: application/json
{
"item_count" : 20,
"items" : [
{
"id" : "EP110",
"production_id" : "PRRD",
"title" : "RDBMS Replication",
"duration" : 1200,
"release_date" : 1356976799
},
{
"id" : "EP208",
"production_id" : "PRCS",
"title" : "Covariance and Contravariance",
"duration" : 650,
"release_date" : 1356976798
},
... truncated ...
]
}
This list is returned in reverse chronological order, the order most likely to be used when displayed. If different sorting is needed, we’ll leave that to the app. The key concept is that each resource should return data in the most sensible order and completeness relevant to the context of the information in the resource.
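Leaving alternate sorting to the app is not much of a burden. As a sketch, assuming the app has already parsed the items array into a list of dicts, re-sorting is a one-liner in Python (oldest_first is a made-up helper name; the sample data is from the response above).

```python
from typing import Dict, List


def oldest_first(episodes: List[Dict]) -> List[Dict]:
    """Return a new list sorted by release_date ascending; the input is untouched."""
    return sorted(episodes, key=lambda ep: ep["release_date"])


recent = [
    {"id": "EP110", "production_id": "PRRD", "title": "RDBMS Replication",
     "duration": 1200, "release_date": 1356976799},
    {"id": "EP208", "production_id": "PRCS", "title": "Covariance and Contravariance",
     "duration": 650, "release_date": 1356976798},
]
print([ep["id"] for ep in oldest_first(recent)])
```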
One URL, many types of data
For the episode detail we’ll use /api/episodes/{episode_id} with URLs included for all the media (videos, etc) for that episode. I’ll spare you a needless example here.
One great feature of HTTP is the ability to use the Accept header to specify which kind of content you want. So far my examples haven’t specified one and the API has returned JSON by default. I’m a fan of reasonable defaults, but others prefer things more explicit.
Using the Accept header, we can request the same URL but get different types of media back. Fancy pants people call this content negotiation. I just like to call it useful. Imagine we want to get back the video for a specific episode. We’ll specify the content type we’re looking for in the request:
GET /api/episodes/EP123
Accept: video/mp4
In my hypothetical world Tekpub is using a CDN to serve content. The actual video file doesn’t exist at /api/episodes/{episode_id} so we’ll return a temporary redirect and a Location header pointing to the actual file:
HTTP/1.1 303 See Other
Location: http://cdn.tekpub.com/example.mp4
This can be difficult to test in a browser, so you may want to also allow “file extensions” to specify the content to return.
GET /api/episodes/EP123.mp4
The response would be identical to above. This makes it easier when playing with the API via a browser. I’d give priority to the file extensions (since browsers send Accept headers as well, so you can’t just check for the presence of it), then the Accept header value, then the default format if neither of those exists.
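That precedence (extension, then Accept, then default) can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the pick_media_type function, the table of extensions, and the set of supported types. It also deliberately ignores q-values and wildcards in the Accept header, which a real implementation would want to handle.

```python
import posixpath
from typing import Optional

# Illustrative tables; the real API may support other formats.
EXTENSION_TYPES = {".json": "application/json", ".mp4": "video/mp4"}
SUPPORTED_TYPES = ("application/json", "video/mp4")
DEFAULT_TYPE = "application/json"


def pick_media_type(path: str, accept: Optional[str] = None) -> str:
    """Choose a response type: file extension first, then Accept, then the default."""
    _, ext = posixpath.splitext(path)
    if ext.lower() in EXTENSION_TYPES:
        return EXTENSION_TYPES[ext.lower()]
    if accept:
        for part in accept.split(","):
            # Drop parameters such as ";q=0.8" and compare the bare type.
            media_type = part.split(";")[0].strip().lower()
            if media_type in SUPPORTED_TYPES:
                return media_type
    return DEFAULT_TYPE


print(pick_media_type("/api/episodes/EP123.mp4", "text/html"))
print(pick_media_type("/api/episodes/EP123", "video/mp4"))
print(pick_media_type("/api/episodes/EP123", "text/html,*/*;q=0.8"))
```

The last call shows why the extension has to win: a browser’s Accept header lists types the API doesn’t serve, so it falls through to the default rather than producing an error.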
List Filtering
There’s one requirement still not fulfilled and that’s the ability to filter the list of productions by channel. I’m a big fan of APIs that let you filter a list on any of the properties of the items in the list. So for filtering productions by channel, we’ll make the following request:
GET /api/productions?channel=CHMS
There are a lot of options here that are left to one’s taste. Do you accept channel names or just IDs or both? How about multiple channels? There’s no right answer. It’s a style choice ultimately. We also already have a list of productions by channel in the channel detail resource so this may be redundant if a single filter criterion is what you need. But most list resources benefit from having filtering by any number of properties. For instance, filtering recent episodes by production and title (assuming a partial match by default):
GET /api/episodes?production=PRCS&title=covariance
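On the server side, that request might translate into something like the following Python sketch. The filter_episodes function and its matching rules are my assumptions: production is an exact match against production_id, and title is a case-insensitive partial match. The sample data comes from the recent episodes list.

```python
from typing import Dict, List, Optional

EPISODES = [
    {"id": "EP110", "production_id": "PRRD", "title": "RDBMS Replication",
     "duration": 1200, "release_date": 1356976799},
    {"id": "EP208", "production_id": "PRCS", "title": "Covariance and Contravariance",
     "duration": 650, "release_date": 1356976798},
]


def filter_episodes(
    episodes: List[Dict],
    production: Optional[str] = None,
    title: Optional[str] = None,
) -> List[Dict]:
    """Keep episodes matching every filter that was supplied; no filters means no filtering."""
    results = []
    for ep in episodes:
        if production is not None and ep["production_id"] != production:
            continue
        if title is not None and title.lower() not in ep["title"].lower():
            continue
        results.append(ep)
    return results


# Equivalent of GET /api/episodes?production=PRCS&title=covariance
print([ep["id"] for ep in filter_episodes(EPISODES, production="PRCS", title="covariance")])
```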
Hypermedia
I left it out for this post, but imagine every item in every list included a url property that linked to its detail page. URLs are nice. If an API provides them, store the URL instead of or alongside the ID of an item. That way if the item moves (and the API behaves like it should) you can follow a redirect returned to the new resource location. In fact, your API’s URL structure becomes mostly meaningless if URLs are omnipresent throughout your responses. Roy would be proud. It does get trickier when you add the ability for the API to take actions as well as read data but that’s for another time.
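As a rough idea of what that url property could look like, here is a small Python sketch that decorates a list item with a link to its detail resource. The with_url helper is hypothetical, and the relative path it builds simply follows the URL patterns used elsewhere in this post.

```python
from typing import Dict


def with_url(item: Dict, collection: str) -> Dict:
    """Return a copy of the item with a url property pointing at its detail resource."""
    linked = dict(item)
    linked["url"] = f"/api/{collection}/{item['id']}"
    return linked


episode = {"id": "EP110", "production_id": "PRRD", "title": "RDBMS Replication"}
print(with_url(episode, "episodes")["url"])
```

A client that stores this url value rather than rebuilding it from the ID never needs to know how the path was put together.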
List of Operations
Rob wanted URLs as a starting point. I think that’s the wrong way to go about it. The URL structure will design itself for a well laid out API. Here’s what I ended up with:
- Authenticate user - GET /api
- List channels - GET /api/channels
- List productions - GET /api/productions
- List productions by channel - GET /api/productions?channel={channel_id}
- List all episodes - GET /api/episodes
- List recent episodes - GET /api/episodes (default sort is reverse chronological)
- Get episode detail - GET /api/episodes/{episode_id}
- Get production detail - GET /api/productions/{production_id}
- Get channel detail - GET /api/channels/{channel_id}
- List episodes by production - GET /api/productions/{production_id} (data is included in production detail)
- List recent episodes by channel - GET /api/episodes?channel={channel_id}
- List recent episodes by production - GET /api/episodes?production={production_id}
Omissions
I left out sorting because I don’t think it’s needed with sensible defaults. I left out paging because all of the responses could conceivably return every item on every request and still be small (especially if compression is used). Maybe the episodes list could use some paging, but we can go down that road later if needed. I left out versioning too. That’s a long post in and of itself.
Conclusion
You can see that even the simplest APIs have a lot of considerations to be made. Once you’ve settled on a couple patterns that you can use across your entire API, implementing it becomes easy. And your API consumers will be able to learn it quickly. Be consistent across your resources (one style for lists, one style for detail, consistent data structures) and pay attention to the anticipated usage context for each request.