Building a Twitter client

Jan 8, 2009

I’m a keen twitterer. When I read my tweets I find that certain voices shout louder than others, where volume = tweet frequency. Those voices aren’t necessarily the ones I care about. I want to know what’s going on with my more restrained friends too.

I designed Followize to solve this problem. Like Twitter100, it shows the latest tweet from each friend. The UI is more efficient than Twitter100’s and I have some enhancements planned that I hope will make Followize a very quick and convenient way to keep up with the people you’re following.

Followize uses the Twitter API’s friends method. Until yesterday, the documentation for that method said it would return “up to 100 of the authenticating user’s friends who have most recently updated.” In other words, the sort order would be the created_at time of each friend’s latest status update. Subsequent pages of less-recently-updating friends can be requested as well. Followize is just a nice UI for this data, built on Google App Engine.

However, after building the app and using it for a little while, I noticed that the data was not sorted in this way at all. I raised this as an API issue. One of Twitter’s engineers responded that this was a documentation error, rather than a software error, and updated the docs. The correct order is (effectively) the date the user began following a given person. Unfortunately this all but kills my application.

If Twitter is sending the data in the wrong order for my app, I have to load all the data and sort it myself. The first person I followed might be the one who has most recently updated, and would thus be the last record in the results of the friends method call. Pulling a page of 100 friends from Twitter to App Engine takes around 0.8 seconds, and decoding the JSON takes another 0.15 seconds. Good old Scobleizer follows 21K people; Obama follows 171K! Loading those 21K friends requires 210 API requests, which at roughly 0.95 seconds each would take 3.3 minutes, plus some time for sorting, committing to cache etc. Twitter also rate-limits API requests to 100 per 60-minute period, so 210 requests is more than two hours’ worth of quota, and that’s only for one page view. Scoble is likely to reload the page a few minutes later and the whole thing begins again.
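The arithmetic above is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: the constants are the figures quoted in this post, and the function name load_cost is made up for illustration.

```python
import math

# Figures quoted in the post; treat them as rough measurements.
PAGE_SIZE = 100            # friends returned per API page
FETCH_SECONDS = 0.8        # time to pull one page into App Engine
DECODE_SECONDS = 0.15      # time to decode one page of JSON
RATE_LIMIT_PER_HOUR = 100  # API requests allowed per 60 minutes


def load_cost(friend_count):
    """Estimate the cost of loading every friend for one page view.

    Returns (api_requests, seconds_spent, hours_of_rate_limit_quota).
    """
    requests = math.ceil(friend_count / PAGE_SIZE)
    seconds = requests * (FETCH_SECONDS + DECODE_SECONDS)
    quota_hours = requests / RATE_LIMIT_PER_HOUR
    return requests, seconds, quota_hours


if __name__ == "__main__":
    for name, friends in [("Scoble", 21000), ("Obama", 171000)]:
        requests, seconds, quota_hours = load_cost(friends)
        print("%s: %d requests, %.1f minutes, %.1f hours of quota"
              % (name, requests, seconds / 60, quota_hours))
```

Even before sorting or caching, the rate limit is the binding constraint: the requests for a single page view use up more quota than an hour provides.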

I’m looking at using Gnip as a workaround, but this is sub-optimal. A rough strategy would be as follows:

1. A user logs in to Followize for the first time.
2. A background process loads the complete list of their friends from Twitter’s API.
3. Followize adds those friends to a Gnip filter of Twitter users followed by Followize users.
4. Gnip POSTs updates for each user to a Followize API endpoint.
5. Followize stores users being followed and their latest update in its DB.
6. When the user requests the page, tweets are loaded from the DB.
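The storage and page-rendering half of that strategy could look roughly like the sketch below. It is a hedged illustration only: plain dicts stand in for the DB, no Gnip or Twitter calls are made, and every name (record_friends, handle_update, page_for) is hypothetical. It assumes updates arrive with a created_at string in the format Twitter’s API uses, e.g. "Wed Jan 07 10:00:00 +0000 2009".

```python
from datetime import datetime

# Assumed timestamp format for created_at values.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# In-memory stand-ins for the DB (hypothetical schema).
friends_of = {}      # Followize user -> set of friend screen names
latest_status = {}   # friend screen name -> (created_at, text)


def record_friends(user, friends):
    """Store the complete friend list loaded in the background."""
    friends_of[user] = set(friends)


def handle_update(screen_name, created_at, text):
    """Called for each update pushed to the endpoint; keep only the newest."""
    when = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    current = latest_status.get(screen_name)
    if current is None or when > current[0]:
        latest_status[screen_name] = (when, text)


def page_for(user):
    """Latest tweet from each friend, most recently updated first."""
    rows = [
        (latest_status[name][0], name, latest_status[name][1])
        for name in friends_of.get(user, ())
        if name in latest_status
    ]
    rows.sort(key=lambda row: row[0], reverse=True)
    return [(name, text) for _, name, text in rows]
```

The sort happens on data already held locally, so a page view costs no Twitter API requests at all; the cost moves to receiving and storing every incoming update.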

The drawbacks to this approach are:

Step 2 could still fall foul of Twitter’s API rate limit, necessitating a one-hour wait.
The application load doesn’t scale with traffic. Scoble could sign up and I’d start getting a tonne of tweets coming in from Gnip, but Scoble may never visit Followize again, rendering that traffic useless. I can pull data up to 60 minutes old from Gnip, so I could minimize the processing overhead by pulling tweets in batches, every 60 seconds for example, rather than receiving each one as it is pushed.
All of these API calls would be too long-running for Google App Engine.
The application complexity is dramatically increased and it is now reliant on an additional remote service.

I’d like Twitter to order the data for me, but Twitter’s API as it stands can’t be modified to do all the heavy lifting for every application. Gnip has an interesting model in that they allow you to offload some work, filtering of data, to them. A model in which I could write my own view of Twitter’s data and upload that to be run locally to their DB would be a great solution. Given the wide range of apps using Twitter’s API, I’m hopeful.

Update, Jan 9: HubSpot’s State of the Twittersphere says that only 12% of Twitter users are following more than 100 people. I’d suggest those are not likely the people who will find Followize useful though. In addition, most Twitter users are new to the service and following lists grow with time.

Update, Jan 17: After initial setbacks, I found simply pulling several pages from the Twitter API and caching them with staggered timeouts provides a good enough user experience. Followize lives!
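For the curious, here is one way "staggered timeouts" might be sketched. This is an illustration of the idea, not Followize’s actual code: the class name, the TTL values and the rule that later pages live longer are all assumptions, and fetch_page is a stand-in for whatever function pulls one page of friends from the Twitter API.

```python
import time


class StaggeredPageCache:
    """Cache API pages with a different timeout per page.

    Page n is kept for base_ttl + n * stagger seconds, so the pages
    do not all expire (and hit the API) at the same moment.
    """

    def __init__(self, fetch_page, base_ttl=60.0, stagger=30.0,
                 clock=time.monotonic):
        self._fetch_page = fetch_page   # hypothetical: page number -> data
        self._base_ttl = base_ttl
        self._stagger = stagger
        self._clock = clock
        self._entries = {}              # page number -> (expires_at, data)

    def get(self, page):
        now = self._clock()
        entry = self._entries.get(page)
        if entry is not None and entry[0] > now:
            return entry[1]             # still fresh: no API request
        data = self._fetch_page(page)
        ttl = self._base_ttl + page * self._stagger
        self._entries[page] = (now + ttl, data)
        return data
```

With these example values a reload shortly after the first visit costs no API requests, and later reloads typically refresh only the pages whose timeouts have lapsed, which keeps the request count well under the rate limit.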