Collaborative Filtering at Shelby

[by Myles]

At Shelby, we’ve been thinking a lot about the online video space, and in particular about discovery. That’s because right now, on Shelby, you never encounter videos that haven’t been shared by people you’re connected to on Facebook and Twitter. As you might imagine, this is a double-edged sword. On the one hand, you’re mostly encountering brand-new videos that you’ve never seen before: great. Furthermore, every video in your Shelby stream comes from someone you know, and in a sense, those videos are more meaningful than videos from strangers. On the other hand, the people you’re connected to on Twitter and Facebook may share your taste in video, but they don’t necessarily, so we can’t be 100% sure that the content they share is content you’re actually going to enjoy.

As Fred Wilson has noted, merely transplanting a social graph from one application to another won’t always achieve your goal. He argues that some social graphs should not be created explicitly by users selecting their members (as on Facebook and Twitter). Instead, in applications focused on a certain type of content (e.g. music or video), users’ tastes in that content should be the basis for forming social graphs. Although we will support explicit graph creation, we think Shelby should also implicitly create social graphs of users with similar taste in video, and use those graphs to discover new content for our users. With that in mind, we’ve started building, and we’re primarily using a great open-source tool called Redis.

So what is Redis, and why do we think it’s the perfect tool for discovery? Redis is a blazingly fast, in-memory data-structure database. It lets you store your data in (among other cool things) hashes, lists and sets, which is a far cry from the relational model (e.g. MySQL) or the document model (e.g. MongoDB). Each command on these data structures (like pushing to a list) executes atomically, and the commands map closely onto operations you already know, so programming with Redis really feels like programming with the native data structures of your programming language. We’re excited about Redis because it lets us store our user data in exactly the form we need for computing similarity between users. Better yet, its command set covers most of the calculations we need, and we can easily script our own commands for the rest.
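To make “the form we need” concrete: each user’s likes can live in one set per user. The minimal Python sketch below uses a dictionary of sets as a stand-in for the Redis keyspace, so it runs without a server. The `user:<id>:likes` key naming and the helper names are hypothetical, not Shelby’s actual schema, and `sadd` only imitates the return value of Redis’s SADD command (the number of members newly added).

```python
from collections import defaultdict

# In-memory stand-in for a Redis keyspace that holds sets (illustration only).
store = defaultdict(set)


def sadd(key: str, member: str) -> int:
    """Mimic Redis SADD for a single member: 1 if newly added, 0 if already present."""
    before = len(store[key])
    store[key].add(member)
    return len(store[key]) - before


def like(user_id: str, video_id: str) -> int:
    """Record a 'like'; the user:<id>:likes key name is a hypothetical convention."""
    return sadd(f"user:{user_id}:likes", video_id)


like("alice", "v1")  # returns 1: new member
like("alice", "v1")  # returns 0: a set keeps each member only once
like("alice", "v2")  # returns 1
print(sorted(store["user:alice:likes"]))  # ['v1', 'v2']
```

Liking the same video twice leaves the set unchanged, so each user’s set is already in the shape the similarity computation needs.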

The Redis data structure we’re leaning on most heavily is the set, in the mathematical sense: a collection containing only unique members. For each of our users, there is a set of the videos they’ve ‘liked’ on Shelby, and a video is added to that set whenever the user likes it. How similar two users are is simply a function of the intersection of their two sets. Redis ships with a command (sinter) that computes the intersection of sets. Specifically, we’re interested in the size of that intersection: roughly speaking, the larger it is, the more similar the two users are. Redis doesn’t natively compute the size of an intersection (unless you store the intersection first), but as I mentioned, it does let you script your own commands in Lua. These Lua-scripted commands execute atomically, at speeds close to those of native C implementations! So we simply scripted our own command that takes the intersection (calling the native ‘sinter’ command inside the Lua script) and returns its size. We use this to generate a similarity score that determines how many videos each user in the pair copies into the other’s ‘recommended’ pool of videos.
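The scripted intersection-size command can be emulated in plain Python to show the idea. Inside Redis, the Lua body would be something like `return #redis.call('SINTER', KEYS[1], KEYS[2])`, run with EVAL; here `intersection_size` plays that role over in-memory sets. The post doesn’t say how the score maps to a number of copied videos or which videos get copied, so `videos_to_copy` (one video per shared like, capped at a hypothetical `max_copy`) and the candidate selection in `share_recommendations` are assumptions made for illustration.

```python
def intersection_size(a: set, b: set) -> int:
    """Stand-in for the Lua-scripted command: how many videos both users liked."""
    return len(a & b)


def videos_to_copy(score: int, max_copy: int = 5) -> int:
    """Hypothetical score-to-count rule: one video per shared like, capped at max_copy."""
    return min(score, max_copy)


def share_recommendations(likes: dict, recommended: dict, u: str, v: str,
                          max_copy: int = 5) -> None:
    """Copy some of each user's liked videos into the other's recommended pool."""
    n = videos_to_copy(intersection_size(likes[u], likes[v]), max_copy)
    if n == 0:
        return  # nothing in common, so nothing is copied
    for src, dst in ((u, v), (v, u)):
        # Only videos dst hasn't liked yet; sorted just to keep the sketch deterministic.
        candidates = sorted(likes[src] - likes[dst])
        recommended.setdefault(dst, set()).update(candidates[:n])


likes = {
    "alice": {"v1", "v2", "v3", "v4"},
    "bob": {"v2", "v3", "v5"},
    "carol": {"v9"},
}
recommended = {}
share_recommendations(likes, recommended, "alice", "bob")    # score 2
share_recommendations(likes, recommended, "alice", "carol")  # score 0: no copies
print(sorted(recommended["bob"]), sorted(recommended["alice"]))  # ['v1', 'v4'] ['v5']
```

Newer Redis versions (7.0 and later) also include a native SINTERCARD command that returns the intersection size directly, without a script.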

This user-to-user intersection computation runs for every possible pairing of users in our user base each time we run the process. Maybe this sounds like overkill, but we think it’s worth it: it lets us create, on the fly, the kind of implicit social networks Fred Wilson is such a big fan of. A popular alternative would be to create a semi-permanent set of ‘buddies’ for each user and, once that set is formed, periodically scour their liked videos for new content. But tastes in video change from moment to moment in the rapidly evolving meme-space that is online video, so we implicitly create a fresh social video graph for you every time we run this computation.
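Scoring every pairing means n(n-1)/2 intersections per run for n users (1,000 users already give 499,500 pairs), which is the price of rebuilding the graph each time. Here is a minimal Python sketch of one such run, again with plain sets standing in for the Redis sets; `score_all_pairs` is a hypothetical name, not code from Shelby’s repository.

```python
from itertools import combinations


def score_all_pairs(likes: dict) -> dict:
    """Score every unordered pair of users by the size of their like-set intersection."""
    return {
        (u, v): len(likes[u] & likes[v])
        for u, v in combinations(sorted(likes), 2)
    }


likes = {
    "alice": {"v1", "v2", "v3"},
    "bob": {"v2", "v3", "v5"},
    "carol": {"v3", "v9"},
}
scores = score_all_pairs(likes)
print(scores)  # {('alice', 'bob'): 2, ('alice', 'carol'): 1, ('bob', 'carol'): 1}
```

Because the scores are computed from the current like-sets on every run, a user’s closest matches can change between runs; nothing from the previous run’s pairings is carried over.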

This is very much a work in progress, but we’ve got some of it written in node.js up on GitHub. Check out the repo.
