Building a Node.js Events App Using RabbitMQ, Websockets, and Django
I've been working for the past few months learning serverside Javascript development using Node.js to build a new Events platform for Focus.com. We had been running our Roundtable events over phone systems, where listeners would dial in at a specified time to listen to the panels speak, with very little interactivity on the part of the listeners to the panelists. We were looking for a way to improve the accessibility and interactivity of the event experience. Our goals for the project were to:
- Provide a single stream for consuming all event related content (tweets, chat, and Q&A being the 3 primary content types)
- Provide some democratization of the content of the events, allowing users to vote on the issues they wanted to see the panelists address, during the event.
- Allow users to provide feedback on the current topic of discussion, as well as gather that data for later use in highlighting the best parts of the event as decided by attendees for those who later want to listen to a replay of the talk.
With these objectives in mind, I began digging around in search of the best technologies to leverage to accomplish these goals. To give you some history on our webstack, we've been rocking Django since 0.96, which has met all of our previous needs and then some. However, Django is not really suitable for the sort of web app experience we had in mind, where chat and other actions need to be broadcast to anywhere between tens to hundreds of users (while being scalable to thousands if need be) in as close to real time as possible. In our search for figuring out the best tools to use for this, we had several requirements:
- Fast- This needed to be able to handle communication with hundreds of simultaneous users without noticeable delays in sending and receiving data in real-time.
- Scalable- We needed to be able to offer HA (High Availability) scaling for this service, ensuring that we did not introduce any single point of failure that could disrupt the experience.
- Capable of sending and receiving data in real time - Needed to support a large number of concurrent connections at relatively low latency.
- Easily integratable into our current Django site- Using the Django ORM (talking to MySQL) for managing all of the persistence of data was our best option for using that content elsewhere on our site, and preventing any additional points of failure from being introduced.
- No new languages- We didn't want to have to do any significant development with any language our developers were not already familiar with. This basically limited us to Python and Javascript.
In the end, we resolved to add two new pieces of tech to our stack: RabbitMQ and Node.js.
RabbitMQ is an Erlang implementation of AMQP (Advanced Message Queuing Protocol). "AMQP is middleware to provide a point of rendezvous between back-end systems (data stores and services) and front end systems such as end user applications."[1] Node.js is pretty much the hotness right now, recently taking the crown from Rails for being the cool new tech on the block (if Github traffic is any indication). Node.js offered us the opportunity to use Javascript both serverside and clientside for the events application.
So, technologies chosen, I set out learning how to develop using Node.js's event-driven model. After a couple prototype iterations we found an architecture that would accomplish our goals. Here's an early diagram of it:
(Disclaimer: I don't know UML. I am terrible at flowcharts. Sorry)
Since that early diagram, the only real change was that the majority of content-related data is now sent by the Nodejs layer to the client after the client handshakes with Node. The content that is static per event is cached at the page level in the Django layer.
Architecture within Django
We have 8 models defined in the Django ORM that contain data needed by the live event app. These include all of the primary forms of content in the first bullet point in this post, as well as all secondary forms of content that relate to those primary ones (ie: things like votes, answers to questions, etc.). I added to each of these models a 'to_json_serializable' method, which returns a Python dictionary object representing all relevant data for an object of that model that can be JSON serialized.
I bound the following post_save signal handler to all 8 models:
(EDIT: Link for if the embed doesn't appear)
I leverage memcached here to determine if the output from to_json_serializable has changed since last save to reduce unnecessary data transfer (ie: if some object attributes change, but the relevant attributes do not).
The amqp_transmit function sends the JSON package through to RabbitMQ to a fanout exchange that all of the individual Node.js instances have unique queues bound to. Each instance bound to the fanout gets a copy of all data pushed from Django to the fanout.
The last architecture change within Django was building a full RESTful API for accessing and modifying all of these models. The GET requests to the API leverage the to_json_serializable methods, while the POST requests are able to use a user's existing Django user account and authentication state for managing access control.
Architecture within Node.js
Working with Node.js was a ton of fun, and introduced me to an event-driven programming style. We used socket.io for managing connections from the browser to Node.js, a web transport library with both client and serverside Javascript, designed specifically for use in real time web applications, It manages the transfer of data between those entities.
When a Node instance is instantiated, the first thing it does is bind a queue to the RabbitMQ fanout exchange that Django is pushing data to. Once that connection is established, it begins listening for socket.io connections from clients, and requests a list of events from Django. At this point, any event-specific data requests from clients or data-updates from Django's post_save signals are deferred. Once Django returns to Node a list of events, separate objects are created for each event, and all of the currently deferred functions are moved from the global events queue to the relevant queue inside of the event object representing the event which they are attending. Upon instantiation of a new event object, Node makes a request to Django for all of the relevant data. Once it gets and processes all of that data, the deferred functions for that event are evaluated in FIFO order.
Depending on the browser, you'll get a users current sessionid cookie (originally set by Django) in the initial socket.io-facilitated connection from the client to Node.js. If this value is not provided, Node.js makes an additional request to the client for this value. That value is then verified with Django to associate identity with each client. We ended up needing to change our SESSION_COOKIE_DOMAIN value in Django's settings.py from www.focus.com to .focus.com before launching the platform, in order to launch our node boxes behind node.focus.com and allow sessionid cookies to be shared between the two subdomains.
Once a user is connected and Node has finished gathering data about the event the user is attending, Node has a fairly simple job to do. It only has to listen for updates about data changing via RabbitMQ, update its internal representation of the event, and ship the diff to all connected clients attending the event tied to that content.
Overall, I'm extremely happy with how the finished product has come together, and believe that we made the right architecture decisions here. Please leave any questions or comments, I'd be happy to go into further detail about any aspect that I may have glossed over.
[1]: http://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol#Overview
PS: I'll be speaking at one of our events tomorrow Wednesday 9/21/11 at 2:00 PT about the new platform, feel free to drop in and check it out. Behind Focus.com: Unveiling our new Live Event App
