blackrimglasses.com

Music + Technology + Random Nonsense from the Music Industry by Ethan Kaplan, VP Product, Live Nation

XMPP, Spread, Daemons, Python… aka a fun day being a geek.

I spent yesterday diving into some hacking because I never get a chance to during the week. A caveat, though: I’m not a great programmer, just a clever one. My code is messy, unrefactored and more akin to finger painting than careful architectural execution.

However, a “painting” approach to programming has yielded me some interesting things over the years. One of which was pStruct, which is/was a massive multi-threaded agent-system that I kind of organically grew from the World object up in Java.

I’ve since switched my “painting” language to Python, as it’s nicely object-oriented when necessary, has a steady stream of libraries and APIs available, and is nice and elegant to play with.

Right now my coding is centered around the concept of decoupling, i.e., removing dependencies and synchronicity between web front ends and the backend systems, as well as finding better methods for realtime process communication.

These efforts are motivated by the fact that our sites are basically Lego construction kits that use a lot of external services to do various things. For instance, CRM, mailing list management, SMS management, geo-coding, analytics and more. Each of these is dependent on an out-of-band system, but essential for front-end operations.

Time to decouple.

The premise for this is to keep the front-end process smooth from end to end, while allowing asynchronous data processing on the back end. Complicating this further, our sites are atomic databases, so the only way to do any consolidation is through backend processing.

Spread

The concept of using a message queue is not new in this Web++ world: Twitter uses one, as do Digg and Flickr. Our need for a queue is not strictly about asynchronous and long-running processes; it also comes from the variable availability of some of the external systems we depend on.

We also have a case where the queue will be handling a significant number of different message types, from “user logged in” to “user changed profile.” Each message would have a number of things that could happen to it. For example, a “profile change” would need geo-coding, CRM sync, mailing list sync and more.
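One way to picture that fan-out is a plain lookup table from message type to the list of tasks it triggers. This is only a sketch: the type strings and task names below are made up from the examples in this post, and the task for “user logged in” is a pure guess.

```python
# Hypothetical message types and task names, loosely based on the examples above.
TASKS_BY_MESSAGE_TYPE = {
    "user registered": ["geocode", "crm_sync", "mailing_list_sync"],
    "profile change": ["geocode", "crm_sync", "mailing_list_sync"],
    "user logged in": ["analytics"],  # assumption: the post does not say what a login triggers
}


def tasks_for(message_type):
    """Return the back-end tasks for a message type (empty list if unknown)."""
    # Copy the list so callers cannot mutate the shared table by accident.
    return list(TASKS_BY_MESSAGE_TYPE.get(message_type, []))
```

Adding a new message type, or a new task for an existing one, then means touching the table rather than the code that reads the queue.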

I like Spread as a method for using the queue because it’s robust, well maintained and easy to implement. While Starling and Sparrow are pretty cool, and there are other projects like Apache ActiveMQ that are arguably better suited to the job, Spread is nice, realtime and easy to work with.

The Daemon

Our sites (Drupal, of course) will drop messages into Spread at various hooks. They will all go into a common queue.

Sitting at the App server level will be a Python daemon that monitors the Spread queue. Upon finding a message in Spread, the daemon spawns an “Analyzer” thread which looks at the message to determine what should be done. For a “user registered,” it’ll execute a geocode, CRM sync, mailing list sync and maybe some other things. Upon completion, it’ll package the results of all this up into a new message and stick it into a Responder thread.

The Responder takes the “callback” method and URL from the original message and issues an XML-RPC callback to the Drupal site. This would fill in the geocode information in Drupal and such.
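To make the callback idea concrete, here is a rough sketch of what a message and the Responder's call might look like. The JSON envelope, its field names, and both function names are my own assumptions for illustration; the only real API used is `ServerProxy` from Python's standard `xmlrpc.client` module.

```python
import json
import xmlrpc.client


def build_message(message_type, payload, callback_url, callback_method):
    """Serialize a message the way a site hook might before queueing it (hypothetical format)."""
    return json.dumps({
        "type": message_type,
        "payload": payload,
        "callback": {"url": callback_url, "method": callback_method},
    })


def respond(raw_message, results, proxy_factory=xmlrpc.client.ServerProxy):
    """Send `results` back to the site named in the original message's callback."""
    callback = json.loads(raw_message)["callback"]
    proxy = proxy_factory(callback["url"])
    # Attribute lookup on a ServerProxy yields a callable for that remote method name.
    remote_method = getattr(proxy, callback["method"])
    return remote_method(results)
```

The `proxy_factory` parameter is only there so the function can be exercised without a live Drupal site; in normal use it defaults to a real XML-RPC proxy.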

By running Analyzers and Responders as independent worker thread pools, we remove any blocking caused by either external services (e.g., the Google API for geocoding) or Drupal (the XML-RPC callbacks and database loads).
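The two-pool layout can be sketched with nothing but the standard library. This is not the actual daemon: an in-memory `queue.Queue` stands in for Spread, and `start_pool`, `build_pipeline`, and the pool sizes are names and numbers I picked for illustration.

```python
import queue
import threading
import traceback


def start_pool(size, inbox, handle):
    """Start `size` daemon threads that each apply `handle` to items from `inbox`."""
    def worker():
        while True:
            item = inbox.get()
            try:
                handle(item)
            except Exception:
                # A failing task should not kill the worker; log it and move on.
                traceback.print_exc()
            finally:
                inbox.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(size)]
    for thread in threads:
        thread.start()
    return threads


def build_pipeline(analyze, respond, analyzers=4, responders=2):
    """Wire two independent pools: incoming -> analyze -> outgoing -> respond."""
    incoming, outgoing = queue.Queue(), queue.Queue()
    # Analyzers hand (original message, results) pairs over to the Responder pool.
    start_pool(analyzers, incoming, lambda msg: outgoing.put((msg, analyze(msg))))
    start_pool(responders, outgoing, lambda pair: respond(*pair))
    return incoming, outgoing
```

Because each pool has its own queue, a slow geocoder only backs up the Analyzer side, and a slow Drupal callback only backs up the Responder side.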

In my tests on my machine here using GeoPy, this all worked really well.

The API

In order to avoid our daemon having any dependencies on other people’s APIs, we’re probably going to use Drupal to create an API abstraction on top of all the external services. This will be done using Drupal modules for each service, and the Services module to provide the API.

This way we can swap out the mailing list manager and not have to alter the API calls from the daemon or indeed any other external consumer (widgets, etc.) using it. The ideal would be for this API abstraction to apply to everything from payment processing and merchandise fulfillment to SMS sending and more.
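The real abstraction would live in Drupal modules, but the idea is the same in any language, so here is a small Python sketch of it. Every class and function name below is hypothetical; the two providers are in-memory stand-ins, not real vendor clients.

```python
from abc import ABC, abstractmethod


class MailingListBackend(ABC):
    """The stable interface callers code against (hypothetical)."""

    @abstractmethod
    def subscribe(self, email, list_name):
        """Add `email` to `list_name`; return True on success."""


class OldListProvider(MailingListBackend):
    """Stand-in for one vendor: stores subscribers per list."""

    def __init__(self):
        self.lists = {}

    def subscribe(self, email, list_name):
        self.lists.setdefault(list_name, set()).add(email)
        return True


class NewListProvider(MailingListBackend):
    """Stand-in for a replacement vendor with a different internal layout."""

    def __init__(self):
        self.rows = []

    def subscribe(self, email, list_name):
        self.rows.append((list_name, email))
        return True


def sync_mailing_list(backend, email, list_name):
    """Caller-side code: it only knows the interface, never the vendor."""
    return backend.subscribe(email, list_name)
```

Swapping `OldListProvider` for `NewListProvider` changes nothing in `sync_mailing_list`, which is the whole point of putting the abstraction in the middle.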

XMPP

Yesterday I also ended up coding a Twitterbot that used the XMPP interface to Twitter (twitter@twitter.com in Jabber). This was interesting as it provoked some thought about the power of XMPP as a realtime distribution method for system info. The Twitterbot is fun, and we’ll probably make it responsive and semi-intelligent, but what was more fun was seeing the power of the XMPP protocol (which I just barely tapped into). Compared with an HTTP-based API, XMPP has lower latency, lower overhead and no polling.

One thing I do want to do is create an XMPP PubSub publisher using the xmpppy package. I haven’t yet started that. I could see using this as a method of getting realtime information from our systems, such as the feed of users signing up for sites.
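Since I have not written the xmpppy version yet, here is only the shape of what such a publisher would send: a PubSub publish request as laid out in XEP-0060, built with the standard library's ElementTree rather than xmpppy. The service JID, node name, and the `urn:example:signup` payload namespace are placeholders I made up.

```python
import json
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def build_publish_iq(service_jid, node, payload, stanza_id="publish1"):
    """Build the XML for a PubSub publish request carrying `payload` as JSON text."""
    iq = ET.Element("iq", {"type": "set", "to": service_jid, "id": stanza_id})
    # Setting xmlns as a plain attribute keeps the output free of ns0: prefixes.
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    publish = ET.SubElement(pubsub, "publish", {"node": node})
    item = ET.SubElement(publish, "item")
    # Hypothetical payload element; a real feed would define its own schema.
    event = ET.SubElement(item, "signup", {"xmlns": "urn:example:signup"})
    event.text = json.dumps(payload)
    return ET.tostring(iq, encoding="unicode")
```

Actually sending this still needs an authenticated XMPP connection and a server with a PubSub service, which is the part the library would handle.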

I could also see XMPP being used for public or installation artwork, as an easy socket-based system for distributing information to visualization displays. When I was in school I had to write my own socket servers and protocols for this.

For the bot testing I installed the Openfire XMPP server, which works very well. I’m impressed.

Datacenter Simulator

I should also mention that Wiredtree has provided us, at low cost, with a Parallels Virtuozzo box to act as a “Datacenter Simulator.” I have it set up with two WWW servers, one DB server, one DEV server and one APP server. It runs each as a virtual machine on one box within the same subnet.

The goal of this is to simulate, on a small scale, the topology of a datacenter so we can build systems and tools for our web infrastructure. This includes Spread, the API, the daemons, SVN/Trac and other tools.

-----

I’ll post source code and diagrams when the code isn’t embarrassing and the diagrams are more complete.
