Fri, 28 Jan 2011

Fixing RSS

RSS is a great technology with one flaw. Currently, the way RSS works is that you have a URL pointing to an RSS feed. The feed changes dynamically: new entries are added as they're posted, and old ones drop off the end as they age out. But the "feed" is a complete XML document. So in order to get updates, you need to fetch the entire document and examine it for entries newer than the ones you saw the last time you fetched it.

In other words, polling.
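To make the cost concrete, here is a minimal Python sketch of what every polling client ends up doing on each fetch. The sample feed, the new_entries name, and the choice to compare on pubDate are my assumptions for illustration; real clients often compare GUIDs instead, and a real client would fetch the document over HTTP rather than use an inline string.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical two-item feed standing in for the document a poll would fetch.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Newest post</title>
      <pubDate>Fri, 28 Jan 2011 15:54:00 +0000</pubDate>
    </item>
    <item>
      <title>Older post</title>
      <pubDate>Thu, 27 Jan 2011 09:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def new_entries(feed_xml, last_seen):
    """Return titles of items published after last_seen (timezone-aware)."""
    root = ET.fromstring(feed_xml)  # the whole document is parsed every time
    fresh = []
    for item in root.iter("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        if published > last_seen:
            fresh.append(item.findtext("title"))
    return fresh


# The client remembers when it last looked and repeats this on every poll.
last_poll = datetime(2011, 1, 28, 0, 0, tzinfo=timezone.utc)
fresh_titles = new_entries(SAMPLE_FEED, last_poll)
```

Note that the whole document gets downloaded and parsed even when nothing in it is new.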

The problem with polling is that it consumes resources: every poll costs bandwidth and server work, even when nothing has changed. Much better to set up a communication point where the recipient (RSS client) waits for the sender to post new items. In other words, streaming RSS, or SRSS. Streaming XML is not a new concept. Jabber has been doing streaming XML for years now. The problem with following that concept is that Jabber is one-to-one communication. RSS is one-to-many. For small values of "many" a standard HTTP server will work fine.

But think of Britney Spears' RSSer feed. She'll have millions of followers, all of whom want to hold a TCP session open. This simply doesn't scale using a general-purpose TCP/IP stack.

So, imagine a TCP/IP stack designed for streaming RSS. It would be able to hold open literally millions of TCP sessions at the same time. Since it's sending out the same content to many different recipients, each session just needs a pointer into the content that has been sent so far, plus the remote IP address and TCP port, and maybe a retransmission timer or two.
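As a rough illustration of how little per-session state that is, here is a Python sketch. This is a toy model of the bookkeeping only, not a TCP implementation, and the Session and SharedFeed names and their methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Everything the server remembers about one subscriber."""
    remote_ip: str
    remote_port: int
    sent_offset: int = 0                         # pointer into the shared content
    retransmit_deadline: Optional[float] = None  # None when nothing is in flight


class SharedFeed:
    """One copy of the content, shared by every session."""

    def __init__(self):
        self.content = bytearray()
        self.sessions = []

    def subscribe(self, remote_ip, remote_port):
        # A new subscriber starts at the current end of the stream.
        session = Session(remote_ip, remote_port, sent_offset=len(self.content))
        self.sessions.append(session)
        return session

    def post(self, entry):
        # Posting appends once; no per-session copy is made.
        self.content += entry

    def pending(self, session):
        # Bytes this session has not been sent yet.
        return bytes(self.content[session.sent_offset:])

    def mark_sent(self, session):
        session.sent_offset = len(self.content)


feed = SharedFeed()
early = feed.subscribe("192.0.2.1", 40000)
feed.post(b"<item>one</item>")
late = feed.subscribe("192.0.2.2", 40001)
feed.post(b"<item>two</item>")
```

Here early still has both items pending while late has only the second, yet the content itself is stored once.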

When she posts something to her feed, each session needs just two packets: one carrying the data, and one coming back to ACK it. And yes, some of the TCP sessions will be dangling and will answer with a TCP RST. But the rest will receive their feed in real time, or as near as you can get to it with 5 million TCP sessions to feed.

Now, all RSS clients will need to be modified to use SRSS. But the key here is that even if they aren't, they'll still be able to fall back to plain RSS. As long as the server can understand an appended ?streaming=yes on its feed URL, the clients can be modified at whatever rate their authors desire.
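Here is a small Python sketch of how the ?streaming=yes opt-in might be handled on each side. The parameter name comes from the paragraph above; the streaming_url and wants_streaming helpers and the example.com URLs are hypothetical, and nothing here is an existing standard.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def streaming_url(feed_url):
    """Client side: append streaming=yes, keeping any existing query parameters."""
    parts = urlsplit(feed_url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "streaming"]
    query.append(("streaming", "yes"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def wants_streaming(request_url):
    """Server side: True only when the client explicitly opted in."""
    query = dict(parse_qsl(urlsplit(request_url).query))
    return query.get("streaming") == "yes"


plain = "http://example.com/feed.rss"
upgraded = streaming_url(plain)
```

An unmodified client keeps requesting the plain URL and gets the ordinary XML document, so nothing breaks for it.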

Been thinking about this for years, but I was prompted to write this up by a posting by Dave Winer suggesting that we could distribute the functionality of Twitter using an RSS client and people's individual RSS feeds. This is a great idea at that level of the IP stack, but when you go down one level to try to implement it, fetching a full RSS file every time you check for news is incredibly inefficient and slow. Much better to use SRSS so that when somebody posts to their RSSer feed, it appears immediately.

Posted [15:54] [Filed in: opensource]