Russ Nelson's blog : /opensource

RSS is a great technology, with a flaw. Currently, the way RSS works is that you have a URL pointing to an RSS feed. This feed dynamically changes to contain new entries as they're posted and removes old ones off the end as they age out. But the "feed" is a complete XML document. So in order to get updates, you need to fetch the entire document and examine it for entries newer than the last time you fetched the document.

In other words, polling.
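Here is a minimal sketch of what a polling client has to do on every check, assuming RSS 2.0 items that carry a pubDate. The feed text, the `new_items` helper, and the dates are made up for illustration, and the feed is inlined where a real client would do an HTTP fetch.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# A made-up feed standing in for the full document an HTTP GET would return.
FEED = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>Third post</title><pubDate>Fri, 28 Jan 2011 20:54:00 GMT</pubDate></item>
  <item><title>Second post</title><pubDate>Wed, 01 Apr 2009 15:30:00 GMT</pubDate></item>
  <item><title>First post</title><pubDate>Wed, 21 Jan 2009 08:25:00 GMT</pubDate></item>
</channel></rss>"""

def new_items(feed_xml, last_seen):
    """Parse the whole feed and return the titles of items newer than last_seen."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        if published > last_seen:
            fresh.append(item.findtext("title"))
    return fresh

# The client remembers the newest entry it saw on the previous poll.
last_seen = parsedate_to_datetime("Wed, 01 Apr 2009 15:30:00 GMT")
print(new_items(FEED, last_seen))
```

Every poll parses all three items to find the one that is new, and a poll that finds nothing new costs just as much.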

The problem with polling is that it consumes resources: every poll transfers the whole document, even when nothing has changed. Much better to set up a communication point where the recipient (RSS client) waits for the sender to post new items. In other words, streaming RSS, or SRSS. Streaming XML is not a new concept. Jabber has been doing streaming XML for years now. The problem with following that concept is that Jabber is one-to-one communication. RSS is one-to-many. For small values of "many" a standard HTTP server will work fine.

But think of Britney Spears' RSSer feed. She'll have millions of followers, all of whom want to hold a TCP session open. This simply doesn't scale using a general-purpose TCP/IP stack.

So, imagine a TCP/IP stack designed for streaming RSS. It would be able to hold open literally millions of TCP sessions at the same time. Since it's sending out the same content to many different recipients, each session just needs a pointer into the content that has been sent so far, plus the remote IP address and TCP port, and maybe a retransmission timer or two.
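To make the bookkeeping concrete, here is a rough model of that per-session state. It is not a TCP stack, just the data structure: one shared copy of the content, and a small record per subscriber. The names `Session`, `Feed`, `sent_offset`, and `retransmit_at` are invented for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Session:
    """Per-subscriber state: where to send, and how far into the stream it has got."""
    remote_ip: str
    remote_port: int
    sent_offset: int = 0        # pointer into the shared content sent so far
    retransmit_at: float = 0.0  # hypothetical retransmission deadline

class Feed:
    def __init__(self):
        self.content = bytearray()  # one copy of the stream, shared by all sessions
        self.sessions = []

    def subscribe(self, ip, port):
        # A new subscriber starts at the current end of the stream.
        session = Session(ip, port, sent_offset=len(self.content))
        self.sessions.append(session)
        return session

    def post(self, data):
        self.content += data

    def pending(self, session):
        """Bytes that this session has not been sent yet."""
        return bytes(self.content[session.sent_offset:])

    def mark_sent(self, session):
        session.sent_offset = len(self.content)

feed = Feed()
early = feed.subscribe("192.0.2.1", 40000)
feed.post(b"<item>hello</item>")
late = feed.subscribe("192.0.2.2", 40001)
feed.post(b"<item>again</item>")
print(feed.pending(early))
print(feed.pending(late))
```

The point of the design is that a post costs one append to the shared buffer, and each session costs a few integers no matter how much content there is.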

When she posts something to her feed, each session needs just two packets: one carrying the data out, and one coming back to ACK it. And yes, some of the TCP sessions will be dangling and will answer with a TCP RST. But the rest will receive their feed in real time, or as near as you can get with 5 million TCP sessions to feed.

Now, all RSS clients will need to be modified to use SRSS. But the key here is that even if they aren't, they'll still be able to fall back to RSS. As long as the server can understand an appended ?streaming=yes on its feed URL, the clients can be modified at whatever rate their authors desire.
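The server side of that fallback could be as small as a check on the query string. This is a sketch under assumptions: the `wants_streaming` and `choose_mode` names and the example.com URLs are placeholders, and a real server would go on to hold the connection open where this one just returns a label.

```python
from urllib.parse import urlsplit, parse_qs

def wants_streaming(url):
    """True if the feed URL carries streaming=yes in its query string."""
    query = parse_qs(urlsplit(url).query)
    return query.get("streaming", ["no"])[-1].lower() == "yes"

def choose_mode(url):
    # Unmodified clients never send the parameter, so they get plain RSS.
    return "stream" if wants_streaming(url) else "document"

print(choose_mode("http://example.com/feed.rss"))
print(choose_mode("http://example.com/feed.rss?streaming=yes"))
```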

Been thinking about this for years, but I was prompted to write this up by a posting by Dave Winer suggesting that we could distribute the functionality of Twitter using an RSS client and people's individual RSS feeds. This is a great idea at that level of the IP stack, but when you go down one level to try to implement it, fetching a full RSS file every time you check for news is incredibly inefficient and slow. Much better to use SRSS so that when somebody posts to their RSSer feed, it appears immediately.
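On the client side, receiving a streamed feed means parsing a document that never ends. Python's standard XMLPullParser can do that, as this sketch shows. The list of chunks stands in for data arriving on an open connection, and `items_from_chunks` is a name made up here; nothing about it is specific to any existing SRSS implementation.

```python
import xml.etree.ElementTree as ET

def items_from_chunks(chunks):
    """Yield each item's title as soon as its closing tag has arrived."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == "item":
                yield elem.findtext("title")

# Chunks split at arbitrary points, the way TCP would deliver them.
# The channel element is never closed: the stream just keeps going.
chunks = [
    "<rss><channel><item><title>fir",
    "st</title></item>",
    "<item><title>second</title></item>",
]
for title in items_from_chunks(chunks):
    print(title)
```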

posted at: 20:54 | path: /opensource | permanent link to this entry

Wed, 01 Apr 2009

Archives

Are you not a coder? Or are your coding skills rusty, having moved on? No matter! You can still contribute to open source. Open source is only one part of a program. The other part is open data. I'm encouraging people to contribute to OpenStreetMap. We're running OpenStreetMap mapping parties all over the world. All skills taught! What's important is your willingness to contribute to an Open Data project, and location, location, location. We can only map where you are.

posted at: 15:30 | path: /opensource | permanent link to this entry

Wed, 21 Jan 2009

Cloudmade, my new employer

After 17 years of working for myself, I've decided to fire my boss, and hire a new one, Cloudmade. We're working on improving OpenStreetMap, a community-edited map. All sorts of geodata can, should, and need to be added to OpenStreetMap. I'm available to give presentations about open data, OpenStreetMap, and collaborative communities in the Northeast of the USA.

I'm also blogging over at the community Cloudmade site.

posted at: 08:25 | path: /opensource | permanent link to this entry

Sat, 01 Nov 2008

findwhistle

I've experimented with keeping an audio recording in addition to a GPS track of my bicycle rides. The trouble with a continuous audio recording is that 1) it's long, 2) it's boring, and 3) the interesting things are hard to seek to. If you could do reliable speech recognition, you could say a word like "mark" or somesuch. However, in my experience, street noise is going to kill you.

Better than that: detect a whistle. The code below prints the duration of each whistle, its time from the beginning of the audio recording, and its pitch. The purpose of this is to be able to do continuous audio recording, and yet be able to take a waypoint with an audio annotation.


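#!/usr/bin/env python3

import sys
import wave
import struct

def findwhistle(inwave):
    """Given an open wave file (assumed mono, 16-bit), print a line for each
    whistle found and return the length of the recording in seconds."""
    rate = float(inwave.getframerate())  # samples per second
    framecount = 0
    zerocross = 0       # samples seen since the last zero crossing
    lastzerocross = 0   # length of the previous half-cycle, in samples
    zerocrosssum = 0    # samples in the current run of steady half-cycles
    zerocrosscount = 0  # half-cycles in that run
    sign = 1
    while True:
        frames = inwave.readframes(100)
        if len(frames) == 0: break
        # The final chunk may be short, so size the format to what was read.
        frames = struct.unpack("<%dh" % (len(frames) // 2), frames)
        for i, sample in enumerate(frames):
            if sign * sample > 0:
                zerocross += 1
            else:
                # Zero crossing. A whistle shows up as many half-cycles of
                # nearly equal length in a row.
                if abs(zerocross - lastzerocross) <= 1:
                    zerocrosssum += zerocross
                    zerocrosscount += 1
                else:
                    if zerocrosscount > 100:
                        # Prints duration, start time, and zero crossings per
                        # second (about twice the whistle's frequency in Hz).
                        duration = zerocrosssum / rate
                        print('! %4.2f %4.2f %5.0f' % (duration, (framecount + i - zerocrosssum) / rate, zerocrosscount / duration))
                    zerocrosssum = 0
                    zerocrosscount = 0
                sign = -sign
                lastzerocross = zerocross
                zerocross = 1
        framecount += len(frames)
    return framecount / rate

def main():
    f = wave.open(sys.argv[1], "rb")
    print(f.getparams())
    print(findwhistle(f))

if __name__ == "__main__":
    main()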
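
If you have no recording handy, a synthetic one is enough to try the script on. This sketch writes an 8000 Hz mono 16-bit WAV, a format the script can read, with a half-second steady tone starting one second in. The file name, the tone frequency, and the `write_test_wav` helper are arbitrary choices for illustration. The quiet parts are low-level seeded noise, not digital silence, because a run of all-zero samples looks like perfectly regular zero crossings to this kind of detector.

```python
import math
import random
import struct
import wave

RATE = 8000  # samples per second

def write_test_wav(path, tone_hz=1000, quiet_s=1.0, tone_s=0.5):
    """Write noise, then a steady sine tone, then noise, as mono 16-bit PCM."""
    rng = random.Random(0)  # seeded so the file is the same on every run
    quiet = int(quiet_s * RATE)
    samples = [rng.randint(-50, 50) for _ in range(quiet)]
    # The 0.5 radian phase offset keeps samples from landing exactly on zero.
    samples += [int(20000 * math.sin(2 * math.pi * tone_hz * n / RATE + 0.5))
                for n in range(int(tone_s * RATE))]
    samples += [rng.randint(-50, 50) for _ in range(quiet)]
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(RATE)
        out.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return len(samples)

print(write_test_wav("whistle_test.wav"))
```

Pass the resulting whistle_test.wav to the script as its first argument.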

posted at: 05:46 | path: /opensource | permanent link to this entry

Fri, 28 Jan 2011

Fixing RSS