Fri, 28 Jan 2011

Fixing RSS

RSS is a great technology, with a flaw. Currently, the way RSS works is that you have a URL pointing to an RSS feed. This feed dynamically changes to contain new entries as they're posted and removes old ones off the end as they age out. But the "feed" is a complete XML document. So in order to get updates, you need to fetch the entire document and examine it for entries newer than the last time you fetched the document.
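Here is a minimal Python sketch of that polling step. It assumes an RSS 2.0 feed whose items carry RFC 822 `pubDate` elements with an explicit UTC offset. The function `new_entries` and the sample feed are hypothetical. Even when only one item is new, the whole document has to be fetched and parsed.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def new_entries(feed_xml, last_seen):
    """Return titles of items published after last_seen.

    feed_xml  -- the complete RSS 2.0 document, fetched again on every poll
    last_seen -- RFC 822 date of the newest item seen on the previous poll
    Assumes every <item> has a <pubDate> with a numeric UTC offset
    (e.g. +0000), so all the datetimes compared are timezone-aware.
    """
    cutoff = parsedate_to_datetime(last_seen)
    fresh = []
    # The whole document is parsed on every poll, even if nothing changed.
    for item in ET.fromstring(feed_xml).iter("item"):
        if parsedate_to_datetime(item.findtext("pubDate")) > cutoff:
            fresh.append(item.findtext("title"))
    return fresh

# A tiny hypothetical feed for trying the function out.
SAMPLE_FEED = """<rss version="2.0"><channel><title>example</title>
<item><title>old</title><pubDate>Thu, 27 Jan 2011 10:00:00 +0000</pubDate></item>
<item><title>new</title><pubDate>Fri, 28 Jan 2011 15:54:00 +0000</pubDate></item>
</channel></rss>"""
```

For example, `new_entries(SAMPLE_FEED, "Fri, 28 Jan 2011 12:00:00 +0000")` returns `['new']`. The client pays for the full download and parse to learn that.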

In other words, polling.

The problem with polling is that it consumes resources. It is much better to set up a communication point where the recipient (the RSS client) waits for the sender to post new items. In other words, streaming RSS, or SRSS. Streaming XML is not a new concept: Jabber has been doing streaming XML for years now. The problem with following that model is that Jabber is one-to-one communication, while RSS is one-to-many. For small values of "many", a standard HTTP server will work fine.

But think of Britney Spears' RSSer feed. She'll have millions of followers, all of whom want to hold a TCP session open. This simply doesn't scale using a general-purpose TCP/IP stack.

So, imagine a TCP/IP stack designed for streaming RSS. It would be able to hold open literally millions of TCP sessions at the same time. Since it's sending out the same content to many different recipients, each session just needs a pointer into the content that has been sent so far, plus the remote IP address and TCP port, and maybe a retransmission timer or two.
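Here is a minimal Python sketch of that per-session state. It assumes everything posted to the feed lives in one shared, append-only buffer. The names `Feed`, `Session`, `unsent` and `on_ack` are hypothetical, and a real implementation would sit inside a purpose-built TCP stack, not in Python. The point is only to show how little each session has to remember.

```python
class Feed:
    """One copy of the stream, shared by every subscriber."""

    def __init__(self):
        self.content = bytearray()

    def post(self, entry):
        # New entries are appended once, no matter how many subscribers.
        self.content += entry


class Session:
    """Per-subscriber state: an offset into the shared stream,
    the remote address, and a retransmission deadline."""
    __slots__ = ("ip", "port", "acked", "retransmit_at")

    def __init__(self, ip, port, start):
        self.ip = ip
        self.port = port
        self.acked = start          # bytes of the stream this peer has ACKed
        self.retransmit_at = None   # retransmission timer; unused in this sketch


def unsent(feed, session):
    """Return what this peer has not yet ACKed.

    A real stack would transmit straight from the shared buffer;
    this sketch returns a copy of the slice for simplicity."""
    return bytes(feed.content[session.acked:])


def on_ack(feed, session, ack_offset):
    """Advance the peer's pointer; stale or bogus ACKs never move it back."""
    session.acked = max(session.acked, min(ack_offset, len(feed.content)))
```

A subscriber that joins late simply starts with `start` set to the current length of the shared buffer, so it only receives entries posted after it connected.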

When she posts something to her feed, each session will need just two packets: one carrying the data, and another one ACKing the data. And yes, some of the TCP sessions will be dangling and will answer with a TCP RST. But the rest will receive their feed in real time, or as near to real time as you can get with 5 million TCP sessions to feed.

Now, all RSS clients will need to be modified to use SRSS. But the key here is that even if they don't, they'll still be able to fall back to RSS. As long as the server can understand an appended ?streaming=yes on its feed URL, the clients can be modified at whatever rate the author desires.
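The fallback can be sketched in Python with the standard `urllib.parse` module. The parameter `streaming=yes` comes from the text. The function names are hypothetical. A client whose feed URL already has a query string has to append `&streaming=yes` rather than `?streaming=yes`, and this sketch assumes the URL has no `#fragment`.

```python
from urllib.parse import urlsplit, parse_qs

def wants_streaming(url):
    """Server side: True only if the client asked for the streaming feed.

    Old clients never send the parameter, so they keep getting the
    ordinary, complete RSS document."""
    query = parse_qs(urlsplit(url).query)
    return query.get("streaming", [""])[-1] == "yes"

def streaming_url(feed_url):
    """Client side: the URL an SRSS-capable client would request.
    Assumes feed_url has no #fragment."""
    sep = "&" if urlsplit(feed_url).query else "?"
    return feed_url + sep + "streaming=yes"
```

Because the decision rests entirely on an optional query parameter, upgraded and unmodified clients can share the same feed URL.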

Been thinking about this for years, but I was prompted to write this up by a posting by Dave Winer suggesting that we could distribute the functionality of Twitter using an RSS client and people's individual RSS feeds. This is a great idea at that level of the IP stack, but when you go down one level to try to implement it, fetching a full RSS file every time you check for news is incredibly inefficient and slow. Much better to use SRSS so that when somebody posts to their RSSer feed, it appears immediately.

Posted [15:54] [Filed in: opensource]

Wed, 01 Apr 2009

Open Data

Are you not a coder? Or have your coding skills gone rusty because you've moved on to other things? No matter! You can still contribute to open source. Open source is only one part of a program. The other part is open data. I'm encouraging people to contribute to OpenStreetMap. We're running OpenStreetMap mapping parties all over the world. All skills taught! What's important is your willingness to contribute to an Open Data project, and location, location, location. We can only map where you are.

Posted [11:30] [Filed in: opensource]

Wed, 21 Jan 2009

Cloudmade, my new employer

After 17 years of working for myself, I've decided to fire my boss, and hire a new one, Cloudmade. We're working on improving OpenStreetMap, a community edited map. All sorts of geodata can, should, and need to be added to OpenStreetMap. I'm available to give presentations about open data, OpenStreetMap, and collaborative communities in the Northeast of the USA.

I'm also blogging over at the community Cloudmade site.

Posted [03:25] [Filed in: opensource]

Mon, 03 Nov 2008


I've experimented with keeping an audio recording in addition to a GPS track of my bicycle rides. The trouble with a continuous audio recording is that 1) it's long, 2) it's boring, and 3) the interesting things are hard to seek to. If you could do reliable speech recognition, you could say a word like "mark" or somesuch. However, in my experience, street noise is going to kill you.

Better yet, detect a whistle. The code below prints the duration of each whistle, its time from the beginning of the audio recording, and its pitch. It expects a mono, 16-bit WAV file. The point is to be able to record audio continuously, and still be able to mark a waypoint with an audio annotation.


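import sys
import wave
import struct

def findwhistle(inwave):
    """Given an open wave file (assumed mono, 16-bit), print one line per
    whistle found: its duration, its start time (both in seconds) and its
    pitch in Hz.  Return the length of the recording in seconds.

    A whistle shows up as a long run of half-cycles of (nearly) equal
    length, so we measure the number of samples between zero crossings."""
    rate = float(inwave.getframerate())
    framecount = 0       # samples processed before the current chunk
    zerocross = 0        # length in samples of the current half-cycle
    lastzerocross = 0    # length in samples of the previous half-cycle
    zerocrosssum = 0     # samples in the current run of steady half-cycles
    zerocrosscount = 0   # half-cycles in the current run
    sign = 1             # sign of the current half-cycle
    while True:
        frames = inwave.readframes(100)
        if len(frames) == 0:
            break
        # the last chunk may hold fewer than 100 samples
        frames = struct.unpack("<%dh" % (len(frames) // 2), frames)
        for i, sample in enumerate(frames):
            if sign * sample > 0:
                # still on the same side of zero: the half-cycle continues
                zerocross += 1
                continue
            # zero crossing: does this half-cycle match the previous one?
            if abs(zerocross - lastzerocross) <= 1:
                zerocrosssum += zerocross
                zerocrosscount += 1
            else:
                # the steady tone (if any) has ended; report long ones
                if zerocrosscount > 100:
                    duration = zerocrosssum / rate
                    start = (framecount + i - zerocrosssum) / rate
                    # two half-cycles per cycle, hence the division by 2
                    pitch = zerocrosscount / 2.0 / duration
                    print('! %4.2f %4.2f %5.0f' % (duration, start, pitch))
                zerocrosssum = 0
                zerocrosscount = 0
            sign = -sign
            lastzerocross = zerocross
            zerocross = 1
        framecount += len(frames)
    return framecount / rate

def main():
    f = wave.open(sys.argv[1], "r")
    print(f.getparams())
    print(findwhistle(f))

if __name__ == "__main__":
    main()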

Posted [01:46] [Filed in: opensource]

Thu, 30 Oct 2008

SciPhone && Open Source

These guys (SciPhone) really REALLY ought to get together with some open source developers. Looks like a great product, but it's almost 100% certain that their software stinks.

Posted [00:38] [Filed in: opensource]

Mon, 07 Jul 2008


In a free market, over time, competition in the production of a commodity product will eliminate all profits. Bread-makers can sell their bread for enough money to cover the cost of the capital invested in the bakery, the cost of the flour, yeast, sugar, and water, the fuel needed for firing, and the salary of the baker. They can earn no more money than that. If they did, then another bakery would be established which would price its products lower, splitting that profit between the customer and the owner of the new bakery.

In order to earn a profit, you need to do something special (called a franchise). This can take several forms: you could create something new that nobody else has. You could have an exclusive territory assigned to you (as in a traditional franchise such as McDonald's). You could have help from the government, in the form of a patent or copyright. Or you could have a professional certification, such as a law or medical degree, without which one is prohibited from practice -- and possession of which is controlled by other lawyers and doctors who are sure not to give out too many.

In the case of software development, you can copyright and/or patent your software (although it's dodgy that both apply, since the theory is that they can't both be used on the same work). Or, you can write your software in such a way that it is inextricably tied to a piece of hardware which only you sell. Or you can develop an expertise with a piece of software which nobody else can or will reproduce.

Or you can simply not worry about getting a franchise because you know that only certain types of people have the ability to program. If that's true (and I believe it is), then programmers will forever command higher-than-usual salaries. And the more demand for programmers, the better off programmers will be. And the more software is used, the more demand there is for programmers. And the less expensive software is, the more widely it will be used.

Every process is a mix of inputs. The ratio of inputs depends on the cost of these inputs. The process gets changed over time to handle the varying cost of the inputs. If one of them becomes cheaper, it becomes a larger factor in the production.

I believe that there is sufficient evidence to say that Open Source and free software lower the cost of producing software, and hence will ineluctably raise the salaries of programmers, even as these programmers give away more and more of their software.

All of this, of course, is in complete opposition to Stallman's GNU Manifesto. He attempts to rebut objections to GNU's goals. He repeatedly makes the claim that free software will reduce programmers' pay. I claim otherwise. Hopefully Stallman has changed his mind.

Posted [14:42] [Filed in: opensource]

Sun, 04 May 2008

Web 2.0 doesn't imply usability

I recently got myself a Flickr Pro account, and have been using Flickr for more of my photos. I find myself more and more annoyed at the rough edges in the Flickr user interface. For example, when you want to delete a tag from something, you click on the [x] to the right of the tag. Flickr asks you "Do you want to delete the tag?" Cancel/Ok:

This is almost certainly the wrong thing to do. It annoys people because the website is (in effect) saying "Hey, that might be a stupid thing to do, so I'm going to slow you down so you can think about it." The first couple of times people might pause to think (but what they're likely thinking is "you stupid computer, I told you what to do".) After that, when they want to delete a tag, the action will be "Click X; Click Ok", with no pause for thought.

That is how people think. That is how people are able to learn a complicated game like chess or Go. People chunk information and actions together. This allows the forebrain to go on thinking about other things while the rest of the brain carries out an action previously decided upon. If an action requires a confirmation, the hindbrain will confirm it as part of executing the action chunk.

The way to work with human cognition rather than against it is to allow for Undo. Undo isn't a new idea -- we were using it 25 years ago. Undo works well with the human brain because it allows actions to happen without confirmations, but it also allows the forebrain (which operates more slowly than the hindbrain) to realize that it has made a mistake, and correct it with an Undo.
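Here is a minimal Python sketch of the Undo pattern, applied to the tag-deletion example above. The `TagList` class is hypothetical and is not Flickr's code. Every change takes effect immediately, with no dialog, and pushes its inverse onto a stack. A mistaken click therefore costs one extra click, instead of every click costing two.

```python
class TagList:
    """Tags on a photo, edited without confirmation dialogs."""

    def __init__(self, tags):
        self.tags = list(tags)
        self._undo = []          # stack of functions that reverse each change

    def remove(self, tag):
        index = self.tags.index(tag)   # raises ValueError if the tag is absent
        del self.tags[index]
        # Remember how to put it back exactly where it was.
        self._undo.append(lambda: self.tags.insert(index, tag))

    def add(self, tag):
        self.tags.append(tag)
        # Undo is LIFO, so when this runs the new tag is the last element.
        self._undo.append(self.tags.pop)

    def undo(self):
        """Reverse the most recent change; do nothing if there is none."""
        if self._undo:
            self._undo.pop()()
```

In a web page, `remove` would run the moment the [x] is clicked, and a temporary UNDO link would call `undo`. No confirmation dialog is needed on either side.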

Flickr isn't all bad. They do use Undo sometimes:


When they add an image to a set, they add an indication that it's in the set over on the right, so the "OK" part is useless. They should skip the dialog entirely and insert a temporary "UNDO" below the set listing. Even when they do use UNDO, they spoil its operation with a confirmation:
Of course I want to remove it from the set! That's why I just clicked on UNDO, right?

Following the confirmation is another useless "Click OK to indicate that you are still alive" box.
Of course it's been removed, because the set listing is now gone. The proper way to handle this is to grey out the set listing on the right, and add an "UNDO" button below it.

Even if you've implemented your website using Open Source software like Linux, Apache, MySQL, and PHP, you don't escape the low quality typical of proprietary software unless your software is Open Source.

It's easy to volunteer other people to fix problems. In the Open Source world, the typical response is "great idea; send a patch." Flickr lives in the Web 2.0 world, not the Open Source world. Their software sucks just like any proprietary program. We can't fix it. Only Flickr can fix it, and hopefully, they'll at least fix the problems I've outlined here.

Posted [00:56] [Filed in: opensource]

Tue, 29 Apr 2008

config.h Considered Harmful

Many, many programs written in C or C++ use a file called "config.h" which contains #define statements that control the compilation of the program. These programs are also nearly always built using 'make'.

I claim that these two attributes are in conflict with each other. Or, in layman's terms, "config.h sucks". The problem is that 'make' only tracks dependencies between whole files. When you have multiple options in config.h, every file whose compilation may depend on any value defined there must be recompiled whenever config.h changes, even if the option that changed is one that file never uses.

The correct way to do compile-time options is to have a config subdirectory containing a multitude of .h files, each with its own #define in it. These are easily managed because each file has only one #define, and a source file that uses an option must #include that option's config file. 'make' is then trivially informed of these dependencies by scanning the files included in each source (for example, with gcc -MM).

So, when you change one option, only those files which depend on it will get recompiled.
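Here is a sketch of that dependency scan in Python. It is meant to show the idea, not to serve as a build tool. It assumes each option lives in its own header under `config/`, included with quotes. The header names used here, such as `config/USE_GPS.h`, are made up. In practice, `gcc -MM` emits equivalent rules for every non-system header a file includes.

```python
import re

# Matches lines like:  #include "config/USE_GPS.h"
CONFIG_INCLUDE = re.compile(r'^\s*#\s*include\s+"(config/[^"]+\.h)"', re.MULTILINE)

def config_deps(sources):
    """Return make rules tying each object file to the config headers it uses.

    sources: dict mapping a C/C++ file name to its contents. Only the
    one-#define-per-file headers under config/ are considered; files that
    include none of them get no rule."""
    rules = []
    for name in sorted(sources):
        headers = sorted(set(CONFIG_INCLUDE.findall(sources[name])))
        if headers:
            obj = re.sub(r"\.(c|cc|cpp)$", ".o", name)
            rules.append("%s: %s" % (obj, " ".join(headers)))
    return rules
```

With these rules in the Makefile, touching `config/DEBUG.h` rebuilds only the objects whose sources include it. Every other object is left alone.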

I wrote this blog posting while waiting for a program to recompile because I changed config.h .... and it's still not finished recompiling on a pretty studly machine. Ahhhh, it just finished.

Of course, this is completely disrupted when you rewrite your Makefile (as GNU automake does), but that's a subject for a different posting.

Posted [17:07] [Filed in: opensource]

Mon, 24 Mar 2008

Patents and Open Source

Are you a patent holder, wondering how to write software which implements your patent? Here's my advice: Patents expire. Towards the end of the patent's lifetime, you want to be trying to transfer the patent's franchise over to the relationship between the patent-holder and the licensee. That can be done with closed-source software, but you risk competitors writing their own software. With Open Source software, as long as you manage the relationship with the user correctly, you end up with a franchise.

Long before a patent expires, you have ZERO NEED for closed-source software. ZERO. NONE. The purpose of a patent is to give you ownership over the idea. The purpose of closed-source software is to give you ownership over the code. But if you already have a patent, you own the idea. No need to own the code -- in fact, owning the code only hurts you, because it closes you off from people who would improve the code, or even from people who would create new patented works based on your patent.

If you have a patent, you NEED open source software.

Posted [10:57] [Filed in: opensource]

Tue, 04 Mar 2008

Licensing Adobe Flash

Are you having trouble licensing the Adobe Flash player? Apparently, Adobe makes it difficult for some people to license their Flash player. I don't know why -- you'd think they'd be all about the money. But regardless, the Gnash project is happy to license its Flash player to you if Adobe won't license their Flash player. Go Get Gnash now.

Posted [16:43] [Filed in: opensource]