Sporkmonger

purveyor of fabulously ambiguous eating utensils

UUID As A Primary Key

Posted by sporkmonger
Written October 28th, 2005

Paul Dix wrote up a quick how-to on using UUIDTools and ActiveRecord to generate UUIDs for use as primary keys on your tables, based on a couple of comments over on Robi Sen’s blog. Thanks for doing this, Paul. Judging by the search keywords that bring people to this blog, there’s a surprising number of people trying to do this.
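
For anyone who just wants the gist, here is a minimal sketch of the idea, not Paul's actual code. It assumes a plain Ruby class (the hypothetical `UuidRecord`) standing in for an ActiveRecord model, and it uses `SecureRandom.uuid` from the standard library rather than UUIDTools so that it runs without any gems. The point is only that the key gets assigned once, just before the first save, instead of coming from an auto-incrementing column.

```ruby
require 'securerandom'

# Hypothetical stand-in for a model class; this is not ActiveRecord.
class UuidRecord
  attr_reader :id, :name

  def initialize(name)
    @name = name
    @id = nil
  end

  # Assign the primary key once, right before the first save,
  # roughly what a before-create callback would do in a real model.
  def save
    @id ||= SecureRandom.uuid
    self
  end
end

record = UuidRecord.new('example').save
puts record.id
```

In a real table the id column would need to be a string (or native UUID) type rather than an integer.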

More FeedTools Tastiness

Posted by sporkmonger
Written September 27th, 2005

The FeedTools schema has changed.

You’ll want to take a look at the schema.*.sql files in the /db folder.

I finally got around to renaming the xml_data field to feed_data and adding the feed_data_type field. It wouldn’t make any sense to be putting yaml (!okay/news) into a field named xml_data now would it?

I also fixed a couple of bugs caused by the redirect improvements. They’re pretty much guaranteed to rear their ugly heads (and yikes, probably mess up your feeds table as well). Not sure how they slipped through all those unit tests, but they did. So now there are a couple of new tests in place to make sure that doesn’t happen again.

So try to avoid 0.2.11 if you can help it. 0.2.12 is much more cuddly.

FeedTools Schema And Other Short Stories

Posted by sporkmonger
Written September 27th, 2005

Now that FeedTools no longer automatically creates the database schema for you, I thought it might be best to put the schema files into rdoc. Of course, rdoc runs those schema files through its text formatter, and the schema files come out more typographically correct on the other side. Except that’s not really what we want. After a little bit of experimentation, I discovered that if I prefixed the SQL with a SQL comment and then indented the SQL that followed the comment by 2 spaces, rdoc would parse it in such a way that you could still copy-paste from the docs straight into whatever SQL frontend you happen to be using.

E.g.:

-- Example PostgreSQL schema
  CREATE TABLE feeds (
    id                SERIAL PRIMARY KEY NOT NULL,
    url               varchar(255) default NULL,
    title             varchar(255) default NULL,
    link              varchar(255) default NULL,
    xml_data          text default NULL,
    http_headers      text default NULL,
    last_retrieved    timestamp default NULL
  );

By the way, does anyone know of a good SQL frontend for PostgreSQL for OS X? pgAdmin3 crashes on me every 5 seconds or so, and that’s more than a little irritating. At this point, I don’t even care if it’s free/open-source (though that’s a huge bonus). I just want something that works well and doesn’t look hideous.

FeedTools also got a significant speed-up for instances in which HTTP redirection occurs and the URL doesn’t get updated (usually because it’s a permanent redirection instead of a temporary one). In other words, the cache gets updated with the new URL, but the open method continues to get called with the old URL. FeedTools used to be unaware of the updated feed in the cache and would go out and pull the feed again. This has been changed so that FeedTools now checks the cache before following a redirection to see whether the feed is already in the cache and whether it has expired. While this definitely does increase the number of cache misses during redirection, misses are pretty painless, and the potential speed-up for a hit far outweighs the potential slow-down from the extra misses. I’ll take one or two extra SQL queries over an unnecessary HTTP request any day of the year.
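
To make the redirect logic a little more concrete, here is a rough sketch of the idea. It is not the FeedTools implementation. Everything in it is an assumption made for illustration: the `CacheEntry` and `Response` structs, the `open_feed` method, the hash used as a cache, and the `fetcher` callable that stands in for the HTTP layer.

```ruby
# Hypothetical cache entry: the feed body plus the time it expires.
CacheEntry = Struct.new(:body, :expires_at) do
  def fresh?(now)
    now < expires_at
  end
end

# Hypothetical fetch result: either a redirect target or a body.
Response = Struct.new(:redirect_to, :body)

# Walk the redirect chain starting at `url`, checking `cache` before
# every hop. `fetcher` is any callable taking a URL and returning a
# Response. Returns [:cache, body] or [:remote, body].
def open_feed(url, cache, fetcher, now: Time.now, max_hops: 5)
  max_hops.times do
    entry = cache[url]
    return [:cache, entry.body] if entry && entry.fresh?(now)

    response = fetcher.call(url)
    return [:remote, response.body] unless response.redirect_to

    # Not following blindly: the next pass checks the cache first.
    url = response.redirect_to
  end
  raise 'too many redirects'
end

now = Time.now
cache = { 'http://example.com/new' => CacheEntry.new('<rss/>', now + 3600) }
requests = []
fetcher = lambda do |u|
  requests << u
  if u == 'http://example.com/old'
    Response.new('http://example.com/new', nil)
  else
    Response.new(nil, '<rss>remote</rss>')
  end
end

source, _body = open_feed('http://example.com/old', cache, fetcher, now: now)
p [source, requests] # => [:cache, ["http://example.com/old"]]
```

The old URL costs one cache miss and one request for the redirect itself, but the feed at the new URL is served from the cache instead of being pulled again.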

HTTP error messages should now include a list of locations that FeedTools was redirected through before hitting the error. This was inserted primarily for the purposes of debugging.
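
As a rough illustration of what that looks like, here is a sketch of an error that carries its redirect trail. The `FeedFetchError` class and its message format are hypothetical; the real FeedTools exception classes and wording may well differ.

```ruby
# Hypothetical error class, for illustration only.
class FeedFetchError < StandardError
  attr_reader :redirects

  # `status` is the HTTP status code; `redirects` is the ordered list
  # of locations that were visited before the error occurred.
  def initialize(status, redirects)
    @redirects = redirects
    message = "HTTP #{status}"
    unless redirects.empty?
      message += ' after redirects: ' + redirects.join(' -> ')
    end
    super(message)
  end
end

begin
  raise FeedFetchError.new(404, ['http://example.com/a', 'http://example.com/b'])
rescue FeedFetchError => e
  puts e.message
end
```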

I removed the global FeedTools.cache_only option in favor of a more granular approach. You can now say:


feed = FeedTools::Feed.open(
  'http://rss.slashdot.org/Slashdot/slashdot',
  :cache_only => true)

You may notice that I removed the attribute dictionary functionality. If you were using it, sorry about that, but I decided it was too ugly and hackish, not to mention slow. It had to go.

I split the feed_tools.rb file into a couple pieces as well. No more 5000 line files that are a huge pain to navigate.

FeedTools should now also automatically detect User-Agent blocking and issue a warning if it runs into that.

Update:

The :cache_only configuration option has been renamed to :disable_update_from_remote. So the code should now be:


feed = FeedTools::Feed.open(
  'http://rss.slashdot.org/Slashdot/slashdot',
  :disable_update_from_remote => true)