Category Archives: finder

Announcing the (Unofficial) Yahoo groups public data API

The what?

All Yahoo groups have public metadata: the number of members, the category, various email addresses and so on.

Yahoo doesn’t provide an API for this publicly available data (you can see it by visiting one of the group pages), so getting information about a particular group from your own programs is hard.

I’ve filled this gap by releasing a third-party API to get the publicly available Yahoo groups metadata.

JSON API

The API provides a really simple interface for getting group data in JSON format: just stick the URL-encoded URL of the Yahoo group you are interested in on the end of the (Unofficial) Yahoo groups public data API URL and request it. You get JSON back.

The URL you request looks like this:

http://yahoo-group-data.herokuapp.com/api/v1/group/http%3A%2F%2Ftech.groups.yahoo.com%2Fgroup%2FOneStopCOBOL%2F

…and the JSON you get back looks like this:

{
    "private": false,
    "not_found": false,
    "age_restricted": false,
    "name": "OneStopCOBOL",
    "description": "OneStopCOBOL - Official COBOL group",
    "post_email": "OneStopCOBOL@yahoogroups.com",
    "subscribe_email": "OneStopCOBOL-subscribe@yahoogroups.com",
    "owner_email": "OneStopCOBOL-owner@yahoogroups.com",
    "unsubscribe_email": "OneStopCOBOL-unsubscribe@yahoogroups.com",
    "language": "English",
    "num_members": 151,
    "category": "COBOL",
    "founded": "2008-06-24"
}
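As a minimal Ruby sketch of the request described above (this is not the sample code from the API homepage): the helper names group_api_url and summarise_group are made up for illustration, and the network call is left out so the snippet runs offline against a trimmed copy of the sample response.

```ruby
require "uri"
require "json"

# Base URL taken from the example request above.
API_BASE = "http://yahoo-group-data.herokuapp.com/api/v1/group/"

# Hypothetical helper: URL-encode the group URL and append it to the API base.
def group_api_url(group_url)
  API_BASE + URI.encode_www_form_component(group_url)
end

# Hypothetical helper: pick a few fields out of the JSON response body.
def summarise_group(json_body)
  data = JSON.parse(json_body)
  return "group not found" if data["not_found"]
  return "group is private" if data["private"]
  "#{data["name"]}: #{data["num_members"]} members, founded #{data["founded"]}"
end

url = group_api_url("http://tech.groups.yahoo.com/group/OneStopCOBOL/")
puts url

# A trimmed copy of the sample response shown above.
sample = '{"private": false, "not_found": false, "name": "OneStopCOBOL", ' \
         '"num_members": 151, "founded": "2008-06-24"}'
puts summarise_group(sample)
```

To fetch live data, assuming the service is still running, the body returned by Net::HTTP.get(URI(url)) from the standard library could be handed straight to summarise_group.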

You can try it out and get sample code over at the homepage of the (Unofficial) Yahoo groups public data API.

Motivation

I run the Recycling Group Finder, a site that makes extensive use of Yahoo Groups data. The (Unofficial) Yahoo groups public data API is an abstraction of the functionality I wrote to get group data for that site. I just figured it might be useful to other people.

Rails 2 -> 3 undefined method `html_safe' for nil:NilClass error

I am converting the Recycling Group Finder site from Rails 2 to Rails 3, and though it has mostly gone to plan, I was temporarily held up by this error, which I was getting on some pages:

undefined method `html_safe' for nil:NilClass

The error was hard to track down as the message wasn’t very descriptive, but in the end it turned out to be caused by a comment. I am using content_for blocks to generate sections of page content, and for a long section I had added a comment to the end of the block to help me know which block was closing:

It turns out that this ‘# some_section’ comment was the problem, possibly because of the change to Erubis in Rails 3. Removing the comment caused the page to start working again:

I hope this page helps short-cut the debugging for anyone else who is bitten by this issue.


Recycling Group Finder iPhone app – updates

I’ve pushed a new release of the Recycling Group Finder iPhone app that updates the group database, so there should be a lot more Freegle groups available now. Get it from the App Store.


Freecycle and Freegle group finder for the iPhone

Recycling Group Finder app for the iPhone

I’ve launched my first iPhone app, the Recycling Group Finder for iPhone. It complements the Recycling Group Finder web app, making it even easier to find your closest Freecycle or Freegle group by using the iPhone’s in-built GPS. Check out the information page and give it a go; it’s free.


Freecycle and Freegle group location data as KML

If you’ve got Google Earth (or something else that can read KML data) you might want to play around with the new recycling group location KML file I’ve put live on the Recycling Group Finder. More information here.

Recycling group data with population density overlay



Optimising the Recycling Group Finder – Making a Ruby on Rails app faster

This is really just the ‘story’ of how I fixed a very slight performance issue with the Recycling Group Finder site that I run, but I figured it would be worth a post as an example or motivation to anyone else who needs to get started investigating their own Ruby on Rails app performance issues.

The performance problem

I’ve been very happy with the responsiveness of the Recycling Group Finder, so purely out of interest, to see what it would tell me, I installed the New Relic RPM plugin and activated the free Bronze account available to EngineYard customers. The results were pretty satisfying: the average response time for the most popular page was 163ms at most, with the second most popular page at 90ms. Those are good response times and fall well within the 37signals response time rule:

Our general rule of thumb is that most pages should render their HTML on the server in less than 200ms and almost all in less than 500ms.

Suspicious looking

One of the great things about data visualisations is that they can make it really easy to spot patterns. Take this New Relic graph for example:

Recycling Group Finder - graph, before optimisation


The yellow on the graph represents time spent in the database; the blue is time spent in Ruby, i.e. rendering, controllers etc. Memcached accesses are on there too, but they’re so fast they hardly appear. This graph looked suspicious to me: I’d normally expect database time to be a much smaller proportion of the overall request time. So it looks like there may be some optimisation that can be done, but in order to optimise I first need to know what to optimise.

The hunt

Google for “rules of optimisation”. Most rules are something like this:

  1. Don’t optimise yet.
  2. If you need to optimise, profile first.

I’m never going to be able to optimise my code unless I know what to optimise. If I trawl through looking for places that might be slow and trying to make them faster, the chances are I’m going to spend hours changing code for no benefit. I might even make it slower. I need to know exactly where the bottleneck is: I need to profile my code.
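Measuring does not need heavy tooling to get started. As a minimal sketch using Ruby’s standard Benchmark module (this is not what was used on the site, and the two find_* methods are invented stand-ins for code you suspect might be slow):

```ruby
require "benchmark"

# Invented stand-ins for two code paths you suspect might be slow.
def find_with_scan(items, target)
  items.find { |item| item == target } # linear search through the array
end

def find_with_lookup(index, target)
  index[target] # hash lookup
end

items  = (1..100_000).to_a
index  = items.to_h { |item| [item, item] }
target = items.last

# Benchmark.realtime returns the elapsed wall-clock time in seconds as a Float.
timings = {
  scan:   Benchmark.realtime { 20.times { find_with_scan(items, target) } },
  lookup: Benchmark.realtime { 20.times { find_with_lookup(index, target) } }
}

timings.each do |name, seconds|
  puts format("%-6s %.3fms", name, seconds * 1000)
end
```

The point is only that the numbers, not a hunch, decide which of the two is worth changing.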

There are a bunch of ways of finding out where your code is slow, and I’ve personally used ruby-prof before with good results. However, I know that the issue here is in the database, and I know that Rack::Bug will show me the SQL queries that have run for an action and, importantly, how long they took, so that’s what I’m going to try first. I install the plugin, configure it and load it up. The issue is immediately obvious:

Recycling Group Finder - Rack::Bug SQL queries


Almost all of the SQL that is executed is under 0.5ms per query, and there are a few queries at ~4ms, but the one query that really stands out is the third one down. At 44.75ms it is more than half of the overall SQL time. Bingo! Now that I know what is slow, I need to know why it is slow. Time to break out the query analyser.

Fixing it

I needed to dig deeper into that SQL statement to see what it was doing, so I opened up a Postgres shell and ran EXPLAIN ANALYZE on the query:

The issue seems pretty clear. There is a sequential scan on groups:

Seq Scan on groups (cost=0.00..626.75 rows=4885 width=363) (actual time=0.038..26.495 rows=5126 loops=1)
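For anyone not used to reading these lines: the first bracket is the planner’s estimate (startup cost..total cost in arbitrary units, then estimated rows and row width) and the second is what actually happened (startup..total time in milliseconds, rows returned and loop count). A small Ruby sketch that pulls those numbers apart, where parse_plan_line is a hypothetical helper written for this illustration:

```ruby
# Matches the two bracketed groups PostgreSQL prints for a plan node
# when run with EXPLAIN ANALYZE.
PLAN_LINE = /
  \(cost=(?<startup_cost>\d+\.\d+)\.\.(?<total_cost>\d+\.\d+)\s+
    rows=(?<estimated_rows>\d+)\s+width=(?<width>\d+)\)\s+
  \(actual\ time=(?<startup_ms>\d+\.\d+)\.\.(?<total_ms>\d+\.\d+)\s+
    rows=(?<actual_rows>\d+)\s+loops=(?<loops>\d+)\)
/x

# Hypothetical helper: returns a hash of the interesting numbers,
# or nil if the line does not look like an EXPLAIN ANALYZE plan node.
def parse_plan_line(line)
  m = PLAN_LINE.match(line)
  return nil unless m

  {
    estimated_cost: m[:total_cost].to_f,
    estimated_rows: m[:estimated_rows].to_i,
    actual_ms:      m[:total_ms].to_f,
    actual_rows:    m[:actual_rows].to_i,
    loops:          m[:loops].to_i
  }
end

line = "Seq Scan on groups (cost=0.00..626.75 rows=4885 width=363) " \
       "(actual time=0.038..26.495 rows=5126 loops=1)"
node = parse_plan_line(line)
puts "#{node[:actual_ms]}ms to read #{node[:actual_rows]} rows " \
     "(planner expected #{node[:estimated_rows]})"
```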

A sequential scan on a large table is going to sink performance. I can see that the sequential scan is definitely the issue in this case, as its cost and time taken are significant proportions of the overall query cost and time. I need to eliminate it. Here’s the code that generates that query:

# Rails 2 style finder with Geokit's :origin option, which provides the "distance" column
@groups = Group.find(:all,
  :include    => :group_page,
  :origin     => [@location.lat, @location.lng],
  :limit      => 30,
  :conditions => ["defunct = false AND lat IS NOT NULL AND lng IS NOT NULL AND full_address IS NOT NULL AND full_address != '' AND country_code = ?", @location.country_code],
  :order      => 'distance ASC, num_members DESC')

I wrote this code ages ago, and re-reading it now I can see that although I am limiting the returned results to 30 rows, the query has to hit every row in the table to work out which 30 to return, as nothing in the conditions limits the distance. Whoops. Looking over the Geokit docs I see there’s a :within condition, so I added :within => 100 to the find. Running the resulting query through EXPLAIN ANALYZE in the Postgres shell again shows it has dropped to 10ms. Not bad, but it’s still using a sequential scan. Adding an index on the conditions speeds up the query further to ~1.2ms:
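To see why a radius limit helps, here is a plain-Ruby sketch of the general idea: throw away far-off rows with cheap latitude/longitude comparisons first, then do the expensive distance calculation and sort only on what is left. How Geokit actually builds its SQL is not covered here; the Group struct, the helper names, the rough miles-per-degree figure and the sample coordinates are all assumptions made for this illustration.

```ruby
# Illustration only: a stand-in for "nearest 30 groups within a radius".
Group = Struct.new(:name, :lat, :lng, :num_members)

EARTH_RADIUS_MILES   = 3958.8
MILES_PER_DEGREE_LAT = 69.0 # rough figure, good enough for a pre-filter

def radians(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in miles using the haversine formula.
def distance_miles(lat1, lng1, lat2, lng2)
  dlat = radians(lat2 - lat1)
  dlng = radians(lng2 - lng1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(radians(lat1)) * Math.cos(radians(lat2)) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a))
end

# Approximate rectangle around the origin that contains the search circle.
# Not valid near the poles or across the 180 degree meridian.
def bounding_box(lat, lng, radius_miles)
  lat_delta = radius_miles / MILES_PER_DEGREE_LAT
  lng_delta = radius_miles / (MILES_PER_DEGREE_LAT * Math.cos(radians(lat)))
  { lat: (lat - lat_delta)..(lat + lat_delta),
    lng: (lng - lng_delta)..(lng + lng_delta) }
end

def nearest_groups(groups, lat, lng, within: 100, limit: 30)
  box = bounding_box(lat, lng, within)
  groups
    .select  { |g| box[:lat].cover?(g.lat) && box[:lng].cover?(g.lng) } # cheap filter
    .map     { |g| [g, distance_miles(lat, lng, g.lat, g.lng)] }        # exact distance
    .select  { |_, miles| miles <= within }
    .sort_by { |g, miles| [miles, -g.num_members] } # distance ASC, num_members DESC
    .first(limit)
    .map(&:first)
end

# Invented sample data: approximate city coordinates, made-up member counts.
groups = [
  Group.new("Edinburgh", 55.9533, -3.1883, 900),
  Group.new("Brighton",  50.8225, -0.1372, 400),
  Group.new("Reading",   51.4543, -0.9781, 250)
]

# Search around central London with the default 100 mile radius.
puts nearest_groups(groups, 51.5074, -0.1278).map(&:name).inspect
```

In a database the same range comparisons on lat and lng are the kind of condition an index can serve, which is what removes the need to read every row.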

Not bad when it started out at nearly 45ms. Here is the result reflected in the New Relic graph:

Recycling Group Finder - After optimisation


I deployed the new code approximately in the middle of the graph; it should be pretty obvious where.

Conclusion

Before you can optimise your Ruby on Rails app (or your app in any other framework/language for that matter) you need to know where to optimise. Tools like Rack::Bug and New Relic let you find out quickly and easily, so you can direct your attention only at those parts of your app that need it.

On the Recycling Group Finder I cut response times drastically in about half an hour. Without knowing exactly where to make the change I would have been left guessing and may never have made the optimisation I did.

Google Groups now supported on the Recycling Group Finder

I have just pushed live the latest feature of the Recycling Group Finder, support for Google Groups!

It has taken a while to roll out: the changes to the code-base that runs the site were pretty fundamental, as the site was originally written to work only with Yahoo-based groups. As a result there may be some issues; let me know if you see any.

To get started just email me your Google Groups recycling group URL or head over to the group addition page.

Bot whipping

I finally got round to doing something about the Alexa crawler tampering with the cookies on the Recycling Group Finder after Patrick Joyce commented on my previous post.

There’s no reason for me to store information in the cookies for bots visiting my site so I just disabled them for the Alexa crawler (any request where the user agent string matches ia_archiver) by adding a single line to my ApplicationController class:
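As a framework-free sketch of the user-agent check involved (this is not the ApplicationController line itself; the helper names alexa_crawler? and cookies_enabled_for? are hypothetical, and the sample user agent strings are only illustrative):

```ruby
# Pattern for the Alexa crawler's user agent, as described above.
CRAWLER_PATTERN = /ia_archiver/

# Hypothetical helper: true when the user agent string mentions ia_archiver.
# A missing user agent (nil) is treated as "not the crawler".
def alexa_crawler?(user_agent)
  CRAWLER_PATTERN.match?(user_agent.to_s)
end

# Hypothetical helper: everyone except the crawler gets cookies.
def cookies_enabled_for?(user_agent)
  !alexa_crawler?(user_agent)
end

[
  "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6)",
  nil
].each do |user_agent|
  puts "#{user_agent.inspect} -> cookies enabled: #{cookies_enabled_for?(user_agent)}"
end
```

In a Rails controller the same predicate would be fed from request.user_agent.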

It wasn’t a major pain in the arse, but it’s a few fewer emails to delete every day!

finder.overcycle.com updates

Thanks to all the people who emailed about the release of the Recycling Group Finder, it was great to receive so many positive comments!

By popular request there are two new features. First, group member numbers now update automatically. An update can take up to about 48 hours, so don’t worry if your membership numbers have changed and the new figure isn’t appearing on the site yet; it will.

Second is the group owner/moderator admin section. If you are a group moderator you can now sign up to edit details of your group, including member numbers, name and location. To start, just enter your Yahoo group URL, or find your group on the site and follow the link included with the rest of the group information.

Comments and feedback welcome as always!
