Category Archives: Web

Announcing the (Unofficial) Yahoo groups public data API

The what?

All Yahoo groups have public metadata: the number of members, the category, various email addresses, and so on.

Yahoo doesn’t provide an API for this publicly available data (you can see it by visiting any group page), so getting information about a particular group into your programs is hard.

I’ve filled this gap by releasing a third-party API to get the publicly available Yahoo groups metadata.

JSON API

The API itself provides a really simple interface for getting group data in JSON format: just stick the urlencoded URL of the Yahoo group you are interested in on the end of the (Unofficial) Yahoo groups public data API URL and request it. You get JSON back.

The URL you request looks like this:

http://yahoo-group-data.herokuapp.com/api/v1/group/http%3A%2F%2Ftech.groups.yahoo.com%2Fgroup%2FOneStopCOBOL%2F

…and the JSON you get back looks like this:

{
    "private": false,
    "not_found": false,
    "age_restricted": false,
    "name": "OneStopCOBOL",
    "description": "OneStopCOBOL - Official COBOL group",
    "post_email": "OneStopCOBOL@yahoogroups.com",
    "subscribe_email": "OneStopCOBOL-subscribe@yahoogroups.com",
    "owner_email": "OneStopCOBOL-owner@yahoogroups.com",
    "unsubscribe_email": "OneStopCOBOL-unsubscribe@yahoogroups.com",
    "language": "English",
    "num_members": 151,
    "category": "COBOL",
    "founded": "2008-06-24"
}
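
Fetching that from Ruby takes only a few lines. This is a minimal sketch rather than official client code; it just URL-encodes the group URL, sticks it on the end of the API URL and parses the response:

require 'net/http'
require 'uri'
require 'cgi'
require 'json'

API_BASE  = 'http://yahoo-group-data.herokuapp.com/api/v1/group/'
group_url = 'http://tech.groups.yahoo.com/group/OneStopCOBOL/'

# URL-encode the group URL, append it to the API URL and parse the JSON we get back
data = JSON.parse(Net::HTTP.get(URI.parse(API_BASE + CGI.escape(group_url))))
puts "#{data['name']} has #{data['num_members']} members"  # => OneStopCOBOL has 151 members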

You can try it out and get sample code over at the homepage of the (Unofficial) Yahoo groups public data API.

Motivation

I run the Recycling Group Finder, a site that makes extensive use of Yahoo Groups data. The (Unofficial) Yahoo groups public data API is an abstraction of the functionality I wrote to get group data for that site. I just figured it might be useful to other people.

Introducing sendcat.com

I send files to people; you send files to people. People who have computers need to send files; it’s something we have needed to do since computers were first invented. It could be easier.

Sending files is laborious. You have to open an email, address it, click ‘attach’, choose the file, put something in the subject line, put something in the body and send it. Hassle. I’m tired of doing this.


That’s why I created sendcat.com. It’s a web service that makes it *really* simple to send files. There are services out there that promise two-step sending; I’m aiming for one-step file sending.

I want to make it simple for everyone to send files, but most developers are heavy terminal users, so I’m planning on providing a CLI (Command Line Interface) client as a first-class interface to the Sendcat service.

Join the beta team, get a discount

If you’re interested in the beta, head over to sendcat.com and put in your email address; I’ll let you know when it’s in beta and you can sign up. I’ll make sure there is some sort of discount for people who want to keep using the service when the beta period is up, so sign up to make sure you’re eligible.

Alpha

Right now Sendcat is in alpha. I’m looking for people to use the service (for free) while I work on features and bug fixes. It works, though: I’m using it to send files to people daily.

If you want to try out the service for free, email me. You will need a Mac OS X or Linux machine, RubyGems installed and knowledge of the command line. The GUI client is coming soon.

Happy sharing!


Should I switch from Sendgrid to Amazon SES?

Update: A new comparison, with updated Sendgrid prices and Postmark included, is available here.

Probably yes, at least if price is your main concern and you just want to send email, without the extras. I wanted to see just how the Amazon SES prices stacked up against the next cheapest provider that I am aware of, Sendgrid, so I graphed it (thanks to carldr for the help with the Grapher formulas):

Cost comparison for Amazon SES/Sendgrid.

SendGrid can’t be too happy with that: in short, at no point is it better to go with SendGrid over SES if you are only taking price into account. Of course SendGrid has value-add over plain email sending, and you can decide whether that’s worth the premium, but for me the only feature I’d want is the ‘Whitelabel’ option, and Amazon SES has that included.

Note that you get 2000 emails per day free with Amazon SES if you send from an Amazon EC2 instance, but at this scale there is very little visible difference in cost. I thought it would be useful to take the cost of an EC2 instance into account: even if your main server is elsewhere, you could run your email processing on a micro or small EC2 machine to take advantage of the 2000 free emails per day. Here’s a zoom in on the origin:

Cost comparison for Amazon SES/Sendgrid plus EC2 instance cost.

So, there is no point in spinning up an EC2 instance to take advantage of the 2000 free emails per day.

I will be interested in SendGrid’s response to this. Possibly lowering prices? For me, certainly, their value-add isn’t worth the extra cost over Amazon SES.


Fairtrade music?

My friend @sudara is a musician and is interested in changing ideas about how music can and should be created and distributed*. He has started the site Ramen Music, where you can sign up for issues of new, unsigned, hand-picked quality music. Artists get paid fairly (~75% of the money Ramen Music takes, and 100% for the first year).

Sharing is encouraged, so here is my first issue (I have a paid subscription). With a paid account you can download high-bitrate, non-DRM’ed MP3s of all the music in an issue, but he’s giving away access to the first issue to people who want it:

If anyone else wants a free taste of Ramen Music #01, follow us and we’ll DM you.

I’d be interested to know people’s thoughts and opinions on the idea, the music and the state of the music industry in general.

* The state of the music industry is a debate I’ve had with a bunch of people in the past, and it seems independent and new musicians are not served well by the current models.


Generating a plist file in rails

I recently wrote an iPhone app (waiting for approval in the App Store at the time of writing) that needed data exported from a website (recyclinggroupfinder.com). The simplest way of handling external data in an app, it seems, is a plist file, so I wrote the following to generate one for me.

First of all I made my action respond to the plist format:
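
Something along these lines; the Thing model and index action here are hypothetical stand-ins for my real names:

# A registered :plist MIME type (see below) lets respond_to pick the
# index.plist.builder template when a .plist response is requested
def index
  @things = Thing.find(:all)

  respond_to do |format|
    format.html
    format.plist
  end
end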

Next I created a builder file to format the data:
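
Along these lines, in index.plist.builder; the dict keys match the hypothetical Thing sketch above:

# Emit the plist XML by hand: prolog, Apple DOCTYPE, then one dict per record
xml.instruct!
xml.declare! :DOCTYPE, :plist, :PUBLIC, "-//Apple//DTD PLIST 1.0//EN",
             "http://www.apple.com/DTDs/PropertyList-1.0.dtd"
xml.plist :version => "1.0" do
  xml.array do
    @things.each do |thing|
      xml.dict do
        xml.key "name"
        xml.string thing.name
        xml.key "lat"
        xml.real thing.lat
      end
    end
  end
end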

Then I registered the MIME type at the bottom of environment.rb:

Mime::Type.register "text/plist", :plist

And that’s it! Well, mostly. The generated XML file can be made significantly smaller by converting it into the binary plist format; run this on the command line in Terminal after downloading the generated XML plist:

cat things_xml.plist | plutil -convert binary1 - -o things.plist

The resultant binary plist is almost half the size of the XML one, much better for inclusion in an iPhone app:

pleb:~ will$ ls -l things*
-rw-r--r-- 1 will will 1247300 20 Jan 18:50 things.plist
-rw-r--r--@ 1 will will 2110437 20 Jan 18:50 things_xml.plist

Of course it would be much better to generate the binary format directly, and the plist-official gem looks like it can handle that (I mean to investigate), but I wrote the XML version before finding the gem, and it works for me!

Edit: the plist_official gem seems to be gone, but check out the binary plist gem instead.


Whenever a link on your website opens in a new window a panda cries

You’ve got a great website. It’s amazing. It’s so good no-one will want to leave it. Ever. Here’s what you’re thinking:

OMG wow. Our website is amazing. It’s so good no-one will want to leave it. Ever. Let’s help users enjoy our website forever by making all external links open in new windows, so that when they close the other websites our site will still be open. Our users will thank us until the end of time for making it easier to stay on our site, and anyway Marketing said we had to do it and they know the internet better than anyone!

Sad Panda

Oh dear. Most people don’t know this, but making external website links open in a new window makes pandas sad. Look, here’s a sad panda made sad because it used a website that opened external links in new windows.


Doesn't this panda look sad?

You did that, with your new window link opening. (Panda by sholt).

Why Pandas cry

You need to consider that your website is going to be just one part of a user’s browsing session. The user will probably already have open tabs in their current browser window, and the tab they have your website in will probably have history before your site. When you force a new window to open for a user, you are interrupting their browsing flow. When this happens the user has a jarring experience because of your website. Well done, your website.

There are already controls in browsers to let users open links in new windows or tabs; in Safari they are the first two options in the right-click context menu, or a Cmd+click:

Browser controls already exist giving the user control over where links open

When you force the user to open links from your website in a new window you are taking away control the user already has.

It’s a PITA and I have to work around it

Here’s what I personally do when your website opens a link in a new window:

  1. Your website forces a new window to open when I click on a link.
  2. New window opens, I close it immediately.
  3. On your website again I Cmd+click the link or right click and select ‘open in new tab’.
  4. I close the tab your website was in and re-position the new tab (with the new website in it) where the tab for your website used to be.
  5. I mentally remove one karma point from your website in my internal website excellence tracker.

Look at the amount of messing around your website made me do. And now, because of this messing around, your website is no longer accessible via my browser back button. You’ve succeeded in making your website even less accessible, the exact opposite of what you were trying to achieve.

Luckily I’m mentally tough, much like Chuck Norris, and so can take this two, maybe three times before cracking, but pandas aren’t as tough as me. If this happened to a panda, the panda would just cry. Sad.

Freecycle and Freegle group location data as KML

If you’ve got Google Earth (or something else that can read KML data) you might want to play around with the new recycling group location KML file I’ve put live on the Recycling Group Finder. More information here.

Recycling group data with population density overlay


Optimising the Recycling Group Finder – Making a Ruby on Rails app faster

This is really just the ‘story’ of how I fixed a very slight performance issue with the Recycling Group Finder site that I run, but I figured it would be worth a post as an example, or motivation, for anyone else who needs to get started investigating their own Ruby on Rails app’s performance.

The performance problem

I’ve been very happy with the responsiveness of the Recycling Group Finder, so just out of interest, to see what it would tell me, I installed the NewRelic RPM plugin and activated the free Bronze account available to EngineYard customers. The results were pretty satisfying: the average response time for the most popular page was at most 163ms, with the second most popular page at 90ms. Those are good response times and fall well within the 37signals response time rule:

Our general rule of thumb is that most pages should render their HTML on the server in less than 200ms and almost all in less than 500ms.

Suspicious looking

One of the great things about data visualisation is that it can make it really easy to spot patterns. Take this New Relic graph for example:

Recycling Group Finder - before optimisation

The yellow on the graph represents time spent in the database; the blue is time spent in Ruby, i.e. rendering, controllers, etc. Memcached accesses are on there too, but they’re so fast they hardly appear. This graph looked suspicious to me: I’d normally expect database time to be a much smaller proportion of the overall request time. So it looks like there may be some optimisation that can be done, but in order to optimise I first need to know what to optimise.

The hunt

Google for “rules of optimisation”. Most rules are something like this:

  1. Don’t optimise yet
  2. If you need to optimise, profile first.

I’m never going to be able to optimise my code unless I know what to optimise. If I trawl through looking for places that might be slow and try to make them faster, the chances are I’m going to spend hours changing code for no benefit. I might even make it slower. I need to know exactly where the bottleneck is: I need to profile my code.

There are a bunch of ways of finding out where your code is slow, and I’ve personally used ruby-prof before with good results. However, I know that the issue here is in the database, and I know that Rack::Bug will show me the SQL queries that have run for an action and, importantly, how long they took, so that’s what I’m going to try first. I install the plugin, configure it and load it up. The issue is immediately obvious:

Recycling Group Finder - Rack::Bug SQL queries

Almost all of the SQL that is executed takes under 0.5ms per query. There are a few queries at ~4ms, but the one query that really stands out is the third one down. At 44.75ms it accounts for more than half of the overall SQL time. Bingo! Now I know what is slow, I need to know why it is slow. Time to break out the query analyser.

Fixing it

I needed to dig deeper into that SQL statement to see what it was doing, so I opened up a postgres shell and ran an explain analyse on the query. The issue seems pretty clear. There is a sequential scan on groups:

Seq Scan on groups (cost=0.00..626.75 rows=4885 width=363) (actual time=0.038..26.495 rows=5126 loops=1)

A Sequential scan on a large table is going to sink performance. I can see that the sequential scan is definitely the issue in this case as the cost and time taken are significant proportions of the overall query time. I need to eliminate it. Here’s the code that generates that query:

@groups = Group.find(:all, :include => :group_page, :origin => [@location.lat, @location.lng], :limit => 30, :conditions => ["defunct = false AND lat is not null and lng is not null and full_address is not null and full_address != '' and country_code = ?", @location.country_code], :order => 'distance ASC, num_members DESC')

I wrote this code ages ago, and re-reading it now I can see that although I am limiting the returned results to 30 rows, the query has to hit every row in the table to determine which rows make up those 30, as there is nothing limiting how far away a group can be. Whoops. Looking over the Geokit docs I see there’s a :within condition, so I added :within => 100 to the find (see the sketch below). Testing the resultant query in the postgres shell using explain analyse again, the query has dropped to 10ms. Not bad, but it’s still using a sequential scan. Adding an index on the conditions speeds up the query further, to ~1.2ms.
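
Put together, the two changes look something like this. It’s a sketch rather than the exact code: the 100-mile radius is what I used, but the index name and the choice of indexed columns here are assumptions on my part:

@groups = Group.find(:all,
  :include    => :group_page,
  :origin     => [@location.lat, @location.lng],
  :within     => 100,  # only consider groups within 100 miles, no more whole-table distance sort
  :limit      => 30,
  :conditions => ["defunct = false AND lat is not null and lng is not null and full_address is not null and full_address != '' and country_code = ?", @location.country_code],
  :order      => 'distance ASC, num_members DESC')

# In a migration: index the columns the conditions filter on (hypothetical column choice)
add_index :groups, [:country_code, :defunct, :lat, :lng], :name => 'index_groups_on_finder_conditions'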

Not bad when it started out at nearly 45ms. Here is the result reflected in the New Relic graph:

Recycling Group Finder - after optimisation

I deployed the new code approximately in the middle of the graph, it should be pretty obvious where.

Conclusion

Before you can optimise your Ruby on Rails app (or your app in any other framework or language, for that matter) you need to know where to optimise. Tools like Rack::Bug and NewRelic let you do this effectively and easily, allowing you to direct your attention only to those parts of your app that need it.

On the Recycling Group Finder I cut response times drastically in about half an hour. Without knowing exactly where to make the change I would have been left guessing, and might never have made the optimisation I did.

Looking for a Web-Development job? Learn Ruby and Ruby on Rails

Seriously. Not only will you be able to develop web applications faster and with more joy, but if you fill some of the many Ruby on Rails job vacancies going, the recruiters might stop bugging me so often.

There are Ruby on Rails jobs out there

Or that’s what it seems like from talking to people at the NWRUG and Geekup meetings I go to, and from the phone calls I get from recruitment agents. I know of companies worried about using Ruby and Rails because of concerns over the number of developers available. These companies need you, and they need you to write web applications for them in Ruby on Rails! These are companies that want to use Ruby on Rails, and they will hire you if you learn it.

I’m fine sticking with $some_other_language but thanks anyway

That’s fine; there are lots of jobs available using your programming language. Well, maybe not if that language is ColdFusion. But if you expand your horizons, teach yourself something new and can prove to others that you’re interested in and capable of learning, then you’re going to be a more valuable asset. That’s going to translate into more pay and a more fulfilling job, using a language as expressive as Ruby and a framework as labour-saving as Ruby on Rails.

Worst case scenario: you learn Ruby on Rails and can write your own web-apps a whole lot faster (you do write your own web-apps, right?), your CV looks better, and you have more time for the dull stuff that you fit around programming. Watching Buffy or something. You know, programmer social life stuff.

I tried Ruby and Ruby on Rails already but I prefer Python…

Weirdo.

I tried Ruby and Ruby on Rails already but I prefer COBOL!

You don’t exist, go away.

You were thoroughly convincing, I’m sold

This post is so convincing that when I proofread it I nearly went and learned Ruby on Rails myself, even though I already know it. If you want to learn you can start here, and there’s going to be a local Ruby user group in your area somewhere; sign up to their mailing list, we’re a pretty helpful bunch.

If you’re anywhere near Manchester, UK then come along to the next NWRUG meeting: it’s this Thursday and there’s free pizza. You need to sign up to attend.

Protecting yourself against the WordPress login page exploit

Anyone who runs a WordPress blog will hopefully be aware of the recent exploit against the login page:

“You can abuse the password reset function, and bypass the first step and then reset the admin password…”

and

“An attacker could exploit this vulnerability to compromise the admin account of any wordpress/wordpress-mu <= 2.8.3”

There’s no fix in any released version yet, but you can protect yourself with a bit of Apache config until one is released. Just add this to your WordPress VirtualHost, replacing “you.re.ip.add” with the IP address you want to access the login page from:

<Location /wp-login.php>
Order deny,allow
Deny from all
Allow from you.re.ip.add
</Location>

This will present any user not accessing your login page from that IP with a 403 Forbidden error. If you want to block all IPs until a fix comes out, just leave out the Allow line:

<Location /wp-login.php>
Order deny,allow
Deny from all
</Location>
