Archive for November, 2008

Weekly Digest, 11-30-08

Posted by Weekly Digest in Weekly Digest on November 30, 2008

Please find the attached interesting links for this week as provided by Trevor and Tim:

This Week in Edge Rails

it certainly hasn’t looked like a holiday week in edge Rails. Things are moving fast, with some major changes afoot for version 2.3 of Rails.

Ta-da List on Rails 2.2, Passenger And EC2

If you haven’t documented your server deployment process in code or experimented with these technologies, now is the time.

Slicing Your Attributes

ActiveRecord models default to having all attributes assignable this way. As a result, unless you're very careful with attr_protected and attr_accessible, there's a good chance your app has security holes.

Favicon Hell: Small Feature, Big Code

The end result is that it took thousands of lines of code just to display favicons. And that’s often the case with features that seem simple at first glance. It’s not until you dive into the code and find all the weird problems and bugs that you realize your little feature is actually a big PITA.

Warren Buffett's 10 Ways To Get Rich

When you get to my age, you'll measure your success in life by how many of the people you want to have love you actually do love you. That's the ultimate test of how you've lived your life.

Ask Hacker News: Does the SaaS model really work?

I'm working on a web startup with a partner and I'm just feeling unsure of whether whole SaaS thing really works.

Pair Programming - Marketing FAIL

My modest proposal is to stop calling it Pair Programming or Extreme Programming. At this point, that is like calling your new energy company Enron. I propose calling it Collaborative Development.

Notes from the Ruby Manor

I’m lucky enough to be at RubyManor today; a Ruby conference organised by Ruby users, for Ruby users.

Cloud computing is a sea change

How sysadmins can prepare ...don’t be shy, embrace the cloud. If you’re a UNIX sysadmin you already have the right stuff to succeed in this new world of utility on-demand computing...

Dumbing Down the Cloud

I trust Dropbox. Here’s why.

Turning Ideas Into iPhone Applications

No one wants to work for equity or the promise of future returns for someone else right now. There is too much cash work out there. The developers willing to take risks on future returns would rather do this for their own application projects. That is a risk worth taking.

Refactoring Rails Controllers

One of the basic memes of Rails is "Skinny controller, fat model." And even though most examples, especially the ones using the newer REST features in Rails advertise it, there are still heaps of controllers out there that have grown too big, in both the sense of lines of code and the number of actions.

Seven Rules for Building Online Portfolios

Your site is a frame. Make every project a link. Make it simple. Don’t be clever. Make it easy for us to contact you. Make it easy to update. SIMPLIFY!

iPhone GUI PSD

Over the past few months we’ve had to create a few iPhone mock ups for presentations. Since we know we’ll be doing more of this, we created our own Photoshop file that has a fairly comprehensive library of assets.

TaskPaper

Today’s task managers have evolved into complex database-like applications, TaskPaper provides an alternative that harkens back to simpler (and faster) times.

Daring Fireball: Treating URL Protocol Schemes as Cruft

...it’s always struck me as somewhat ungraceful that we spend all day staring at dozens of URLs that all start with the same repetitive prefix. [If you're going to hide http://, why wouldn't you also hide the equally useless www?]

On App Store pricing

People have always been willing to pay money for valuable software, and users of the iPhone platform are no different. It’s not some crazy new voodoo platform where nobody will pay for anything. Treat it like any other software market, and you’ll see that it responds in the same way.

Schneier on Security: The Future of Ephemeral Conversation

Full disclosure: security expert and cryptography ace Bruce Schneier is a personal hero of mine. That having been said, I feel like I must also now say that I would only recommend a particular essay of his to other people if I felt like it was genuinely worthwhile. This piece deals with so-called "digital ephemera" (the casual conversations you have on-line) and the generation gap that exists between my generation ("baby busters" or "millennials") and everyone from the GenX and previous set. Definitely a must-read for anyone whose life has been as copiously documented as yours certainly has been if you are reading this.

Geek to Live: Wget local copies of your online research (del.icio.us, digg or Google Notebook)

I'm going to go meta here for a second with a link to a lifehacker.com article about how to archive delicious links and diggs. Basically, this describes the wget flags you want to use if you're trying to keep a local archive of someone's delicious links: handy for automating things like, for example, a weekly digest of your del.icio.us links.

Raising Protected Attribute Assignment Errors

Posted by Trevor in Ruby/Rails on November 27, 2008

While mass-assignment in Rails can be convenient for developers, it can pose a security risk if the implications aren't understood. An article on Rails Spikes does a good job of explaining the issue:

By default ActiveRecord allows visitors access to any writer method, that is, any method ending with an equal sign. This comes courtesy of the ActiveRecord::Base#attributes= method, which is used internally by the main methods that handle creating and updating records, including new(), create(), and update_attributes().

The way most applications are designed means that whatever data a visitor sends to the server will likely find its way through the attributes=() method, and if not protected, ActiveRecord will happily update the records based on what was sent. In less technical terms: ActiveRecord is insecure by default.

I suggest reading over that article, even if you're familiar with the potential issues around mass-assignment. There's also a Railscasts episode on the subject.

The solution proposed is to use attr_accessible in all of your models. This way, you have to explicitly make attributes accessible to users, which is generally a good thing. However, this strategy introduces a small "gotcha" that's bitten me a few times.

When you're in development and try to mass-assign a protected attribute, it will fail silently, leaving only a note in the debug log. I don't know about you, but I very rarely look at the debug log, and I've found myself temporarily stumped when attributes weren't being assigned as expected. Of course, I'm getting better about remembering to add attributes via attr_accessible after being bitten by this one a few times, but perhaps others have been confounded by this gotcha as well?
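To make the silent failure concrete, here is a minimal plain-Ruby sketch of whitelist-style mass assignment. It does not use ActiveRecord at all: the Account class, its ACCESSIBLE list, and the dropped array (standing in for the debug log) are hypothetical names invented for illustration.

```ruby
# Plain-Ruby imitation of whitelist-based mass assignment.
# NOT ActiveRecord: Account, ACCESSIBLE and dropped are hypothetical.
class Account
  ACCESSIBLE = [:name, :email].freeze   # plays the role of attr_accessible

  attr_accessor :name, :email, :admin
  attr_reader :dropped                  # stands in for the debug log

  def initialize
    @dropped = []
  end

  # Assign only whitelisted keys; anything else is quietly set aside.
  def attributes=(params)
    params.each do |key, value|
      if ACCESSIBLE.include?(key.to_sym)
        send("#{key}=", value)
      else
        @dropped << key.to_sym          # no exception, just a note
      end
    end
  end
end

account = Account.new
account.attributes = { :name => "Trevor", :admin => true }
puts account.name             # Trevor
puts account.admin.inspect    # nil
puts account.dropped.inspect  # [:admin]
```

Nothing raises, so unless you go looking at the list of dropped keys, the only symptom is an attribute that mysteriously stays nil.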

Well, thanks to a small change in Active Record (more detail here), it's now possible to give yourself a more noticeable warning when you're testing your application. Simply add the following initializer, and your tests will complain much more loudly if you try to mass-assign a protected attribute.

 
# config/initializers/noisy_protected_attribute_removal.rb
if Rails.env.test?
  ActiveRecord::Base.class_eval do
    def log_protected_attribute_removal(*attributes)
      raise "Can't mass-assign these protected attributes: #{attributes.join(', ')}"
    end
  end
end
 

This little trick has saved me some head-scratching already. Perhaps you'll find it useful as well.

Weekly Digest, 11-23-08

Posted by Weekly Digest in Weekly Digest on November 23, 2008

I'm very pleased to welcome our good friend Tim to the Weekly Digest. I'm sure you'll enjoy his contributions. You can follow along with his delicious account, if you're into that sort of thing. We've also managed to inspire another weekly collection of links that you may be interested in checking out. Nick will be posting links for creatives and creators over on his blog, greyscalegorilla.

So, without further ado, please join me in welcoming Tim's links:

tarsnap.com

If you can stomach the idea of someone else storing your data on his computers and are running out of disk space, tarsnap might be the solution for you. Basically, you set up your machine to dump your data to the remote site, it is encrypted and then snapshots are taken (presumably at the normal rsnapshot intervals) and you can get at them whenever you need the data. The key here is encryption: there are dozens of places to stick your data, but the fact that you dump via a secure tunnel and that the site's proprietor never has any access to your data makes this solution a viable one. If I weren't an incurable tin-foil-hat about privacy, I would definitely consider tarsnap.

Pixelpipe - Free your media, upload and share anywhere

Like a lot of G1 applications that are either overt ports of iPhone software or obviously inspired by iPhone software, this one is still coming together. The big idea here is that this is a Web 2.0 labor-saver that automatically dumps your G1 pictures into your Facebook, Myspace, etc. account: I wouldn't say it's fully automatic yet, but it's getting there. I tried to link it up to my picasa account on day one and, after some finagling, managed to get it working. Ideally, it would "just work" and you could download and set this up without your home/laptop computer on a moving train. And, as I say, Pixelpipe isn't quite there yet, but it's got potential.

Write or Die : Dr Wicked's Writing Lab

This is a tool, designed with creatives in mind (i.e. not coders), to encourage copywriters to shake their tail-feathers. The basic idea is that you tell it how many words you want to write and how much time you want to allow yourself to write those words and then it gives /you/ a text input window and a ticking clock. As the clock runs down, the text you enter is deleted if you stop typing. This definitely works just like the essay writing portion of a standardized test: you don't waste too much time tangentializing during the pre-writing/brain-storming processes and when you get down to it, your eyes don't drift backward to check for continuity/grammar errors or typos.

Noodle Soup Oracle

The so-called "noodle soup oracle" was "created by Michele Humes and Joshua Sierles after a meal of shrimp roe noodles in miso, topped with spicy carnitas, shelled edamame and chopped scallions"; at the click of a button, it will randomly choose a noodle, two savory additional ingredients and a sauce. It is also capable of suggesting noodle dishes bereft of meat. Indispensable for anyone who enjoys a.) the consumption of noodles and b.) letting a little chaos into his life from time to time.

Clonezilla

I haven't had a chance to play with this in an actual production environment, but it seems like a dream come true for anyone who has had to deploy workstations based on a common image: basically, in clonezilla you get a F/OSS alternative to Acronis/Norton Ghost that works a lot better than ghost4unix and has some nice, time-saving user-friendliness features.

http://mysqltuner.com/mysqltuner.pl

This is a handy little diagnostic tool that, while utterly useless if you're not familiar with the details of your /etc/mysql/my.cnf, takes the guesswork out of gathering a number of important usage and performance statistics.

...and now the set from yours truly:

CouchDB is now officially Apache CouchDB

CouchDB has graduated from incubator to a top level project.

This Week in Edge Rails

...some pent-up code has been checked in, and some big changes are being made. It’s an exciting time, and edge is definitely worth checking out.

Delayed Gratification with Rails

Daniel Morrison, of Collective Idea, is the first and will be showing a few ways he has used delayed job to offload tasks to the background. Without any further ado, here is Dan.

Base for SQLite3

Base is an application for creating, designing, editing and browsing SQLite 3 database files. It's a proper Mac OS X application. Fast to launch, quick to get in to and get the data you need.

A demo of some thoughtful UI on Ffffound.com

Keyboard shortcuts FTW.

Build a Killer Online Portfolio in 9 Easy Steps

Ask yourself: how well does my site answer the questions potential clients are likely to have?

Carpal Tunnel Syndrome Fact Sheet

How can carpal tunnel syndrome be prevented? ...on-the-job conditioning, perform stretching exercises, take frequent rest breaks, wear splints to keep wrists straight, and use correct posture and wrist position. Wearing fingerless gloves can help keep hands warm and flexible. Workstations, tools and tool handles, and tasks can be redesigned to enable the worker's wrist to maintain a natural position during work.

Kvetch! Let it out, baby.

A Kvetch is a funny complaint. This site randomly displays kvetches sent via Twitter.

Rails Rumble Observations

Trends in gem/plugin usage. Winners include JQuery, Bort, Mocha, Hoptoad, Thinking Sphinx, Paperclip, and Restful Authentication.

Why the Drudge Report is one of the best designed sites on the web

Your eye darts all over the place looking around for something that looks interesting. The design encourages wandering and random discovery. The site feels like a chaotic newsroom with the cutting room floor exposed.

The Fast, Good and Cheap Pricing Method

Have you ever heard of the Fast, Good, Cheap pricing method?
The idea is that clients should only be able to choose 2 of these 3 words, and you have to keep this in mind when pricing your next job.

Custom fields? We don't need no stinking custom fields

I think most people underestimate the impact of personal communication and overestimate the value of technology. Bug trackers are a good thing, but only if they function with your team.

Ruby on Rack #1 - Hello Rack!

Rack was initially inspired from pythons’s wsgi and it quickly became the de-facto web application/server interface in the ruby community, thanks to it’s simplicity and preciseness.

Rails meets Sinatra #2 - Mix n' Match

Put sinatra code in any of your regular Rails controllers. No need to mount at Sinatra at a specific URI. Have Sinatra work for any URI, gracefully fallback to Rails if no Sinatra method matches the path. Use your models/libraries etc. in both Rails and Sinatra.

Four Years of Ruby Development Notes

This talk is from Ezra Zygmuntowicz from Engine Yard. He’s going to go over his history working and deploying with Rails. Interesting notes on upcoming Engine Yard offerings. Comments on Passenger.

Shared memory and Ruby Enterprise Edition

A clarification of the way memory optimization works in the specialized version of Ruby developed for Phusion's Passenger.

HappyMapper, Making XML Fun Again

A while back, you may remember, I posted about ROXML, a ruby object to xml mapping library. I liked the idea but not the implementation. Soon after, I started playing around with what I have named HappyMapper, a ruby object to xml mapping library.

Say Goodbye to BlackBerry? If Obama Has to, Yes He Can

Mr. Obama, however, seems intent on pulling the office at least partly into the 21st century on that score; aides said he hopes to have a laptop computer on his desk in the Oval Office, making him the first American president to do so.

Rails in the Cloud: AWS, Heroku, and Morph

It remains to be seen whether either Heroku or Morph remain good options for us as our application grows (the fact that neither support true background tasks or Memcached servers might become a limiting factor at some point), but if nothing else they’re an ideal way to get off the ground.

Concurrency is a Myth in Ruby

The implications of the GIL are surprising at first, but it turns out the solution to this problem is not all that complex: instead of thinking in threads, think how you could split the workload between different processes. Partition the work, or decompose your application. Add a communications / work queue. Fork, or run multiple instances of you application.

Net Neutrality Advocates In Charge Of Obama Team Review of FCC

Both are highly-regarded outside-the-Beltway experts in telecom policy, and they've both been pretty harsh critics of the Bush administration's telecom policies in the past year.

Deferring Tests with Test::Unit in Rails

Posted by Trevor in Ruby/Rails on November 20, 2008

Now that we have that nice syntax for tests in Rails, I'm happy just using the baked-in Test::Unit stuff. Well... maybe I still need Mocha. But the other stuff like RSpec, test/spec, and Shoulda? Meh. The only thing missing from Test::Unit is an easy way to defer tests. That's important. I'd been dropping "flunk" in tests to note that they weren't implemented yet, but that can get confusing pretty quickly.

Luckily, there's a quick and easy way to add "deferred" tests. Here's how:

 
# test/test_helper.rb
class Test::Unit::TestCase
 
  def defer
    puts "\nDeferred: #{caller[0]}"
  end
 
end
 
# test/functional/home_controller_test.rb
require 'test_helper'
 
class HomeControllerTest < ActionController::TestCase
  test "should defer test" do
    defer; return;
  end
end
 

This would produce output like so...

~/git/h8ter $ autotest
loading autotest/rails
/opt/local/bin/ruby -I.:lib:test -rtest/unit -e "%w[test/functional/home_controller_test.rb...
Loaded suite -e
Started
....................
Deferred: ./test/functional/home_controller_test.rb:6:in `test_should_defer_test'
............................................
Finished in 0.795139 seconds.

64 tests, 123 assertions, 0 failures, 0 errors

I thought I'd seen a commit from Koz that added a nice way to defer tests in Rails, but I can't seem to find it. Please post a comment if you know what I'm talking about. In the meantime, here we are with a quick and dirty solution for your enjoyment.
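The defer helper above leans on Kernel#caller, whose first entry describes the file, line and method that invoked it. A minimal stand-alone sketch of that behavior, with no Test::Unit involved (deferred_message and the bare test_should_defer_test method are hypothetical stand-ins for the helper and the generated test method):

```ruby
# Stand-alone look at what caller[0] reports; no Test::Unit involved.
# deferred_message is a hypothetical stand-in for the defer helper.
def deferred_message
  "Deferred: #{caller[0]}"   # caller[0] is the frame that called this method
end

# Stand-in for the method Rails generates from test "should defer test".
def test_should_defer_test
  deferred_message
end

puts test_should_defer_test
```

The exact formatting of the caller entry differs between Ruby versions, but it always names the calling method, which is what makes the deferred test easy to find later.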

Stupid Linux Tricks: avoid unnecessary system calls with /proc/net/arp

Posted by Timothy O'Connell in General on November 18, 2008

/proc is, for anyone interested in what's happening under the hood, an endless source of awesomeness. I recently experienced a bit of /proc awesomeness when sitting down to refactor an old VPN monitoring project using python.

Long story short, OpenVPN (http://openvpn.net/) prints a log called "openvpn-status.log" at fixed intervals. Contained within this log are the MAC addresses and common names of the VPN clients currently connected to the VPN server. The goal of the project was to take that log, swap the MAC addresses for local IP addresses and reprint the log on an intranet webpage.

The old way of doing this involved pinging all of the IP addresses on our subnet within the range of addresses that we had set aside for VPN clients and then scraping the output from # /usr/bin/arp -a to assign MAC addresses to IP addresses. This was alright, but in my program it required making a system (or, more specifically, a subprocess) call. Not optimal.

The new way of doing this is via the new awesomeness: /proc/net/arp. The setup is the same: you ping all the IP addresses in the range allotted for VPN clients to fill the arp cache but then, instead of working with the output from the arp program, you simply create a file-like object from /proc/net/arp and work with that much simpler, easier-to-manipulate output.

To illustrate the difference, allow me to cut and paste both outputs. /usr/bin/arp -a first:

virginia:/var/log/openvpn# arp -v -a
? (192.168.2.89) at <incomplete> on br0
? (192.168.2.85) at <incomplete> on br0
? (192.168.2.82) at 00:FF:49:56:76:30 [ether] on br0
? (192.168.2.87) at <incomplete> on br0
? (192.168.2.83) at 00:FF:2D:53:53:20 [ether] on br0

Now take a gander at what /proc/net/arp spits out:

virginia:/var/log/openvpn# cat /proc/net/arp
IP address       HW type     Flags       HW address            Mask     Device
192.168.2.85     0x1         0x0         00:00:00:00:00:00     *        br0
192.168.2.89     0x1         0x0         00:00:00:00:00:00     *        br0
192.168.2.82     0x1         0x2         00:FF:49:56:76:30     *        br0
192.168.2.87     0x1         0x0         00:00:00:00:00:00     *        br0
192.168.2.83     0x1         0x2         00:FF:2D:53:53:20     *        br0

Basically the same thing, but with /proc/net/arp you've got the option, if you're working with python, of simply creating a file-like object and working with that (instead of making a system call and having to strip off those parentheses):

f = open("/proc/net/arp")

From there, a quick list comprehension will get you all the data you need in a single, easy-to-use list:

output = [line.strip().split() for line in f.readlines()]

Two lines of code and you're ready to throw the data you need into a dictionary and do whatever it is you need to do.

Finally, the awesomeness isn't limited to python: when I'm trying to track down switches or other devices on the network that might not necessarily want to be found, all I've got to do is rattle off the following bash one-liner and I've got all the data I need to start poking around:

cat /proc/net/arp |awk '{print $1 " " $4}'
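For comparison, the same parsing step can be sketched in Ruby. This is an illustration rather than part of the original project: it parses a few rows of the sample output shown earlier from a string so that it runs anywhere, and the parse_arp helper name is made up.

```ruby
# Parse /proc/net/arp-style text into an { ip => mac } hash.
# parse_arp is a hypothetical helper; SAMPLE is copied from the listing above.
SAMPLE = <<~ARP
  IP address       HW type     Flags       HW address            Mask     Device
  192.168.2.85     0x1         0x0         00:00:00:00:00:00     *        br0
  192.168.2.82     0x1         0x2         00:FF:49:56:76:30     *        br0
  192.168.2.83     0x1         0x2         00:FF:2D:53:53:20     *        br0
ARP

def parse_arp(text)
  rows = text.lines.drop(1).map(&:split)                # skip the header row
  rows.reject! { |row| row[3] == "00:00:00:00:00:00" }  # drop unresolved entries
  rows.map { |row| [row[0], row[3]] }.to_h              # column 0 is IP, column 3 is MAC
end

table = parse_arp(SAMPLE)
puts table["192.168.2.82"]   # 00:FF:49:56:76:30
puts table.size              # 2

# On a Linux host, the live table could be read with:
#   parse_arp(File.read("/proc/net/arp"))
```

Because the columns are whitespace-separated with no parentheses or brackets to strip, a plain split is all the parsing that's needed.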

Weekly Digest, First Edition

Posted by Weekly Digest in Weekly Digest on November 15, 2008

This is the first of what I hope will become a regular feature around here; a weekly digest of interesting links. Since this is the first edition, I've gone back into the archives a bit and pulled out some of my favorites from the last month or so.

As a bonus, I thought I'd share the simple little Ruby script I'm using to pull this thing together with the quickness. It's using the fantastic httparty from John Nunemaker. If you save that as delicious.rb and run it (perhaps via command-R in TextMate), you'll get ready-made output for your blog.

 
# http://github.com/jnunemaker/httparty/
require 'rubygems'
require 'yaml'
require 'httparty'
 
config = YAML::load(File.read(File.join(ENV['HOME'], '.delicious')))
# For this to work, simply put a .delicious file in your home directory that looks like this:
# username: example
# password: whatever
 
class Delicious
  include HTTParty
  base_uri 'https://api.del.icio.us/v1'
 
  def initialize(u, p)
    @auth = {:username => u, :password => p}
  end
 
  def recent(options={})
    options.merge!({:basic_auth => @auth})
    self.class.get('/posts/recent', options)
  end
end
 
delicious = Delicious.new(config['username'], config['password'])
links = delicious.recent(:query => {:count => '100'})['posts']['post']
 
links.each do |l|
  puts "<a href=\"#{l['href']}\">#{l['description']}</a>"
  puts "#{l['extended']}"
end
 

If you're interested in getting a steady stream of links, you can subscribe to my delicious feed.

A trick that I've recently picked up is adding people to my network and then subscribing to my network's feed. This way, I have a nice collection of links all bundled together from people I'm interested in. The delicious network is a highly underrated tool, in my opinion. Perhaps its value just isn't obvious enough. It's not really promoted well, but if you think about it, the delicious network is basically a twitter for links. I feel like you could rebrand it, get some VC money, and TechCrunch would eat it up :)

Anyway, here's the first (extra large) installment of links for your enjoyment:

Does "Getting Real" work in this economy?

...notice I’m calling people users now. That’s what people become when they don’t pay for your product—they are users, not customers. That changes the entire dynamic.

Advice for indies

It’s easy to talk big about your big app. But you have to actually build it. You have to work every day. You have to sit in the chair and stay seated. And sleep and come back to the chair.

Interview with Tobias Lutke: CEO of jadedPixel

Awesome interview: "...build something you need yourself. Above all things this is what made Shopify a success." Note: I just noticed that this isn't a direct link, and I can't figure out how to get one. So, you'll have to click the blog link on the right, and then find the article in question. Sorry about that!

Passenger and Shopify

I cannot see any reason to choose a different deployment strategy at this point. Its simple, complete, fast and well documented.

The Rails Myths

I thought it would be about time to set the record straight on a number of unfounded fears, uncertainties, and doubts. I'll be going through these myths one at the time and showing you exactly why they're just not true.

acts_as_git

A simple plugin which stores all changes you make to a text field in a git repository. This is ideal for something like a git-backed wiki.

On the Mongrel caused 400 restarts/day problem

That was the point of telling people Rails crashed that much back then. It is of course better now, but only because of the hard work of people like mentalguy and myself. Everyone else just denied there was a problem, including DHH.

50 Strange Buildings of the World

Awesome collection.

Is RSS dead to you too?

Jason at 37signals says: I haven’t used an RSS reader for a year and I haven’t looked back.

Gmail Video

Gmail voice and video chat will be rolled out globally over the next day or so for Macs and PCs.

Three Under-Used Apple Keyboard Shortcuts

Invert Your Screen: Control - Option - Command - 8. Zoom In: Control and scroll with your mouse wheel. Cropped Screen Grab to Clipboard: Shift - Control - Command - 4.

Memory management with free

Slicehost Articles: Basic monitoring should be done on a regular basis. This keeps you informed as to the general condition of your server and may warn of impending problems.

Morph AppSpace

El Dorado is one of the featured open-source apps available with a one-click install.

paperclip's automatic database creation tasks

ActiveRecord::Base.connection.create_table :dummies, :force => true

Rubyconf Slides from Heroku

Lightweight Webservices with Sinatra and RestClient

Clearing up inaccuracies about the Google OpenID IDP launch

As has become increasingly clear to everybody doing usability research on OpenID (see here and here), we absolutely need to provide mechanisms for mapping human-friendly identifiers like email addresses to identities.

Merb 1.0

It’s been a little over two years since merb was a twinkle in my eye, and a pastie. Since then it has undergone many drastic transformations, working its way towards a very solid, fast foundation for people to build their homesteads on.

A critical look at the current state of Ruby testing

All this energy on creating new DSLs for testing is energy wasted. Use the standard and focus on your real problems. We’re all spinning our wheels with these new testing syntaxes.

jquery.pngFix.js

This plugin will fix the missing PNG-Transparency in Windows Internet Explorer 5.5 & 6.

How Hard Could It Be?: The Unproven Path

I abandoned seven long-held principles about business and software engineering, and nothing terrible happened. Have I been too cautious in the past? Perhaps I was willing to be a little reckless because this was just a side project for me and not my main business.

Build Anything

Where I sit, with the cranky engineers —the insane optimists — I hope we all share this optimism because, given enough time, we can build anything.

one central, overriding guideline for iPhone UI design

Figure out the absolute least you need to do to implement the idea, do just that, and then polish the hell out of the experience.

OpenID usability is not an oxymoron

Overwhelmingly criticism of OpenID has been leveraged by developers and web users alike against OpenID’s ease of use.

The 5 Commandments of Mobile Web Design

The Mobile Web is Not the Little Sister of the Traditional Web. Give People What They Want, When They Want It. Build Unique Mobile Content, or Don’t Bother Building Anything at All. Make It Useable. Don’t Forget About Design.

Awesome bundling in Merb

The solution is to rely entirely on bundled gems, and remove system gems from bundled binaries. The side-effect is that you will need to bundle gems like mongrel, rake, etc.

The Sorry State of Blogging Software

And yet, the word on the street in the Ruby community is that writing your own blog from scratch is the way to go.

Pony, The Express Way To Send Email From Ruby

Want to fire off a quick email from your Ruby script? Finding ActionMailer to be overkill, but Net::SMTP to be...um, underkill? Envious of PHP's mail(), which sends an email with a single function call?

Making money twice

A good portion of this industry is still trying to figure out how to make money for the first time (hint: charge people). But for those who’ve mastered that, I want to talk about the next step: making money twice (or three or four times).

Giving Up

Give up early, give up often. That's one of the secrets to being an effective hacker, an effective entrepreneur, or an effective anything. High-level languages make it quick to bang out early implementations of new ideas. The trick is to put a time constraint on whatever you're doing.

The New Queue at GitHub

After trying a few different solutions in the early days, we settled on Ara Howard’s Bj. Yesterday we moved to a new queue, Shopify’s delayed_job (or dj).

On the Existence of Struct::Group in Rails

Posted by Trevor in Ruby/Rails on November 15, 2008

I ran into a really strange case yesterday while working to move an app from bj to delayed_job. I won't spend much time going into the details about why we're making this switch, but suffice to say that we had a problem similar to the one GitHub describes in their blog post. The problem is that bj reloads the entire Rails stack for every request, which is terribly inefficient. Imagine if you had to restart your web browser every time you went to a new page or submitted a form. You'd be paying a "startup tax" to launch your browser with every single request. It doesn't make sense architecturally, and it absolutely kills your CPU. The delayed_job plugin operates by leaving a single Rails instance open and available for processing requests asynchronously. It's proven to be much faster in my limited testing.

In making the move to delayed_job, I checked out the readme, which suggests structuring things like so:

 
class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| NewsletterMailer.deliver_text_to_email(text, e) }
  end
end
 
Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...', Customers.find(:all).collect(&:email))
 

The idea here is that you can use a Struct to quickly create a class with a method named perform. When you enqueue a job for later, the values you passed to new are stored on the struct, and the perform method is called later with those values available as members (text and emails above). However cool this may be, it introduces a really interesting gotcha that I ran into almost immediately.
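As a rough picture of the pattern (not delayed_job's actual implementation, which persists jobs to a database table), a Struct-based job can be queued and run later like this. GreetingJob, QUEUE, and OUTBOX are hypothetical names, and everything lives in memory.

```ruby
# Toy illustration of the Struct-as-job pattern; everything here is in memory.
OUTBOX = []   # collects "delivered" messages instead of sending mail
QUEUE  = []   # stands in for the jobs table

class GreetingJob < Struct.new(:text, :emails)
  # perform takes no arguments; the data travels in the struct members.
  def perform
    emails.each { |e| OUTBOX << "#{e}: #{text}" }
  end
end

# "Enqueue" now...
QUEUE << GreetingJob.new("lorem ipsum...", ["a@example.com", "b@example.com"])

# ...and let a worker call perform later.
QUEUE.shift.perform until QUEUE.empty?

puts OUTBOX.inspect
```

Note that the job class inherits from an anonymous Struct subclass, so Struct sits in its ancestor chain. That detail is what sets up the gotcha.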

If your app has a Group model, you won't be able to use it within your perform method.

Why is that? Because of the way that Ruby namespaces work, the etc module, and the fact that something called Struct::Group already exists in your Rails app.

Perhaps a code example will help explain how this could happen:

 
require 'etc' # in Rails, rails/railties/lib/rails/mongrel_server requires 'etc' 
 
class Group
  def foo
    puts "hello"
  end
end
 
class WTF < Struct.new(:whatever)
  def foo
   Group.new.foo
  end
end
 
Group.new.foo
WTF.new.foo
 
# OUTPUT #
# hello
# NoMethodError: undefined method ‘foo’ for #<struct Struct::Group name=nil, passwd=nil, gid=nil, mem=nil>
 

The Group.new.foo call will work as expected, but the WTF.new.foo call will fail because it's calling the foo method on Struct::Group, which (surprisingly enough) exists, and doesn't have a method named foo. It exists because Rails has required the 'etc' module. This creates a couple of Structs on your behalf, which is the source of our problem.

Luckily, there's an easy workaround. If you prefix your calls to Group with two colons, you'll get access to the Group class that you expect. In our example, the foo method in WTF would be changed like so:

 
class WTF < Struct.new(:whatever)
  def foo
    ::Group.new.foo
  end
end
 

Totally weird. I know.

Microsoft Access .MDB to PostgreSQL db

Posted by Timothy O'Connell in General on November 14, 2008

So recently, while working with an ancient, hastily written calendar application, it fell to me to pry a decade's worth of data loose from a .mdb file.

And while the original specifications for the project only called for a csv for each of the database's tables, scope began to creep as the client realized that he needed his data to be reformatted and, in some cases, recast. Due to the persnickety nature of this client and the horrifyingly random nature of his data (some dates were MM/DD/YY while others were a legit timestamp; some commas were escaped, others were not, etc.), I realized that I was going to have to implement an industrial strength solution: something that would scale. So rather than writing a series of one-off python scripts (using the totally kickass csv module), I decided to get the data into a Postgres database and then query this database as necessary.

First, I'll lay out (most of) my program and then I'll do a blow-by-blow, dwelling briefly on certain important or noteworthy parts.

Remember, what we're doing here is grabbing one table at a time and stuffing it into a Postgres database (and maybe doing a little string sanitizing and type casting as we go). Anything more automated might not work, given the totally messed-up nature of the data and of the .mdb format in general.

Here goes:

#!/usr/bin/env python

from cStringIO import StringIO

import csv, os, psycopg2, re, subprocess, sys, time

dbHost = "localhost"
dbUser = "toconnell"
dbPass = "XXXXXXXXXX"
dbName = "toconnell"

# This program is run from the CLI: arguments one and two are the .mdb file
# and the table in that database that we're trying to import
mdbFile = sys.argv[1]
mdbTable = sys.argv[2]

mdbDump = "/usr/bin/mdb-export"
# Dynamically name the table, depending on the day the import is run
tableName = "%s_%s" % (os.path.basename(mdbFile.replace(".","_").lower()),time.strftime("%Y_%m_%d"))

def dumpMDB(mdbFile):
    command = [mdbDump,mdbFile,mdbTable]
    p = subprocess.Popen(command,stdout=subprocess.PIPE)
    mdbData = [line for line in p.stdout.readlines()]

    # Grab the column names
    columns = mdbData.pop(0)

    # Now write a CSV to the buffer
    for line in mdbData:
        tmpFile.write(line)

    # Now use the csv module to make a list where each line is a list
    tmpFile.seek(0,0)
    importData = csv.reader(tmpFile)
    dataList = []
    dataList.extend(importData)
    tmpFile.close()

    return columns,dataList

def createTable(columns):
    # Take this string from the pretend CSV file, break it up and remake it as something like a query
    columnsList = []
    for item in columns.split(","):
        item = item.strip().lower()
        # Also do a little type casting in there (beats having to do it later)
        if item == "date":
            columnsList.append(item + " TIMESTAMP")
        else:
            columnsList.append(item + " TEXT")

    columns = ",".join(columnsList)
    print columns

    # Now connect and do table stuff
    conn = psycopg2.connect("dbname=%s user=%s host=%s password=%s" % (dbName,dbUser,dbHost,dbPass))
    cursor = conn.cursor()

    # Check for a previous table with exactly this name:
    cursor.execute("SELECT * FROM pg_tables WHERE tablename = %s", (tableName,))
    results = cursor.fetchone()

    if results is not None:
        # Drop the previous one:
        cursor.execute("DROP TABLE %s" % tableName)

    # Create a new one (whether or not an old one had to be dropped):
    cursor.execute("CREATE TABLE %s (%s)" % (tableName,columns))
    conn.commit()

def populateTable(columns,query):
    conn = psycopg2.connect("dbname=%s user=%s host=%s password=%s" % (dbName,dbUser,dbHost,dbPass))
    cursor = conn.cursor()
    cursor.execute("INSERT INTO %s(%s) VALUES(%s)" % (tableName,columns,query))
    conn.commit()

def sanitize(query):
    # You can do as many of these as necessary: I've included a single one as an example
    finalList = []
    for item in query:
        item = item.replace("'","\\'")
        finalList.append(item)
    return finalList

if __name__ == "__main__":
    # First, instantiate our pretend csv file
    tmpFile = StringIO()

    # Now get the data
    columns,dataList = dumpMDB(mdbFile)

    # Now reformat the "columns" string a little bit
    columnsList = [item.strip().lower().replace(" ","") for item in columns.split(",")]
    columns = ",".join(columnsList)

    # Now create a new table (delete the previous one)
    createTable(columns)

    # Finally, insert data, reporting progress on the CLI (better than doing
    # select count(*) every 15 seconds)
    total = len(dataList)
    n = 1
    for item in dataList:
        item = sanitize(item)
        formatQuery = "'" + "','".join(item) + "'"
        populateTable(columns,formatQuery)
        print "Inserting data %s/%s..." % (n,total)
        n += 1
    print "Done."

In my opinion, there are two noteworthy aspects of the above: the use of the csv module and the use of cStringIO to create a csv file in the buffer (instead of simply creating a file and deleting it).

In order to better explain those two aspects of the program, here's that function again, broken into more easily digested pieces and presented with an in-line, blow-by-blow commentary:

def dumpMDB(mdbFile):
    command = [mdbDump,mdbFile,mdbTable]
    p = subprocess.Popen(command,stdout=subprocess.PIPE)
    mdbData = [line for line in p.stdout.readlines()]

Nothing special here: this is pretty standard subprocess usage that executes mdb-export (a part of the mdbtools package on Debian) on my file and dumps a table in csv format, one line at a time, into a list. Normally the program would just dump them to stdout: I'm just grabbing them up with subprocess's Popen class.

    # Grab the column names
    columns = mdbData.pop(0)

Now I do a list.pop(0) on this list to get the first line (i.e. the column names) of what I just dumped with mdb-export; calling the list's pop method without an index gets you the last item in your list instead (in 2.5, at least).
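As a quick illustration of the difference, here is a minimal sketch using made-up rows (the sample data is hypothetical, not taken from the real .mdb file):

```python
# Hypothetical stand-in for the lines returned by mdb-export
rows = ["id,title,date\n", "1,Lunch,01/02/99\n", "2,Meeting,01/03/99\n"]

header = rows.pop(0)   # removes and returns the FIRST item (the column names)
last = rows.pop()      # with no index, removes and returns the LAST item

print(repr(header))
print(repr(last))
print(rows)            # only the middle row is left
```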

    # Now write a CSV to the buffer
    for line in mdbData:
        tmpFile.write(line)

    # Now use the csv module to make a list where each line is a list
    tmpFile.seek(0,0)
    importData = csv.reader(tmpFile)
    dataList = []
    dataList.extend(importData)
    tmpFile.close()

    return columns,dataList

This isn't fancy code. I'm essentially creating a CSV file and reading each of its lines into a list. I do this instead of simply using the list I generated with the subprocess call because I'm nervous about splitting the strings that compose that list: as I mentioned above, there is a very real possibility that a given string will contain unescaped commas and other inappropriate characters within it and I don't even want to have to contemplate how to split those strings correctly.

What I want to do is trust the csv module to figure that out for me.
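To make the worry concrete, here is a small sketch comparing a naive split with the csv module on a single made-up line. It is written for Python 3, so io.StringIO stands in for cStringIO, and the sample row is an assumption rather than real client data. Note that the csv module can only rescue commas that sit inside quoted fields; a comma in an unquoted field is still ambiguous.

```python
import csv
import io

# Hypothetical exported line: the second field contains a comma inside quotes
line = '42,"Lunch, then meeting",01/02/99\n'

# Splitting on commas by hand cuts the quoted field in two
naive = line.strip().split(",")

# csv.reader understands the quoting and keeps the field whole
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```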

So, in order to make that happen, I need to give the csv module a CSV file it can read. Rather than writing a file to the filesystem (which no self-respecting sysadmin will do if he can help it), I need to make some cStringIO magic happen. And in order to make that buffer magic happen, I need to do three things:

  1. import cStringIO,
  2. instantiate the file-like object and
  3. do that weird seek to it.

The import (from cStringIO import StringIO) is fairly simple; if you've ever used cStringIO for anything, you've done this. The instantiation (tmpFile = StringIO()) is also straight out of the documentation.

But once I write all of the lines of my list to the cStringIO buffer, I've got to do

tmpFile.seek(0,0)

in order to then read from that file-like object with python's built-in csv module. The reason is less arcane than it looks: writing leaves the buffer's position at the very end of the data, so anything that starts reading from there finds nothing left to read. The seek(0,0) rewinds the position to the beginning: once you do the seek, you're ready to "read" your file-like object with the csv module.
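The effect of the seek is easy to see in isolation. The following is a minimal sketch in Python 3 (io.StringIO standing in for cStringIO, with made-up data): reading straight after writing yields nothing, because the buffer's position is still at the end of what was written.

```python
import csv
import io

buf = io.StringIO()
buf.write("a,b\n1,2\n")          # the position is now at the end of the buffer

before = list(csv.reader(buf))   # nothing left to read from here

buf.seek(0)                      # rewind to the start of the buffer
after = list(csv.reader(buf))    # now every row is available

print(before)
print(after)
```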

From there, it's a simple matter of using the reader() function of the csv module to get your CSV data from your cStringIO object, manipulating it however you see fit and then looping over it and inserting it in your database.
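One way to sidestep hand-escaping quotes during the insert step is to let the database driver bind the values. The sketch below is not the approach used in the program above: build_insert is a hypothetical helper, and the table and column names are made up. It only builds the statement; psycopg2's cursor.execute accepts a query containing %s placeholders plus a sequence of values and does the quoting itself.

```python
def build_insert(table_name, columns):
    """Return an INSERT statement with one %s placeholder per column.

    Table and column names cannot be bound as parameters, so they are
    still formatted into the string and must come from a trusted source.
    """
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO %s (%s) VALUES (%s)" % (
        table_name, ", ".join(columns), placeholders)

# Hypothetical table, columns and row, for illustration only
sql = build_insert("calendar_mdb_2008_11_14", ["id", "title", "date"])
row = ("7", "St. Patrick's Day", "03/17/99")

print(sql)
# With an open psycopg2 cursor, the values are passed separately:
#     cursor.execute(sql, row)
```

The apostrophe in the sample row never has to be escaped by hand, because the value travels separately from the statement.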

And then never, ever having to work with (i.e. around) MS Access again.

RSS is not dead

Posted by Trevor in General on November 12, 2008

I'm sick to death of people saying that RSS is dead, and I'd like to make a public service announcement about this anti-syndication meme:

RSS isn't dead, and isn't to blame for making you a news junkie. It's an efficient way to gather information from multiple sources. You should use RSS.

RSS isn't dead or dying. It's a syndication format that makes the retrieval of information from multiple sources more efficient. While I do understand the tendency for people to overdose on RSS and fall victim to information overload, this is an easily remedied problem.

Problem: I'm suffering from information overload because I subscribe to too many RSS feeds.
Solution: Subscribe to fewer RSS feeds.

And yet I've been seeing things like this in my RSS reader (of all places!) written by people I have a lot of respect for:

Jason Fried:

Is RSS dead to you too? I haven’t used an RSS reader for a year and I haven’t looked back.

I just go to site [sic] that I like. I’ve found it more satisfying and it slows me down. I’m less news/information junky now which is a good thing.

There's simply no good reason to throw out the baby with the bathwater on this one. Eschewing an incredibly efficient way to gather information from multiple sources because it's too efficient at collecting information from multiple sources makes no sense.

If you find that you're spending too much time looking through your RSS reader, why not try spending less time looking through your RSS reader? It's easy. You either (a) don't open your RSS reader as often, or (b) reduce the number of RSS feeds you subscribe to.

I've also seen some claims that "all of the important stuff will make its way to you" no matter what. The idea is that shutting off your RSS reader will save you a bunch of time, which you could be using to accomplish more important things.

Chris Wanstrath:

I don't know how many of you read RSS, but I challenge you (that's a keynote term) to give it up for a month. Just turn it off. Stop using Google Reader or NetNewsWire or whatever the kids are using these days. It's not worth your time.

What should you do instead? If you use Twitter, try following the authors of your favorite blogs. Read their tweets on the bus. Or in the bathroom. Check Ruby Inside once a week and skim over the posts. Visit an aggregator like planetrubyonrails.com once a month. But mainly, let other people do the filtering for you. Use your time for other things.

You will not miss out on anything big. Stuff like the Google App Engine, or Rubinius running Rails, or the killer speaker line up at this year's Ruby Hoedown will find its way to you. How can it not? I'm willing to bet a lot of the stuff in your RSS reader is stuff you already knew, or heard about somewhere else.

I understand that reading feeds can be a time-sink, but the suggested alternative here is to cut out feeds and use Twitter instead. Seriously? Not only is Twitter more of a time-sink than RSS feeds, but isn't Twitter just a micro-blogging service and proprietary RSS reader shoved together into a single package?

I don't understand why this has to be all or nothing. RSS is useful. Why not use it?

Windows XP Unattended

Posted by Timothy O'Connell in Code on November 05, 2008

We're finally (and reluctantly) ditching Windows 2000 for Windows XP Pro at the office. It has fallen upon me to put together a custom unattended (http://unattended.sourceforge.net/) solution.

For those who aren't familiar with the concept of an unattended installation, here's the gist:

  1. You're in an environment with mixed hardware that requires all users to have a nearly identical workstation environment.
  2. If you had dozens of identical machines, you'd simply get all your software together, install it on a master machine, configure all of your settings, create an image based on that machine, burn an Acronis (http://www.acronis.com/) boot disk or two and start cranking out clones; as this is not an option due to your mixed machinery, you've got to actually install all that software and set all those settings.
  3. To do this, you use a project like unattended, which automates the installation of the OS and all additional software, and allows you to run scripts as necessary during the installation of both.

So far, so good.

Problems start to arise when you realize that the unattended project's greatest strength is its greatest weakness: it is extremely flexible and customizable because it does everything in the most "vanilla" manner possible and leaves everything beyond the basics of installing your Microsoft OS and ActiveState perl up to you.

How to actually configure an unattended solution is not, however, what I'm here to discuss. What I've got today are some tips and tricks for automating tasks on Windows XP that I've picked up as I work on my own unattended solution. They concern changing the login style, stopping and disabling services and removing Windows Components.


1.) Automatically disabling the "Welcome Screen"

This, for better or for worse, is a simple registry hack. Create a file called, let's say, winXPfixLogon.reg that looks like this:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon]
"LogonType"=dword:00000000

That "LogonType" dword may not be present by default; adding (or modifying) it will disable the "Welcome Screen" and allow you to use the "classic" logon (i.e. log onto the system with some dignity).

In order to automate this, simply stick a line in a batch file that looks like this:

regedit /s z:\scripts\winXPfixLogon.reg

and you're ready to roll.


2.) Stopping and/or disabling services

There are two Windows command line tools that you can use to quickly stop and disable a service. "wscsvc" is the name of the hyper-annoying "Security Center" (a.k.a. the red-shield of nagging). To stop it and disable it (i.e. prevent it from starting automatically), all you've got to do is holler thus:

net stop wscsvc
sc config wscsvc start= disabled

Pay special attention to the spacing on that one: the single space after the "start=" is apparently make-or-break.


3.) Automatically removing Windows Components

There are few things as genuinely shitty as the cruft and useless/broken software with which Microsoft has chosen to clog their UI. Much of this bloat cannot be completely removed; Internet Explorer, Outlook Express and Windows Media Player are not going anywhere (unless you plan to avail yourself of the substantial and psychotic registry and filesystem hacks that generally won't pass muster at the office).

Those annoying components can, however, be "disabled" or made invisible to users. With a batch line like this:

SYSOCMGR.EXE /i:%windir%\inf\sysoc.inf /u:z:\install\scripts\disabled_components.txt

and a disabled_components.txt that looks like this:

[components]
freecell=off
hearts=off
minesweeper=off
pinball=off
solitaire=off
spider=off
zonegames=off
vol=off
MSNexplr=off
deskpaper=off
OEaccess=off
IEaccess=off
WMPOCM=off

you can "remove" (i.e. sort of disable) a lot of those annoying "components".

In the above excerpt, most of the names are fairly obvious: the last three are, in order, Outlook Express, Internet Explorer and Windows Media Player. Also in that list are the so-called "Internet Games" (zonegames).

Obviously this is just a light scratching of the surface: what I've laid out above is just the beginning of numerous modifications, deletions and emendations required to make Windows XP a functional operating system (apps like daemontools, cwRsync and certain others that partially un-cripple Windows all come to mind).

Feel free to share your favorite Windows XP mass-administration tricks in the comments. God knows I could use the help.

Stupid Linux Tricks: the basics of pkill and pgrep

Posted by Timothy O'Connell in General on November 04, 2008

When I say, "If you don't know the basics of pgrep and pkill, you really need to make it a priority to learn them", what I really mean is, "I just recently learned how to use pgrep and pkill myself and I'm really, really excited about them."

Whether you're automating processes on the server across the Internet or you're trying to wrangle the processes on the client right in front of you, there really is no classier or more efficient way to do it than with these two programs. What follows are two examples of how, using nothing more than some elementary bash, you can quickly retrieve PID numbers, paths to processes and basically micromanage your system into submission.

The first thing to get down with is the l flag. It'll give you the PID as well as the process name. This is useful if you've got processes that are related, but that don't have exactly the same name:

toconnell@kumiko:~$ pgrep -l thunderbird
7425 thunderbird
7433 thunderbird-bin

The f flag is kind of like the industrial strength version of this. When you throw down the f and the l together, you get a PID and the full command line:

toconnell@kumiko:~$ pgrep -lf thunderbird
7425 /bin/sh /home/toconnell/thunderbird/thunderbird
7428 /bin/sh /home/toconnell/thunderbird/run-mozilla.sh /home/toconnell/thunderbird/thunderbird-bin
7433 /home/toconnell/thunderbird/thunderbird-bin

If you've been paying attention, you'll notice that I got three results from my second command. This is because the f greps against the full command line of every running process, not just the process name. This can be a real life saver when you've got (as above) processes spawning other processes or when you've got to start one process to start another. I frequently find myself doing this with .jar files.

For example, I use tn5250j (an old-timey IBM "green screen" emulator) at work. Nothing against their project, but when I end my tn5250j sessions, I've found that some of the processes that it starts tend not to die completely. I've also found that if I want to look for stranded or undead processes with a simple pgrep, I get nothing. This is because when I start the emulator, my KDE shortcut uses the following syntax:

$ /home/toconnell/jre1.6.0/bin/java -jar /home/toconnell/tn5250j/tn5250j.jar

If I pgrep for java, of course, renegade tn5250j processes will turn up. But so will all my other active java processes:

toconnell@kumiko:~$ pgrep -l java
26100 java
26131 java

If, however, I use the f, and grep against the full command line, I see that these are not both tn5250j processes:

toconnell@kumiko:~$ pgrep -lf java
26100 /home/toconnell/jre1.6.0/bin/java -jar /home/toconnell/tn5250j/tn5250j.jar
26131 /usr/bin/java -jar /usr/bin/tightvnc/classes/VncViewer.jar HOST victoria PORT 5900

If I use the f and use "tn5250" as my grep term, I only get the process I want.

Which is really handy if I've got a bunch of these things running around and I want them all dead. pkill will take an f flag and neatly dispose of all of my errant java processes. As a sort of "set it and forget it" method of keeping these things from stacking up, I've created a job in /etc/cron.daily/ that kills all of the stranded processes from yesterday before I even sit down at my desk:

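#!/bin/bash
set -e
# pkill exits non-zero when nothing matched; "|| true" keeps set -e from treating that as a failure
`which pkill` -f tn5250 > /dev/null 2>&1 || true
exit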

But this is only the beginning of the fun. pgrep and pkill are even more valuable in an emergency, e.g. when you've got a bunch of users running around (probably system users such as daemons, but maybe human ones too) and you only want to kill the processes that one of them has started.

Let's say that I've got two separate instances of postgres running: one belongs to the user postgres81, the other belongs to postgres83, and the new one happens to be running amok. I'm desperate enough to restore order that I'm ready to break Rule Number One and kill the postmaster.

The fastest way to put a bullet in your malfunctioning database processes without bothering any of your stable processes?

# pkill -f -u postgres83