New site

Boatdig.com is a sailboat classifieds website that we just launched. It is designed to group ads for the same boat model together, to support geo mapping, and to maximize the visibility of individual ads both in search engines and within the site itself.


Why Hosting Rails is ahead of the curve

My web hosting adventures have spanned several different hosts over the years. I started off with Lunarpages - I did my research and found that they were consistently praised in the reviews I read. At the time (this is no longer the case), a shared hosting account on Lunarpages let you host only one domain without paying extra. As I started wanting things like the ability to quickly and easily host additional domains, a dedicated IP address and SSH access, Lunarpages no longer met my needs.

So I switched to Godaddy and got a virtual dedicated server (VDS or VPS) with the Plesk control panel. With this, I felt like I had more control. However, as I learned more about web hosting, I started to see how Plesk was getting in my way. My plan limited how many domains I could add (if I wanted more, I’d have to shell out more money), and I worried that upgrading certain packages on the VDS would break Plesk.

So I started looking into VDSes without a control panel. I had installed a Linux distribution on my local machine and thought that I would be able to handle administering a web server from the command line. That’s when I found Slicehost through a recommendation in an IRC channel. Slicehost offers VDSes with dedicated resources (memory and CPU allocation) and you get to choose your Linux distribution. Starting at just $20, this is a really good deal in my opinion. I had full control of my VDS, including which packages and versions I wanted to install, and I could host as many domains as I wanted by using virtual hosting on either Apache or Lighttpd (I tried both). However, all this control came at a steep price: I had to spend a lot of time learning Linux commands, managing packages and figuring out optimal configurations.

I was still fairly happy with Slicehost and I wasn’t planning on switching hosts again. I had a couple of Ruby on Rails applications that I wanted to launch and I was having a really hard time with deployment. I figured that I could find an inexpensive managed server host for just my Rails apps. I did a search in Google and found Hosting Rails. I asked about them in the #rubyonrails IRC channel and heard positive things about them, so I went ahead and signed up for a plan.

It turns out that I didn’t realize all the features I was getting until I actually started using it (yes, I know that they’re all laid out on the home page). With hosting plans starting at $2.88 per month, you get all the expected programs - cPanel, PHP, MySQL, Perl, etc. In addition to these, you also get SSH access, Subversion and the ability to host unlimited domains - I’ve never heard of getting all this on a shared hosting plan. Some people may say that you don’t really need SSH access if you have cPanel. Well, the truth is that if you want to move files around, edit them and create symlinks, it’s a lot faster with SSH access.

So now I’m spending less money on hosting than I did with a VDS, I’m able to host all my sites just as I did before, and I’m spending less time on systems administration because the web host maintains and updates the software packages for me.

If you plan on signing up for an account, please consider using my referral link.


Telecommuter Jobs

We just launched a new web application called Telecommuter Jobs. A Ruby on Rails application, it scrapes job listings from Craigslist that have been labeled as appropriate for telecommuting. It is specific to job category but not to location, which makes sense since telecommuting jobs can be done from anywhere. Listings update automatically once a day.


This must be behavioral targeting

I've been seeing a lot of ads for Microsoft adCenter lately. After seeing this one (on a celebrity gossip blog), I've concluded that I'm being shown so many of them because of behavioral targeting. Surely they don't think that Internet advertisers make up a large share of the visitors to celebrity gossip blogs. However, it's my opinion that some frequency capping might be in order. I already have an adCenter account, so the 50-100 impressions I've seen are wasted.


Ruby rdig modification

I've recently been messing around with Jens Kraemer's fantastic web crawler module, RDig. One thing I noticed about it, however, is that it isn't coded to crawl a website optimally when certain URL patterns are included or excluded.

First off, the start_url has to get past the include and exclude url pattern filters or else the site won't get crawled.

Secondly, because documents go through the pattern filter before they get added to the queue, pages that can be accessed only from pages that don't get past the pattern filter won't be seen at all.

The solution to this is fairly simple. I altered the code so that a document goes through every filter except the include and exclude filters before it gets added to the queue, and then goes through the include and exclude filters before it gets added to the index. So here is what I did:

Changed the order of members of the filter_chain array in rdig.rb:


    def filter_chain
      @filter_chain ||= {
        # filter chain for http crawling
        :http => [
          :scheme_filter_http,
          :fix_relative_uri,
          :normalize_uri,
          { :hostname_filter => :include_hosts },
          RDig::UrlFilters::VisitedUrlFilter,         
          { RDig::UrlFilters::UrlInclusionFilter => :include_documents },
          { RDig::UrlFilters::UrlExclusionFilter => :exclude_documents } 
        ],
        # filter chain for file system crawling
        :file => [
          :scheme_filter_file,
          { RDig::UrlFilters::PathInclusionFilter => :include_documents },
          { RDig::UrlFilters::PathExclusionFilter => :exclude_documents }
        ]
      }
         
    end

Replaced the definition of the apply method (line 55) in url_filters.rb with the following:


      def apply_first(document) # applies 0-4 of @filters array
        @filters[0..4].each { |filter|
          return nil unless filter.call(document)
        }
        return document
      end
      
      def apply_second(document) # applies 5-6 of @filters array
        @filters[5..6].each { |filter|
          return nil unless filter.call(document)
        }
        return document
      end

In the add_url method definition in crawler.rb, I changed the apply method call to the following:


      doc = filterchain.apply_first(doc)

Changed the process_document method definition in crawler.rb to the following:


    def process_document(doc, filterchain)
      doc.fetch
      # add links from this document to the queue
      doc.content[:links].each { |url| 
        add_url(url, filterchain, doc) 
      } unless doc.content[:links].nil?

      return unless @etag_filter.apply(doc)
      doc = filterchain.apply_second(doc)
      if doc
        @indexer << doc if doc.needs_indexing?
      end
    rescue
      puts "error processing document #{doc.uri.to_s}: #{$!}"
      puts "Trace: #{$!.backtrace.join("\n")}" if RDig::config.verbose
    end
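
To illustrate why moving the include filter matters, here is a toy sketch that does not use RDig at all. The link graph, the URLs, the INCLUDE pattern and the crawl method are all made up for the example, and the start URL is assumed to be queued unconditionally. It compares filtering links before they are queued (the original behaviour) with filtering documents only before they are indexed (the modified behaviour).

```ruby
# Toy link graph: each page maps to the pages it links to (hypothetical URLs).
LINKS = {
  "/"           => ["/category/a"],
  "/category/a" => ["/ads/1", "/ads/2"],
  "/ads/1"      => [],
  "/ads/2"      => ["/ads/1"]
}

# Hypothetical include pattern: only ad pages should end up in the index.
INCLUDE = %r{\A/ads/}

# filter_before_queue: true  -> pattern filter applied before queuing (original)
# filter_before_queue: false -> pattern filter applied only before indexing (modified)
def crawl(start, filter_before_queue:)
  queue   = [start]
  visited = {}
  index   = []
  until queue.empty?
    url = queue.shift
    next if visited[url]
    visited[url] = true
    LINKS.fetch(url, []).each do |link|
      next if filter_before_queue && link !~ INCLUDE
      queue << link
    end
    index << url if url =~ INCLUDE
  end
  index
end

original = crawl("/", filter_before_queue: true)
modified = crawl("/", filter_before_queue: false)
puts "original: #{original.inspect}"
puts "modified: #{modified.inspect}"
```

In this toy graph the ad pages can only be reached through a category page that fails the include pattern, so the first crawl indexes nothing while the second one finds both ads.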

Also, if you're having problems appending to an existing index, make sure that line 103 of config.rb starts with 'cfg' and not 'config'.


Installing lighttpd on CentOS 4

updated Oct 23, 2007

I recently decided to switch from Apache to lighttpd because Apache along with mod_php was using up a lot of memory and lighttpd apparently has a lighter memory footprint. The difference is significant - with Apache, I was using up nearly all of my 256MB, whereas with lighttpd I'm using only 132MB, and I think that with some tuning I'll be able to get it lower.

In order to install lighttpd with yum, you need to include the RPMForge repository. Instructions on how to do this are here.

For some reason the RPMForge repository only showed version 1.3.16 for me, so I used RPMs, which can be grabbed here. Make sure that you grab the one that matches your distro and system architecture. If you use php, you'll want to install the fastcgi rpm as well. To install, I did:


$ wget http://dag.wieers.com/rpm/packages/lighttpd/lighttpd-1.4.18-1.el4.rf.x86_64.rpm
$ wget http://dag.wieers.com/rpm/packages/lighttpd/lighttpd-fastcgi-1.4.18-1.el4.rf.x86_64.rpm
$ rpm -Uvh lighttpd-1.4.18-1.el4.rf.x86_64.rpm
$ rpm -Uvh lighttpd-fastcgi-1.4.18-1.el4.rf.x86_64.rpm

You should then be able to start up the daemon with:


$ service lighttpd start

The configuration file is /etc/lighttpd/lighttpd.conf. In it, you should set your web root with the server.document-root directive.

Getting lighttpd to serve PHP files

Make sure that you installed lighttpd-fastcgi above. Make sure that the following line exists in /etc/php.ini though it should have the right setting by default:

cgi.fix_pathinfo = 1

Make sure mod_fastcgi is uncommented (and thus loaded) in server.modules in your lighttpd.conf file:

server.modules = (
	"mod_fastcgi",
)

Then make sure the fastcgi.server section is uncommented and looks like the following:

fastcgi.server = ( ".php" => (( 
                     "bin-path" => "/path/to/php-cgi",
                     "socket" => "/tmp/php.socket"
                 )))

My bin-path was /usr/bin/php-cgi. If you're unsure, you can issue the following command:

$ whereis php-cgi

Then restart the lighttpd daemon with:

$ service lighttpd restart

And you should be able to serve php files.

Virtual Hosts

Virtual hosting can be done in a few different ways: using conditionals, simple-vhosts or a combination of the two. As I understand it, simple-vhosts are used when all your virtual hosts (domains, subdomains or sites) have the same structure within the web root, and conditionals are used for exceptions.

Since I have a relatively small number of sites, I decided to go the route of using conditionals. For documentation on this, take a look at this wiki article.

In lighttpd.conf, I used the following as an example:

$HTTP[host] == domain.com {
server.document-root = /path/to/document/root/for/domain.com
}
$HTTP[host] == domain2.com {
server.document-root = /path/to/document/root/for/domain2.com
}

Using a combination of conditionals and simple-vhosts can save you some typing. For example:

$HTTP["host"] == "subdomain1.example.org" {
 #conditional
}
else $HTTP["host"] == "subdomain2.example.org" {
 #conditional
}
else $HTTP["host"] =~ "^." {
 # simple vhost stuff here
}

Redirecting domain.com to www.domain.com

If you'd like to redirect requests made to domain.com to www.domain.com, you can't do it using an .htaccess file because lighttpd doesn't support them. You can, however, enable mod_redirect in lighttpd.conf (uncomment it at the top of the file) and use something like the following:


$HTTP["host"] !~ "^(www|mail|webmail|208)" {
  $HTTP["host"] =~ "^(.*)" {
    url.redirect = ("^/(.*)" => "http://www.%1/$1")
  }
}

This redirects any HTTP request whose hostname does not start with www, mail, webmail or 208 to the same URL with "www." added to the front of the hostname. I have "208" in there as well because if I type in my server IP address, I don't want it to be redirected.

You can also include this within a virtual host so that this conditional affects only certain domains.
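
The logic of that rule can be sketched in plain Ruby. This is only an illustration of the two conditions, not something lighttpd runs; redirect_target is a made-up helper name and the hostnames and IP address are examples.

```ruby
# Hypothetical helper mirroring the lighttpd redirect rule above.
# Returns the redirect URL, or nil when the request is left alone.
def redirect_target(host, path)
  # Outer conditional: hosts starting with www, mail, webmail or 208 are not redirected.
  return nil if host =~ /\A(www|mail|webmail|208)/
  # Inner rule: %1 is the whole hostname, $1 is the path after the leading slash.
  "http://www.#{host}#{path}"
end

puts redirect_target("domain.com", "/about").inspect
puts redirect_target("www.domain.com", "/about").inspect
puts redirect_target("208.0.0.1", "/").inspect
```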

Drupal Clean URLs

Using mod_rewrite seemed to be the easiest way to enable clean URLs in Drupal. You will want to enable mod_rewrite (uncomment it at the top of your lighttpd.conf) and then use the following code:


url.rewrite-final = (
  "/rss.xml$" => "/index.php?q=rss.xml",
  "^/([^.?]*)\?(.*)$" => "/index.php?q=$1&$2",
  "^/search/(.*)$" => "/index.php?q=search/$1",
  "^/([^.?]*)$" => "/index.php?q=$1",
  "^/([^.?]*\.html)$" => "/index.php?q=$1",
  "^/([^.?]*\.htm)$" => "/index.php?q=$1"
)

I have multiple sites on my server and only a portion of them are drupal sites, so I put those rewrite rules into a vhost:


$HTTP["host"] == "www.mydomain.com" {
  server.document-root = "/var/www/html/drupal"
  url.rewrite-final = (
    "/rss.xml$" => "/index.php?q=rss.xml",
    "^/([^.?]*)\?(.*)$" => "/index.php?q=$1&$2",
    "^/search/(.*)$" => "/index.php?q=search/$1",
    "^/([^.?]*)$" => "/index.php?q=$1",
    "^/([^.?]*\.html)$" => "/index.php?q=$1",
    "^/([^.?]*\.htm)$" => "/index.php?q=$1"
  )
}

You may be puzzled by why the server.document-root is set to something that looks rather generic, but that is because I use drupal's multisite feature, so all my drupal sites use that same document root.
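
To see which URLs the Drupal rules above catch, here is a small Ruby sketch that walks the same patterns in order and stops at the first match. It is my assumption that this is a fair approximation of how url.rewrite-final picks a rule; the rewrite method and the sample URLs are made up for the example.

```ruby
# Rules copied from the url.rewrite-final block above, in the same order.
RULES = [
  [%r{/rss.xml$},         '/index.php?q=rss.xml'],
  [%r{^/([^.?]*)\?(.*)$}, '/index.php?q=$1&$2'],
  [%r{^/search/(.*)$},    '/index.php?q=search/$1'],
  [%r{^/([^.?]*)$},       '/index.php?q=$1'],
  [%r{^/([^.?]*\.html)$}, '/index.php?q=$1'],
  [%r{^/([^.?]*\.htm)$},  '/index.php?q=$1']
]

# Hypothetical helper: returns the rewritten URL, or the URL unchanged
# when no rule matches (for example static files such as .js or .css).
def rewrite(url)
  RULES.each do |pattern, target|
    match = pattern.match(url)
    next unless match
    # Fill in $1, $2 ... from the captured groups of the matching rule.
    return target.gsub(/\$(\d)/) { match[Regexp.last_match(1).to_i].to_s }
  end
  url
end

["/node/5", "/node/5?page=2", "/search/node/boats", "/misc/drupal.js"].each do |url|
  puts "#{url} => #{rewrite(url)}"
end
```

Requests for files with an extension other than .html or .htm fall through all the rules, which is what lets the static files get served directly.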

Wordpress permalinks

Wordpress permalinks can be enabled in much the same way as Drupal clean URLs. Enable mod_rewrite, and then use the following code:


url.rewrite-final = (
  "^/(wp-.+).*/?" => "$0",
  "^/(sitemap.xml)" => "$0",
  "^/(xmlrpc.php)" => "$0",
  "^/(.+)/?$" => "/index.php/$1"
)

And for a specific virtual host:


$HTTP["host"] == "www.wordpress_site.com" {
  server.document-root = "/var/www/html/domains/wordpress_site"
  url.rewrite-final = (
    "^/(wp-.+).*/?" => "$0",
    "^/(sitemap.xml)" => "$0",
    "^/(xmlrpc.php)" => "$0",
    "^/(.+)/?$" => "/index.php/$1"
  )
}

Tutorial: Managing a Godaddy virtual dedicated server (VDS or VPS or virtual private server)

Note: I no longer use Godaddy's VDS. Please see this post to read about why I switched and which host I use now.

I had purchased Godaddy's virtual dedicated server (VDS) web hosting service because a systems administrator friend of mine had told me the benefits of having 'root' access and it made sense. I also wanted to host a moderate number of sites, so I wanted to upgrade from the shared hosting that I was using at the time. However, I wasn't prepared for all the server management that would be required for some of the things that I wanted to do. Since Godaddy's VDSes are sold as largely unsupported, there was a steep learning curve for me. As such, I'm putting down some useful things I've learned along the way, which may be helpful for people who aren't experienced webmasters.

I'm constantly learning new things about web hosting, so as I come across them, I will continue to add useful tips to this page.

Add-ons

I got a couple of add-ons with my VDS and I'm happy that I have them. The first one is Plesk. This is a graphical control panel for the server that makes it easy to administer sites. The alternative is using SSH commands for everything, which just doesn't appeal to me. The second add-on I got is an FTP backup server. This is a separate server with a separate IP address. You can use the backup functionality in Plesk to schedule regular backups of your site. I'm pretty paranoid about backing up data, especially after hearing about a very large site crashing and losing all its data, so this add-on was a no-brainer, especially at the $2 or $3 a month that it costs for 10GB.

Using FTP

I always use the server IP address when logging in. You can probably use a domain name when accessing certain domains, but I figure that if you can use one hostname whenever you're using an FTP client, it's much easier that way.

Logging on using SSH

I had never used SSH before I got my Godaddy VDS, but logging on is fairly simple. You need to download a program called PuTTY. To log on using SSH, simply open PuTTY, enter your VDS IP address, select SSH as the protocol, open the connection and enter your username and password when prompted. These should be the administrator username and password that you chose when you set up the VDS. Note that when using SSH, you can potentially do a lot of damage to your server, so it's a good idea not to use any command unless you know what it does. If you need to be logged in as 'root' to do something, you should use the command 'su' to temporarily become the root user.

FTP backup

Plesk has a backup functionality which will back up your site(s) on demand or on a scheduled interval. You can back up a site and place that backup file that plesk generates either on the local server or on a remote ftp server. I chose to have all my sites backed up on a regular interval to my remote ftp server, although I occasionally have plesk generate a backup, which I then download to my local machine so that the site is backed up twice.

The backing up part is pretty intuitive and all in plesk. You go to the domains > domain.com > backup. You enter in the ftp backup information into FTP Account Properties. If you want the files stored in a directory other than the base directory, you have to create the folder using SSH. You would use Backup Now for an on-demand backup and Scheduled Backup to have your site backed up on a periodic basis.

What wasn't so intuitive to me was how to view the backed up files (for verification) and how to transfer them to my local machine so that I could test that they work. To do this, log in to your server using SSH. You don't need to log in as 'root.' Then type in:


ftp [ip address]

where [ip address] is the IP address that Godaddy gave you for the ftp backup server. At one point, Godaddy told me to use the 'sftp' command, but that didn't work for me. You will then be prompted for your username and password, which you can find in your VDS manager.

Once you're logged into the FTP backup server, you'll be at the base directory, which is where backup files get saved if you don't specify a folder. Use the command 'ls' to view all the files in the base directory. If you want to have the backup files placed in certain folders, you can create folders at this time by using the command:


mkdir [folder name]

I use my site names for the folder names so that domain123.com gets backed up in the folder 'domain123.'

Now that I was able to view the files, the next step for me was to test them to make sure that they worked. In order to do this, I wanted to transfer a backup file from the FTP backup server onto my local machine. This is a two-step process. First, I would copy a backup file to the VDS with the following command:


get [filename]

where [filename] looks something like domain123.com_2006.10.30_14.45

Next, I would transfer the file from the VDS to my local machine using an ftp client such as filezilla. When logging into the VDS using an ftp client, you should use the administrator's username and password that you set when you first purchased the VDS. You should also make sure that the servertype is set to 'SFTP' to ensure security. Once you're logged in, you should be at the base directory for that username, so something like /home/user123. In that folder, you should also see the files that you sent there using the 'get' command. You can now drag and transfer those files to your local machine.

In order to simulate what would happen if I lost site data, I deleted one of my sites from plesk (before doing this, I backed up the files manually). This was also a site that wasn't very important to me, so I didn't care if it was down. Then I added the domain into plesk as if I were adding a new site. Then I went to the backup functionality in plesk, and clicked on Add New File. I uploaded the back up file, and once uploaded, I clicked on it and hit Restore. This restored my site back to how it was originally and gave me reassurance that the backup file would work in a time of need.

Note: I'm sure that there's a way to send a backed up file directly from the backup server to a folder on the VDS so that the backup functionality in plesk recognizes it so you don't have to upload the file from your local machine, but I don't know which folder that would be, and it's not a top priority for me right now.

Downloading configuration files to your local machine

There are times when you may want to download a file to your local machine, but you can't since it belongs to 'root'. I don't quite understand this part, but I believe that since I log in as user123, a file has to belong to user123 for me to download it to my local machine. Otherwise, I get a permission denied message. In order to change the owner, you need to be logged in as 'root'. Then use the 'chown' command like so:


chown [user you want ownership to be transferred to] [filename]

An example of a file that I might want to download and edit is the php.ini file, which sets the configuration of php on your server. I believe that there's a way to edit files directly on the server, but I still need to learn how to do this.

Secure FTP

I've been using standard FTP for a while now, but recently I learned about the merit of using secure FTP or SFTP. When transferring files using standard FTP, the data (including login information) gets transmitted in plain text, so someone can conceivably intercept it and read it. If you use SFTP, the data gets encrypted so even if it's intercepted when you're sending it to your server, it won't be readable. Getting started with this is pretty easy. When you're logged into Plesk, navigate to the domain administration for the domain you'd like to do this for (this should really be done for every domain you use). Then go to 'Setup.' Under Preferences, you will see a dropdown menu for Shell access. For this, I just selected /bin/sh. Click OK and you're done on the server end. Next, in whatever FTP client you use, select SFTP as the servertype. Your FTP client should now be able to connect to the server over SFTP and transmit data securely. There is also a way to set up SFTP without giving away shell access. Read about it at TheOneAndTheOnly.com.


Links for 8-7-06

  • WOMMA Research blog - a new blog by the Word of Mouth Marketing Association. Lots of great research and stats!
  • Share Your Secret - a new consumer generated content site by Secret, allowing women (or men) to post and share their innermost secrets

Odani Interactive is an Internet services company located in New York City. Our core competencies are online marketing (in particular direct response advertising) and web development.