Friday, February 26, 2010

Migrating to Postgres from Microsoft SQL Server with a little Ruby on Rails

Recently, I stopped the research phase of a business process project, and started the development phase. Out of research, I had determined using Ruby on Rails with JRuby and JDBC would be the best bet for interoperability with the current infrastructure. The current infrastructure consists of a enterprise level CRM running on SQL Server with .NET. The project would create a dataset which received CRM data. I chose SQL Server for this project to minimize complexity – one database engine.

For those interested, the following is how to connect to a SQL Server from Ruby on Rails using JRuby and JDBC. Be sure to download and drop the sqljdbc4.jar into your JRuby lib folder.


development:
host: cwinslett-475
adapter: jdbc
database: db_name_development
username: myuname
password: mypword
driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
url: jdbc:sqlserver://myserveripaddress;databaseName=db_name_development


The Last Straw with SQL Server – Pagination

Take a look at the following:


SELECT * FROM (SELECT TOP 20 * FROM (SELECT TOP 20 * FROM contacts ORDER BY contacts.last_name, contacts.first_name) AS tmp1 ) AS tmp2


This is SQL Server’s solution for pagination. That is ugly, but fine. However, pagination returns an odd number of rows when performing joins. The following is the SQL:


SELECT * FROM (SELECT TOP 18 * FROM (SELECT DISTINCT TOP 18 investments.id FROM investments LEFT OUTER JOIN authorized_contacts ON (investments.[id] = authorized_contacts.[investment_id]) LEFT OUTER JOIN contacts ON (contacts.[id] = authorized_contacts.[contact_id]) WHERE (authorized_contacts.is_primary = 1) ) AS tmp1 )


The Solution – DataPumper

Also known as SQL work Bench (http://www.sql-workbench.net/) , it’s the easiest way I’ve found to migrate from SQL Server to Postgresql.

I arrived at it after trying the built in “Export” feature in SQL Server Management studio, but it failed with the Postgresql ODBC. I performed any 45 minutes of searches and finally arrived at DataPumper.

When I downloaded it, I fired up the GUI. I pointed it to my SQL Server CRM database. Worked like a champ after pointing to the JDBC files for the databases.

Integrating with Ruby on Rails

I now use DataPumper for my “rake db:seed” task. I needed production data for my environment, and when I deploy, I’ll need the same data. First step was setting up my export schema. I created a file with all my tables (db/development_import):


WbCopy -sourceProfile='My SQL Server Profile'
-sourceGroup='Default group'
-targetProfile=’My Postgres Profile’
-targetGroup='Default group'
-targetTable=addresses
-sourceTable=dbo.addresses
-columns='description/description, address/address, address_2/address_2, address_3/address_3, address_4/address_4, city/city, province/province, postal_code/postal_code, country/country, is_primary/is_primary, is_mailing/is_mailing, id/sql_server_id, contact_id/sql_server_id'
-deleteTarget=false
-continueOnError=false
;
WbCopy -sourceProfile='My SQL Server Profile'
-sourceGroup='Default group'
-targetProfile=’My Postgres Profile’
-targetGroup='Default group'
-targetTable=contacts
-sourceTable=dbo.contacts
-columns='prefix/prefix, last_name/last_name, middle_initial/middle_initial, first_name/first_name, suffix/suffix, company/company, phone_work/phone_work, phone_home/phone_home, phone_other/phone_other, email/email, email_2/email_2, email_3/email_3, authentication_string/authentication_string, id/sql_server_id, formal_salutation/formal_salutation, fax/fax'
-deleteTarget=false
-continueOnError=false
;


Then I copied the sqlworkbench.jar to db/ sqlworkbench.jar. Then I configured db/seeds.rb:


`java -cp db/sqlworkbench.jar workbench.WbStarter -script=db/development_import`


Now, when I run “rake db:seed” it will fire off the data migration. When I run “rake db:reset” I get a clean DB with fresh data.

Data Clean Up

In my data migration above, I map the Primary keys from SQL Server to another field, because I want Postgres to handle its own Primary Keys. I added this to the end of my db/seeds.rb:


ActiveRecord::Connection.execute(“UPDATE addresses
SET contact_id = contacts.id
FROM contacts WHERE contacts.slx_id = addresses.contact_slx_id;”)


Data Migration Gotchas

Data types will hurt here. Database data types are not standard. Even if databases had the same data types, application developers use different conventions. Above, in my DataPumper configuration, I use “-sourceTable.” However, the “-sourceTable” is actually a view I’d created to use with Rails. I had already done data type variable casting to get boolean types. I did this with the following in my SQL for the view, notice the inline (CASE WHEN THEN statements):


SELECT ADDRESSID AS id, ADDRESS1 AS address, ADDRESS2 AS address_2, CITY, STATE AS province, POSTALCODE AS postal_code, COUNTRY AS country, ADDRESS3 AS address_3, ADDRESS4 AS address_4, ENTITYID AS contact_id, DESCRIPTION, CASE ISPRIMARY WHEN 'T' THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS is_primary, CASE ISMAILING WHEN 'T' THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS is_mailing FROM dbo.ADDRESS