«»

Technicality

· 27TH OF JANUARY, THE YEAR 2006

MIGRATING A DATABASE FROM MYSQL 5 TO POSTGRESQL 8

Occasionally you have to perform the onerous task of moving data from one RDBMS to another, and it can be a real pain if you’re a lackluster IT dilettante like myself. Different data types, different SQL syntax, different character enclosures, blah blah blah. This little post documents my recent efforts to move a large database from MySQL 5 to PostgreSQL 8.1.

Of course, there are a couple scripts out there that people have thrown together, but none of them are perfect:

mysqldump This is the handy dumping/archiving script that comes with MySQL, and you can actually give it an option like--conpatible=postgresql and it will do some handy things that will ease the move, like not including LOCK statements and enclosing uppercase table and column names in double quotes. I used it like this:
mysqldump -u -h -p \
--skip-opt \
--compatible=postgresql \
> my_dump.sql
Of course, it fails to do a lot of other useful things like data type conversions.
mysql2pgsql.pl
(standard)
This script ships with Postgres and is probably too out of date to be usable. Poor data type translation
my2pg.pl I think this also ships with Postgres. Decent data type conversion, implements enum and set types by compiling a C library. I never actually got that to work, as it didn’t seem to be generating the appropriate C, and I don’t really know how to go from compiled C to PGSQL functions anyway. I imagine it involves goats blood, though. I suspect this one might be a bit out of date, too, and it doesn’t deal with double quotes and uppercase names very well.
mysql2pgsql.perl
(GBorg)
This one’s a bout 2 years old, but does a good job on type conversions, and instead of weird C voodoo, it emulates enum fields by generating extra tables and creating constraints on the original fields. Kind of klunky, but it would make it easier to modify the set of values. Doesn’t double-quote capitalized table names, and adds extra quotes to field names that already have them.
mysql2pgsql.pl
(Zach)
Implements enum by simply checking the value against a list set in the create table. Sort of like enum but not its own data type, and not as flexible as an external table. Also, the type conversion is way out of date.

Since none of these did exactly what I wanted them to do, I decided to modify GBorg’s, which seemed like the best. My modifications were simple, but still frustrating since I don’t actually know Perl, and I am generally not on speaking terms with regular expressions. Basically, I

  • Made it double-quote uppercase table names
  • Kept it from double-quoting field names that already had quotes
  • Added a --noenum flag to opt out of making external tables for enums (my data will be static when it gets to Postgres, so constraints are unnecessary)
  • Fixed a bug wherein it removed the semicolons after table creation blocks (since this made this script completely unusable, I wonder if there isn’t a bigger better newer version somewhere…)
  • Stopped it from converting KEY to UNIQUE. At least in MySQL 5, KEY is used to create indexes, so this didn’t really make any sense. I did not add the capability to make indexes in Postgres instead. Maybe later (hahahaha).

So, using my new script, I basically did this:
mysqldump -u -h -p \
--skip-opt \
--compatible=postgresql \
> my_dump.sql

./kmysql2pgsql.pl --noenum my_dump.sql pg_dump.sql

psql < pg_dump.sql

Of course, I'm talking about a 30,000 row database, so it took a little while for all the inserts to run, but it worked. If you're having trouble, a good way to debug is to dump a schema-only version of the db with the -d flag of mysqldump. Run it through the script, eye-ball the results, maybe paste a few table definitions into psql or something.

I'll try and remember to post my changes or a diff file or something whenever I upgrade to WP2 and get file attachments going.

3 COMMENTS

Brian said on February 13th, 2006 at 4:36 am,

Hi! This is exactly the thing i need to do. Care to share your updated mysql2pg.pl code?

-Brian

joesp said on January 17th, 2007 at 7:46 am,

An update now on the status of the mysql2pgsql.perl script on gborg:
now, it’s 2007 and the script now has 2 admins. the new admin, jduarte, has added excellent capabilities to the script.

Yes, it does emulate the enum datatype, along with creation of indexes, unique values, table constraints, changes default timestamp ’0000-00-00 00:00:00′ stuff away, and is an active project, so bugs have been worked out.

[quote]Doesn’t double-quote capitalized table names, and adds extra quotes to field names that already have them.[/quote]
This has been fixed, for example, now –lowercase is an option to the script so you have the option of lowercasing all your table and column definitions or quoting them (the former is recommended)
– joesp mysql2pgsql project admin

joesp said on January 17th, 2007 at 7:53 am,

* Added a –noenum flag to opt out of making external tables for enums (my data will be static when it gets to Postgres, so constraints are unnecessary)

that might be something we could add to it

* Fixed a bug wherein it removed the semicolons after table creation blocks (since this made this script completely unusable, I wonder if there isn’t a bigger better newer version somewhere…)

this bug is already been fixed.

* Stopped it from converting KEY to UNIQUE. At least in MySQL 5, KEY is used to create indexes, so this didn’t really make any sense. I did not add the capability to make indexes in Postgres instead. Maybe later (hahahaha).

we worked again through all this logic. PRIMARY KEY datatypes in mysql now automatically create indexes. In our script when it finds the PRIMARY KEY it adds the CREATE INDEX statement to a DDL string which is run after the create table statement. That way, all the keys get indexes created. This has to be manually replicated by the script because postgres KEY types do not necessarily have an index created for them.

If you are doing a 30,000 row dump you should make a dump of mysql which dumps only the data in a tab-separated table and which is imported in a heartbeat by the COPY command.
A tutorial and a shell script which will help you is available in my blog at http://openandout.com/~jcs/index.php?cfile=bblog_index.php&postid=23
or in the FAQs at the gborg site.