Thursday, August 30, 2007

migration to UTF character set

When you try to import data from fixed size character set (ex. we8mswin1252) to UTF, you can go into problem that value is too large for column.

This problem can appear when table was created with char or varchar2 column which size was specified in bytes not in characters. There is a parameter NLS_LENGTH_SEMANTICS
which has two values - byte or char. Default value is byte, and in that case if you make a export from database with default parameter and then you import this data into new database, all tables will be created with byte size for character columns (ex. varchar2(50 byte)).
My first thought was, that if I change this parameter to "char" in new database, all tables will be created in "char" size. But this is only partial true, new tables are created with "char" length of columns but import process still is creating columns with byte.

My solution is to split import process into three steps.
1. Import only tables definitions (import with parameter rows=n)
2. Change column definition from byte to char using prepared script
3. Import data.

How to create script for column conversion ? Using SQL query in SQL*Plus:

set pagesize 0
set linesize 100
spool convert.sql
select 'alter table ' || owner || '.' || table_name || ' modify ' || column_name || ' ' || data_type || '(' || data_length || ')' from dba_tab_columns where owner='schema' and char_used = 'B' and table_name not in (select view_name from dba_views where owner='schema');
spool off

This query generate script, which is executed and after that all char and varchar2 columns should use "char" as length size.