Friday, March 21, 2008

MS SQL Server and codepage

I have a simple problem (I think): my office development machine uses Windows' regional setting with codepage 1250. My laptop is using default windows United States/English code page (latin, 1252 codepage).

SQL Server 2000 on the office machine was installed when that machine was also with the US/English regional setting. Only recently have I changed that.

With my recent project I'm getting some frustration with inserting and viewing data through enterprise manager. I've managed to get my office machine to properly insert data through osql (command line invocation from NAnt). In addition, enterprise manager properly shows all the special diacritic characters. The database is set as having collation of Croatian_CI_AS (and all the varchar columns are as well).

Now I am not 100% certain here but I believe that it should be possible to have my laptop on the US/English regional setting and whatever the codepage it uses properly insert and view the inserted data in my laptop SQL Server instance. So far, no luck on that front.

I can successfully insert the data through a script (file is encoded as UCS-2 Little Endian with help from Notepad++). However, looking at the data I can't see the special characters - they are all converted to their closest equivalent (Č becomes C). While researching a bit, I came across this really old post on MS SQL 6.0 and 4.x - SQL Server Code Pages and AutoAnsiToOEM behavior (SQL Server 6.x and earlier was, I learnt, non-unicode complaint at a time so there were troubles there apparently). A more interesting read was another article on translation of characters between code pages of server and client. The second one got me really worried since it essentially says that you're not supposed to store data from one codepage on the client in a server with a different codepage.

My trouble is that the client happily uses 1250 codepage throughout the organization and I can't use NVARCHAR to represent text but must use VARCHAR (there are existing tables to work with). I hate the idea of having to change my regional setting of the laptop just because of this one project - it appears that there ought to be a better way to do this.

Well. Right now, I'm stuck a bit. Hopefully will dig out more.

Update 23 Mar 2008:
to my great surprise I just realized that Enterprise Manager does not show diacritics in my varchar columns on the machine whose codepage is United States/English BUT when I display the data on my web page I get the proper characters shown! That was a real surprise. I cannot enter diacritics via Enterprise Manager either - they are turned into their non-diacritic siblings.

In addition (more surprises for me), SQL Query Analyzer shows the diacritics properly! This sort of suggests that I just might be able to use those differing codepages on my development machines just fine, provided that I don't try and enter data directly in Enterprise Manager. I do have to try to insert some characters via Query Analyzer to see what happens with that.

No comments: