Thursday, March 20, 2008

osql and input file encodings

I have been trying to run MS SQL Server's osql utility (SQL Server 2000) to insert some default data. The target database's collation is Croatian_CI_AS. After some trial and error I learned a few things about how the input file has to be saved.

Using Notepad++ as my editor, I assumed that saving the file as UTF-8 would make everything magically work. Not so. In fact, osql will not even read a UTF-8 encoded file, which was a real surprise: I kept getting an error message about the first line, with some weird character displayed there. It turns out that osql accepts Unicode or ANSI encoded files. There is no way to control the input file encoding from the command line, so we're stuck with that, which is not that bad after all.

So, no. 1: don't save your input file for osql as UTF-8. That's a different animal from a Unicode encoded file.
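A quick way to check which of these encodings an editor actually wrote is to look at the byte order mark (BOM) at the start of the file. The following is a minimal sketch, not part of the original workflow; the function name describe_bom is made up for illustration, and a file without a BOM cannot be told apart this way (it may be ANSI or BOM-less UTF-8).

```python
import codecs

# Byte order marks that editors such as Notepad++ can write at the start of a file.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UCS-2/UTF-16 Little Endian"),
    (codecs.BOM_UTF16_BE, "UCS-2/UTF-16 Big Endian"),
]


def describe_bom(data: bytes) -> str:
    """Return a label for the BOM found at the start of data (hypothetical helper)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "no BOM (ANSI or UTF-8 without BOM)"


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "INSERT INTO t VALUES (N'č')".encode("utf-16-le")
    print(describe_bom(sample))
```

To inspect a real script, read its first bytes in binary mode (for example with open(path, "rb")) and pass them to the function.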

Notepad++ offers several encoding options. It turns out that UCS-2 Little Endian is the one that works. For those interested, read up on UCS-2 on Wikipedia. I only just realized that it is a 16-bit Unicode encoding; seeing it in Notepad++'s list of encodings didn't ring any bells.

I'm not sure why UCS-2 Big Endian encoding does not play nice (I get gibberish for most of my special characters).

Saving the file as ANSI in Notepad++ and running it through osql did not work either, which was a bit of a surprise: osql executed everything, but characters with diacritics came out as gibberish. I even changed my Windows code page to Croatian in the hope that it would all somehow sort itself out, but it didn't. From what I can tell, osql probably reads the ANSI encoded file with an English (Latin) code page and loses the special characters in the process.
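This kind of gibberish is what you get whenever bytes written with one ANSI code page are read with another. The sketch below only illustrates that effect, assuming the file was saved as Windows-1250 (Central European) and read as Windows-1252 (Western European). Whether osql really uses that particular code page is a guess, not something I have verified.

```python
# Croatian letters saved with the Central European ANSI code page (assumption).
original = "čćđ"
ansi_bytes = original.encode("cp1250")

# The same bytes interpreted with the Western European code page (assumption).
misread = ansi_bytes.decode("cp1252")

print(original, "->", misread)

# A 16-bit Unicode file carries the real code points, so nothing depends
# on which ANSI code page the reader happens to assume.
assert original.encode("utf-16-le").decode("utf-16-le") == original
```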

So, bottom line, UCS-2 Little Endian is our friend. Use that for encoding files that are to be executed with osql. Oh, and probably get a good editor like Notepad++.
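If you would rather convert files from a script than re-save them by hand, something along these lines should do. It is a minimal sketch: the function name to_ucs2_le and the assumption that the source file is UTF-8 are mine, so adjust src_encoding to whatever your file really uses.

```python
import codecs
from pathlib import Path


def to_ucs2_le(src: str, dst: str, src_encoding: str = "utf-8-sig") -> None:
    """Re-encode src as 16-bit little-endian Unicode with a BOM (hypothetical helper).

    src_encoding is an assumption about the input; "utf-8-sig" accepts
    UTF-8 with or without a BOM.
    """
    # Decode from raw bytes so that line endings are kept exactly as they are.
    text = Path(src).read_bytes().decode(src_encoding)
    # FF FE marks the file as little endian, matching what Notepad++ writes
    # for "UCS-2 Little Endian".
    Path(dst).write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))


if __name__ == "__main__":
    import sys

    # Usage: python convert.py input.sql output.sql
    to_ucs2_le(sys.argv[1], sys.argv[2])
```

The converted file can then be passed to osql as its input file in the usual way.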

3 comments:

Anonymous said...

Thanks. I faced the same problem. Converting to ANSI helps.

Kyrylo.

Anonymous said...

Magic for me!!! just to find the notepad++ command line to convert

Rudi Larno said...

For those using Visual Studio (2008) you can set the encoding of the file using the
File > Advanced Save Options
Encoding: Unicode - Codepage 1200