VB6: I Can’t Figure Out Why This Code Works

VB6/VBA performs an implicit two-way UTF-16/ANSI translation when reading and writing files with the built-in statements.

Line Input treats the file as ANSI text (a series of bytes, each representing a character), decoded with the current system code page for non-Unicode programs. The characters it reads are converted to UTF-16.

When you read a UTF-8 file this way, what you get is an “invalid” string: you can’t use it directly in the language (if you try, you will see garbage), but it still contains usable binary data.

The pointer to that usable binary data is then passed to WideCharToMultiByte (in UnicodeToAnsi), which creates another “invalid” string, this time containing “ANSI” data. Effectively, this reverts the conversion VB does automatically with Line Input. Because the original file was in UTF-8, you now have an “invalid” string holding the original UTF-8 bytes, although the conversion function thought it was converting to ANSI.

The pointer to that second invalid string is passed to MultiByteToWideChar (in AnsiToUnicode), which, presumably told to interpret the bytes as UTF-8, finally creates a valid string that can be used in VB.
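
The three steps can be modelled outside VB to see why the bytes survive. The following is only a sketch in Python, with codecs standing in for the Win32 calls. It assumes the system ANSI code page is Windows-1252 and that the final MultiByteToWideChar call uses CP_UTF8; the function names are illustrative stand-ins, not the original VB code.

```python
# Model of the round trip; Python codecs stand in for the Win32 calls.
# Assumption: the system ANSI code page is Windows-1252 ("cp1252").
ANSI = "cp1252"


def line_input(raw: bytes) -> str:
    """Model of Line Input: decode the file bytes as ANSI text."""
    return raw.decode(ANSI)


def unicode_to_ansi(s: str) -> bytes:
    """Model of WideCharToMultiByte with the ANSI code page: undoes line_input."""
    return s.encode(ANSI)


def ansi_to_unicode(raw: bytes) -> str:
    """Model of MultiByteToWideChar, assumed to be called with CP_UTF8."""
    return raw.decode("utf-8")


file_bytes = "café".encode("utf-8")   # the UTF-8 bytes as stored on disk
garbled = line_input(file_bytes)      # the first "invalid" string (mojibake)
restored = unicode_to_ansi(garbled)   # the original UTF-8 bytes again
text = ansi_to_unicode(restored)      # a valid string at last
```

Here `garbled` is the garbage you would see if you displayed the string straight after Line Input, and `restored` is byte-for-byte identical to `file_bytes`, which is the only reason the last step works.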

The confusing part about this code is that strings are used to hold the “invalid” data. Logically, all of these should have been arrays of bytes. I would refactor the code to read bytes from the file in binary mode and pass the array to MultiByteToWideChar directly.
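
A minimal sketch of that refactoring, written in Python as a stand-in. In VB6 the equivalent would presumably be Open ... For Binary, a Get into a Byte array, and a single MultiByteToWideChar call with CP_UTF8. The helper name `read_utf8_file` and the sample file are hypothetical.

```python
import os
import tempfile


def read_utf8_file(path: str) -> str:
    """Read the raw bytes and decode them once, with no ANSI detour."""
    with open(path, "rb") as f:      # binary mode: no implicit conversion
        data = f.read()
    return data.decode("utf-8")      # the single bytes-to-text step


# Usage: write a small UTF-8 file, then read it back.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("naïve café".encode("utf-8"))
result = read_utf8_file(sample_path)
os.remove(sample_path)
```

With the data kept as bytes until the one real conversion, no intermediate “invalid” string ever exists, and the result no longer depends on the system code page.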
