MATLAB: how to display UTF-8-encoded text read from file?

I present below my findings after doing some digging… Consider these test files:

a.txt

ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

b.txt

தமிழ்

First, we read files:

%# open file in binary mode, and read a list of bytes
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')';             %'# read bytes
fclose(fid);

%# decode as unicode string
str = native2unicode(b,'UTF-8');

If you try to print the string, you get a bunch of nonsense:

>> str
str =

Nonetheless, str does hold the correct string. We can check the Unicode code of each character, which are as you can see outside the ASCII range (last two are the non-printable CR-LF line endings):

>> double(str)
ans =
  Columns 1 through 13
   915   916   920   923   926   928   931   934   937   945   946   947   948
  Columns 14 through 26
   949   950   951   952   953   954   955   956   957   958   960   961   962
  Columns 27 through 35
   963   964   965   966   967   968   969    13    10

Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all these fail:

figure
text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
title(str)
xlabel(str)

One trick I found is to use the embedded Java capability:

%# Java Swing
label = javax.swing.JLabel();
label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
label.setText(str);
f = javax.swing.JFrame('frame');
f.getContentPane().add(label);
f.pack();
f.setVisible(true);

enter image description here


As I was preparing to write the above, I found an alternative solution. We can use the DefaultCharacterSet undocumented feature and set the charset to UTF-8 (on my machine, it is ISO-8859-1 by default):

feature('DefaultCharacterSet','UTF-8');

Now with a proper font (you can change the font used in the Command Window from Preferences > Font), we can print the string in the prompt (note that DISP is still incapable of printing Unicode):

>> str
str =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

>> disp(str)
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπÏςστυφχψω

And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):

uicontrol('Style','text', 'String',str, ...
    'Units','normalized', 'Position',[0 0 1 1], ...
    'FontName','Arial Unicode MS', 'FontSize',30)

enter image description here

Unfortunately, TEXT, TITLE, XLABEL, etc.. are still showing garbage:

enter image description here


As a side note: It is difficult to work with m-file sources containing Unicode characters in the MATLAB editor. I was using Notepad++, with files encoded as UTF-8 without BOM.

Leave a Comment