11-16-2017 09:16 AM
Hello,
Has anyone ever converted Unicode characters (0000 - FFFF) to UTF-8? Are there any built-in modules that can be used for this conversion?
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=128&number=1024
There is a module, ASCII text to UTF-8, but it does not work for Unicode characters.
-VS
11-16-2017 10:31 AM - edited 11-16-2017 10:32 AM
The safe answer is "Not Yet", based on a search of LabVIEW Help and a Web Search. Part of the problem is that Unicode is U16, while ASCII (which LabVIEW basically uses) is U8. You could write your own Translator/Mapper, but it would have to deal with the Many-to-One problem (i.e. you'd need to decide which 256 Unicode characters you wanted to display as which ASCII character).
Bob Schor
P.S. -- If I'm wrong about this, I have every confidence that other readers of this Forum will point out my error ...
11-16-2017 10:44 AM
The conversion would be simple. Why don't you write a VI for it and share it?
11-16-2017 11:26 AM - edited 11-16-2017 11:30 AM
Actually ASCII is a VERY LIMITED subset of Unicode. And no, Unicode is not UTF-16; UTF-16 LE (little endian) is the Unicode encoding used on Windows. There are also UTF-8 and UTF-32, and the 16-bit and 32-bit encodings each come in an LE and a BE version.
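To make the difference between these encodings concrete, here is a small sketch in Python (my choice for illustration, not something LabVIEW provides). It prints the bytes of a single code point, U+20AC (the euro sign, picked only as a sample), in each encoding form.

```python
# One Unicode code point, several encodings.
# U+20AC (euro sign) is only a sample character chosen for illustration.
ch = "\u20ac"

encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]
encoded = {name: ch.encode(name) for name in encodings}

for name, data in encoded.items():
    # Show each encoded form as space-separated hex bytes
    print(f"{name:10s} {data.hex(' ')}")
```

The LE and BE forms contain the same bytes in opposite order, while UTF-8 has no byte-order variants at all.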
Windows NT uses UTF-16 LE everywhere internally, translating it to 8-bit MBCS on demand for applications that don't use Unicode, such as LabVIEW. LabVIEW therefore does not really use ASCII but 8-bit MBCS. For most Windows code pages this means an extended ASCII code page, with the lower 128 character codes mapped to the standard 7-bit ASCII characters and the upper 128 codes mapped to code-page-specific characters. But East Asian code pages (Japanese, Chinese, Korean) define more than 256 characters, and then a single character can consist of multiple bytes even in LabVIEW.
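As a rough illustration of single-byte versus multi-byte code pages, the Python sketch below encodes one character in cp1252 (Western European) and one in cp932 (Japanese). The two sample characters are my own picks; any character covered by the respective code page would do.

```python
# Bytes per character in a single-byte and a multi-byte Windows code page.
# The sample characters are arbitrary picks for illustration.
samples = {
    "cp1252": "\u00e9",  # e with acute accent, upper half of a Western code page
    "cp932": "\u3042",   # hiragana letter A, Japanese code page
}

lengths = {}
for codepage, ch in samples.items():
    data = ch.encode(codepage)
    lengths[codepage] = len(data)
    print(f"{codepage}: U+{ord(ch):04X} -> {data.hex(' ')} ({len(data)} byte(s))")

# The 7-bit ASCII range is encoded identically in both code pages
ascii_same = "A".encode("cp1252") == "A".encode("cp932") == b"A"
print("ASCII 'A' identical in both:", ascii_same)
```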
Linux nowadays mostly uses UTF-32 internally (LE or BE, depending on the endianness of the CPU), but with most user systems now running on x86/64 or ARM this is usually LE as well.
At the user level it uses UTF-8, which is in fact also an MBCS encoding, where a single code point takes 1 to 4 bytes. The 128 characters of the ASCII table map exactly to the first 128 code points of the Unicode standard.
So if your LabVIEW were running on a modern Linux system, you would theoretically already be using UTF-8.
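The 1-to-4-byte layout of UTF-8 is simple enough to write by hand, which is essentially what a home-made VI would have to do. Below is a minimal Python sketch of that bit manipulation; the function name `utf8_encode` is hypothetical and it takes a numeric code point rather than a LabVIEW string.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not valid code points for UTF-8")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (covers the rest of 0000 - FFFF)
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for cp in (0x41, 0xE9, 0x20AC):
    print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

For the range asked about in the original question (0000 - FFFF), only the first three branches are ever taken.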
On Windows, to get from UTF-16 LE to UTF-8 one simply needs to call the function WideCharToMultiByte() with the first parameter set to CP_UTF8 (65001) instead of CP_ACP (0).
And be very careful to allocate a large enough buffer for the returned UTF-8 string. It can need up to 3 bytes for every UTF-16 code unit in the incoming string (a surrogate pair, which is two code units, becomes 4 bytes)!
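To get a feel for how much the string can grow, here is a small Python sketch that performs the same UTF-16 LE to UTF-8 conversion with Python's own codecs (it does not call the Windows API) and counts input code units against output bytes. The helper name `utf16le_to_utf8` and the sample characters are assumptions made for illustration.

```python
def utf16le_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16 LE bytes to UTF-8 bytes (hypothetical helper)."""
    return data.decode("utf-16-le").encode("utf-8")


# Sample characters: ASCII, Latin-1, a 3-byte BMP character, and one outside the BMP
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

sizes = []
for ch in samples:
    utf16 = ch.encode("utf-16-le")
    units = len(utf16) // 2          # number of 16-bit code units in the input
    utf8 = utf16le_to_utf8(utf16)
    sizes.append((units, len(utf8)))
    print(f"U+{ord(ch):04X}: {units} UTF-16 unit(s) -> {len(utf8)} UTF-8 byte(s)")
```

All four samples stay within the bound given above, so a buffer sized from that worst case is safe for them.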
11-16-2017 02:09 PM
Thanks, RolfK, for the clear and concise response!
Bob Schor
11-16-2017 03:15 PM - edited 11-16-2017 03:16 PM
Thank you for the detailed description, RolfK!!