03-21-2012 06:02 AM
I need to download the contents of web pages. I have the Internet Toolkit and use this function:
Data communication -> Protocols -> HTTP client -> GET.vi
I have problems when the page contains some unicode text; for example, the word "Čadan" is output as "ÄŒadan" by the GET vi.
I read about an undocumented "UTF-8 to Text" VI. It partially works in converting the garbage back: for example, it converts "Å½izdra" to "Žizdra", but it doesn't work for many other characters, like the ÄŒ above, which becomes a plain C.
I also read https://decibel.ni.com/content/docs/DOC-10153 but it didn't help me...
03-22-2012 06:37 AM
Hello snamprogetti,
could you please tell us which operating system and which version of LabVIEW you are using?
03-22-2012 06:49 AM
LabVIEW 2011 on Windows 7
03-22-2012 11:19 AM
Hello Snamprogetti,
Unicode languages can be displayed by modifying the LabVIEW configuration file. The availability of display languages depends on the version of Windows 7. Only Windows 7 Ultimate allows users to choose a display language with the steps shown in here.
Windows 7 Enterprise and Professional are not able to download some of the very common language libraries that are listed in the Microsoft Knowledge Base.
To display Unicode languages, modify the LabVIEW .ini configuration file, using the following steps:
1) Navigate to C:\Program Files\National Instruments\<LabVIEW>\LabVIEW.ini.
2) Add UseUnicode=TRUE on a new line in the LabVIEW .ini file.
3) Save the file and exit LabVIEW.
Upon relaunching LabVIEW, Unicode characters should work correctly, whether typed or pasted in.
Have you already tried that?
For control/indicator labels, it may be necessary to enter text as a Caption, not a Label. Right-click on the control/indicator and select Visible Items»Caption from the right-click menu. Paste in your text or type it using the steps from here.
You should note though that when building an executable that requires Unicode characters, you should insert the UseUnicode=TRUE line into the application.ini file generated by the executable creation.
Could you please try the above?
03-22-2012 11:41 AM
Yes, I already tried that, but it didn't help, probably because I am neither typing nor pasting in. Actually, for now I don't even care about displaying; all I want to do is read data from web URLs and process it, but the data keeps coming out that way from GET.vi.
03-22-2012 12:08 PM
Everything in LabVIEW is probably working correctly; it's just not working the way you want (or need) it to. Can you describe what you're trying to do in a bit more detail?
The problem you're running into is that LabVIEW (for the most part, some of the unicode stuff mentioned above is the exception) interprets strings in the system code page. If the primary language used with your computer is English, Spanish, French, or any number of other "Western European" languages, the system code page is likely Windows-1252. See http://en.wikipedia.org/wiki/CP1252 for more information on exactly which characters can be represented on these systems.
What's happening is that you're receiving a (most likely) valid UTF-8 encoded string, which LabVIEW interprets according to your system code page, since there is no way to tell LabVIEW to interpret the string in any other way. The specific problem you're having (if you're using the Windows-1252 code page) is that the "Č" character does not exist in your code page; therefore, there is no way to encode it in your system code page, and no way for LabVIEW to display it. The "UTF-8 to Text" VI appears to be kind enough to replace it with a similar character that can be encoded in your system code page, namely a 'C'.
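The effect described above can be reproduced outside LabVIEW. The following is only a sketch in Python (not LabVIEW code), assuming the system code page is Windows-1252; the helper name fits_cp1252 is made up for this illustration. It shows why "Čadan" turns into "ÄŒadan", and why "Ž" can be recovered while "Č" cannot.

```python
# Illustration only: Python stands in for LabVIEW's string handling here.
# Assumption: the system code page is Windows-1252 ("cp1252" in Python).

# The bytes a server would send for "Čadan" in UTF-8: C4 8C 61 64 61 6E
utf8_bytes = "Čadan".encode("utf-8")

# Reading those same bytes as Windows-1252 gives the garbage from the first post.
misread = utf8_bytes.decode("cp1252")


def fits_cp1252(ch: str) -> bool:
    """Hypothetical helper: True if the character exists in Windows-1252."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(misread)            # ÄŒadan
print(fits_cp1252("Ž"))   # True: Ž has a slot in Windows-1252
print(fits_cp1252("Č"))   # False: Č has no slot, so it cannot be represented
```

Since "Ž" exists in Windows-1252 and "Č" does not, a conversion from UTF-8 to the system code page can restore the former but has to substitute something for the latter.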
How you work around this limitation will depend on what, exactly, you want to do. If you need to display strings including characters not in your system code page, you may be out of luck. If you simply need to store strings including such characters (for example: to a file or database), you'll need to transcode the strings from UTF-8 to the proper character set for the file, database, etc.
Mark Moss
Electrical Validation Engineer
GHSP
03-23-2012 05:25 AM
Here is what i'm doing:
1) download a page containing names, for example Čadan
2) use names to build other URLs, for example http://en.wikipedia.org/wiki/Čadan
3) download pages and do some other text extraction
4) write to file
The problem is mainly in step 2, as the URL won't work at all with a ÄŒ or a C. Percent encoding, such as %C4%8C for Č, would work, but i don't know how to get it.
03-23-2012 10:52 AM
@Snamprogetti wrote:
Here is what i'm doing:
1) download a page containing names, for example Čadan
2) use names to build other URLs, for example http://en.wikipedia.org/wiki/Čadan
3) download pages and do some other text extraction
4) write to file
The problem is mainly in step 2, as the URL won't work at all with a ÄŒ or a C. Percent encoding, such as %C4%8C for Č, would work, but i don't know how to get it.
There is an Escape HTTP URL VI (http://zone.ni.com/reference/en-XX/help/371361H-01/lvcomm/escape_http_url/) that will probably handle the URL-encoding for you (i.e. convert Č to %C4%8C).
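For reference, this is what percent-encoding of UTF-8 looks like in text form. It is a sketch in Python using the standard urllib.parse.quote function as a stand-in; it does not claim to reproduce how the Escape HTTP URL VI works internally, and the helper names escape_name and escape_raw_utf8 are made up for this example.

```python
from urllib.parse import quote


def escape_name(name: str) -> str:
    """Percent-encode a page name; non-ASCII characters are encoded as UTF-8 bytes."""
    return quote(name, safe="")


def escape_raw_utf8(raw: bytes) -> str:
    """Percent-encode bytes that are already UTF-8, e.g. as received over HTTP."""
    return quote(raw, safe="")


# Building the URL from step 2 of the list above.
url = "http://en.wikipedia.org/wiki/" + escape_name("Čadan")
print(url)  # http://en.wikipedia.org/wiki/%C4%8Cadan

# The same result from the raw bytes, without ever decoding them to text.
print(escape_raw_utf8(b"\xc4\x8cadan"))  # %C4%8Cadan
```

The second function matters for this thread: if the downloaded string is left as raw UTF-8 bytes and never passed through the system code page, the escaping step still produces the correct %C4%8C sequence.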
Mark Moss
Electrical Validation Engineer
GHSP
03-26-2012 06:11 AM
OK! The Escape HTTP URL VI also works with UTF-8. That solved step 2.
As for step 4, I tried to convert to Unicode and prepend a 0xFFFE Byte Order Mark as explained in https://decibel.ni.com/content/docs/DOC-10153, but never succeeded with any conversion VI.
Instead, I did no conversion and prepended a 0xEFBBBF Byte Order Mark, which should be appropriate for the UTF-8 encoding used by GET.vi's output. Indeed, the files I write now correctly show Čadan or whatever. Since I stick to UTF-8, I have to convert my search strings with "Text to UTF-8" before doing any string searching in step 3. That solved the problem for my application, at least. Thanks for the help.
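For anyone who wants to check the same approach outside LabVIEW, here is a rough Python sketch of it: the downloaded bytes are left as UTF-8, a UTF-8 byte order mark (EF BB BF) is put in front of them when writing the file, and search terms are converted to UTF-8 before searching. The function names and the output file name are made up for this example.

```python
import codecs
import tempfile
from pathlib import Path


def write_utf8_with_bom(path: Path, raw_utf8: bytes) -> None:
    """Write UTF-8 bytes unchanged, preceded by the UTF-8 BOM (EF BB BF)."""
    path.write_bytes(codecs.BOM_UTF8 + raw_utf8)


def contains(raw_utf8: bytes, term: str) -> bool:
    """Search in UTF-8 data: the search term is converted to UTF-8 first."""
    return term.encode("utf-8") in raw_utf8


# Stand-in for the bytes returned by the HTTP GET.
page = "Čadan, Žizdra".encode("utf-8")

# Hypothetical output file in the system's temporary directory.
out = Path(tempfile.gettempdir()) / "names_utf8_bom.txt"
write_utf8_with_bom(out, page)

print(out.read_bytes()[:3] == codecs.BOM_UTF8)  # True
print(contains(page, "Čadan"))                  # True
print(contains(page, "Cadan"))                  # False
```

Editors that honor the BOM will open such a file as UTF-8; in Python, reading it back with encoding="utf-8-sig" strips the BOM again.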