11-17-2015 12:08 PM
This is a little hard to search for, but I'm looking for a way to read text out of an XPS file. My searches mostly turned up people talking about the XP operating system in the plural (XPs). Does anyone have any suggestions for reading this file? The only option I've thought of so far is to use an XPS to PDF converter and then a tool that turns a PDF into text. Of course, this will only work if the XPS to PDF conversion preserves the text.
Has anyone programmatically read information out of a generated XPS file?
Unofficial Forum Rules and Guidelines
Get going with G! - LabVIEW Wiki.
17 Part Blog on Automotive CAN bus. - Hooovahh - LabVIEW Overlord
11-18-2015 09:35 AM
From LabVIEW, if you navigate to Help -> Find Examples and search for “Flatten and Unflatten XML,” you will see a project file (.lvproj) with some good example VIs showing how to write to an XML file, as well as how to read and unflatten data from one. This would be a great place to start!
11-18-2015 10:55 AM
Hooovahh,
Try searching for "xps microsoft". I think this is an "open" (can you believe that?) protocol that Microsoft hoped would replace PDF, which some Other Company developed. I found numerous citations with this search topic -- maybe there's even Helpful Information ...
BS
11-18-2015 11:11 AM
Could you post the xps file(s)? It would be interesting to see what you're doing. I looked up xps, and it looks like it is going to be a matter of reading the xml in to generate what is needed.
Here's a link to xps including some schemas:
https://msdn.microsoft.com/en-us/library/windows/hardware/dn614032(v=vs.85).aspx
11-18-2015 02:28 PM
Thank you, Microsoft! I just opened (on an XP VM) a Word document and printed it to an XPS file. Can you say "Unreadable"? Certainly not ASCII. Let me try with a "pure text file" from an old-fashioned text editor ... Well, still unreadable, and much larger than the input (my 93-byte text file "printed" to a 17KB .XPS file).
BS
11-19-2015 08:03 AM
Okay yeah, I guess I should have posted an example. Here is one such XPS file. Please note that despite the name, this does not appear to be XML. The unflatten/flatten XML functions won't work because the file is not ASCII.
I found several XPS to PDF converters, but the text is lost in the process, so if I run the result through a PDF to text converter the text is garbage. For my specific application I found a possible workaround, but the discussion can still continue with suggestions for parsing and understanding this file format.
11-19-2015 08:24 AM
@Hooovahh wrote:
Please note that despite the name, this does not appear to be XML. The unflatten/flatten XML functions won't work because the file is not ASCII.
I've just had a look at the Wikipedia page, and according to that, the XPS file itself is a container for XML files:
An XPS file is a Unicoded ZIP archive using the Open Packaging Conventions, containing the files which make up the document. These include an XML markup file for each page, text, embedded fonts, raster images, 2D vector graphics, as well as the digital rights management information. The contents of an XPS file can be examined by opening it in an application which supports ZIP files.
So you might be able to unzip it to get the xml files and then extract the bits of information you need from there.
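To make the unzip idea concrete, here is a minimal Python sketch (not LabVIEW, and not something from the linked docs) that treats the XPS file as a plain ZIP archive and lists what is inside. The function names are made up for illustration, and it assumes the per-page markup parts end in .fpage, which is what I'd expect from a typical XPS package.

```python
import zipfile


def list_xps_parts(xps_path):
    """Return the names of all parts (files) stored in the XPS package."""
    # An XPS file is a ZIP archive, so the standard zipfile module can open it.
    with zipfile.ZipFile(xps_path) as package:
        return package.namelist()


def list_xps_pages(xps_path):
    """Return only the per-page markup parts (assumed to end in .fpage)."""
    return [name for name in list_xps_parts(xps_path)
            if name.lower().endswith(".fpage")]
```

Calling list_xps_parts on a file is a quick way to check whether it really is a ZIP package before trying anything more involved.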
There does seem to be support for reading/writing XPS files in .NET - perhaps you might have some luck with that? https://msdn.microsoft.com/en-us/library/windows/desktop/dd316975(v=vs.85).aspx
11-19-2015 09:45 AM
Here is a link you might want to look over if you haven't already gone past this in other research:
http://www.wictorwilen.se/Post/Dissecting-XPS-part-1--The-basics.aspx
11-19-2015 09:52 AM - edited 11-19-2015 09:52 AM
Sweet, thanks, sorry I missed this at first. For those interested: you can get information out by extracting the ZIP, then looking in the Documents\1\Pages folder, where you'll find a text document for each page containing text that can more or less be understood. I saw the potential .NET method but didn't get very far.
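For anyone who wants to script the steps above instead of unzipping by hand, here is a rough Python sketch. It assumes the page files are the .fpage parts in the archive and that the visible text sits in the UnicodeString attribute of Glyphs elements, which is how XPS page markup normally stores text. The function names are hypothetical, and it does not handle packages where a page is split into several pieces inside the ZIP.

```python
import re
import zipfile
import xml.etree.ElementTree as ET


def _page_number(name):
    """Sort key: the trailing number in names like 'Documents/1/Pages/10.fpage'."""
    match = re.search(r"(\d+)\.fpage$", name, re.IGNORECASE)
    return int(match.group(1)) if match else 0


def extract_xps_text(xps_path):
    """Return a list of pages, each a list of the text runs found on that page."""
    pages = []
    with zipfile.ZipFile(xps_path) as package:
        page_names = sorted(
            (n for n in package.namelist() if n.lower().endswith(".fpage")),
            key=_page_number,
        )
        for name in page_names:
            # Parse from bytes so the XML parser can detect the encoding itself.
            root = ET.fromstring(package.read(name))
            runs = []
            for element in root.iter():
                # Tags look like '{namespace}Glyphs'; compare the local name only.
                if element.tag.rsplit("}", 1)[-1] == "Glyphs":
                    text = element.get("UnicodeString")
                    if text:
                        runs.append(text)
            pages.append(runs)
    return pages
```

Each text run is roughly one positioned string on the page, so rebuilding lines or paragraphs would still take some extra work on top of this.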
11-19-2015 12:05 PM
@Hoovah
I looked at your sample:
In Explorer it only shows one file
After extraction it only shows one file
I'm on a Win 7 box. Does this matter?