10-19-2006 11:44 AM
04-29-2011 11:34 AM
I'm resurrecting this thread, as I can't imagine someone hasn't come up with an efficient solution to this problem. Reading arbitrary lines from a large data file seems like it would be a fairly common technique!
Jarrod's VI does allow you to update the refnum, but -as you'd expect- the time required to loop through each line grows with file size. As an example, I'm trying to read in a file that will almost always have at least 2M lines, but the data in consecutive lines is very well defined (e.g. 0, 1, 2, 3, etc). With this knowledge of starting value and increment, has anyone come up with a way to efficiently go to an arbitrary point (say, line 1,495,290)?
I know there are some real pros out there and I bet this would benefit a lot of folks.
04-29-2011 01:38 PM
@joshmont wrote:
Jarrod's VI does allow you to update the refnum, but -as you'd expect- the time required to loop through each line grows with file size. As an example, I'm trying to read in a file that will almost always have at least 2M lines, but the data in consecutive lines is very well defined (e.g. 0, 1, 2, 3, etc). With this knowledge of starting value and increment, has anyone come up with a way to efficiently go to an arbitrary point (say, line 1,495,290)?
I know there are some real pros out there and I bet this would benefit a lot of folks.
Can you be a bit more specific about how your lines are formatted? If you know that every line has exactly the same number of characters, you can use "Set File Position" (File I/O -> Advanced File Operations) to jump to a specific location in the file. However, this only works if each line has the same number of characters. It's not enough for them to have similar formatting (for example, you might have a file in which each line has 5 numbers separated by tabs, but if some of those numbers have 3 digits and others have 4, then you have different numbers of characters). This is one advantage of binary files - any given datatype will always contain the same number of bytes regardless of the actual value it contains.
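Since LabVIEW is graphical, here is a hedged textual sketch of the same seek-by-record idea in Python (a stand-in for "Set File Position", not the actual VI; the record length and file name are illustrative assumptions):

```python
# Sketch of fixed-length-record seeking, assuming every line,
# newline included, is exactly RECORD_LEN bytes.
RECORD_LEN = 16  # bytes per line, including "\r\n" (assumption)

def read_line_at(path, line_number):
    """Jump straight to line `line_number` (0-based) without looping."""
    with open(path, "rb") as f:
        f.seek(line_number * RECORD_LEN)   # O(1) jump, no per-line scan
        return f.read(RECORD_LEN).decode("ascii").rstrip("\r\n")
```

The jump cost is constant regardless of file size, which is exactly what the loop-per-line approach lacks.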
04-29-2011 01:51 PM
@nathand wrote:
Can you be a bit more specific about how your lines are formatted? If you know that every line has exactly the same number of characters, you can use "Set File Position" (File I/O -> Advanced File Operations) to jump to a specific location in the file. However, this only works if each line has the same number of characters. It's not enough for them to have similar formatting (for example, you might have a file in which each line has 5 numbers separated by tabs, but if some of those numbers have 3 digits and others have 4, then you have different numbers of characters). This is one advantage of binary files - any given datatype will always contain the same number of bytes regardless of the actual value it contains.
Right, I forgot to mention that the reason it's been a problem is that the number of characters per line is not consistent. I've included a sample from a data file; imagine that it continues to at least 2x10^6. I'm not familiar with binary files. Perhaps this is something I should read more about?
04-29-2011 01:57 PM
@joshmont wrote:
Right, I forgot to mention that the reason it's been a problem is that the number of characters per line is not consistent. I've included a sample from a data file; imagine that it continues to at least 2x10^6. I'm not familiar with binary files. Perhaps this is something I should read more about?
Depends on what you're doing with the files, but if you don't need them to be human-readable, binary is probably the way to go. You don't run the risk of losing any precision due to formatting (the bytes are written to the file exactly as the computer stores them) and you don't have to do any conversion from strings to numbers, saving time and memory. For a small file that's never an issue, but once you're talking about millions of lines it can make a difference. I have no experience with TDMS but you might look into that as well (it's also a binary format, but with some additional information to help with formatting and organizing the data).
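To illustrate why fixed-size binary records make random access trivial, here is a Python sketch (not LabVIEW; the file layout and function names are illustrative assumptions): a float64 is always 8 bytes, so record k starts at byte k * 8 no matter what values the file holds.

```python
import struct

def write_doubles(path, values):
    """Write each value as a big-endian float64 (8 bytes each)."""
    with open(path, "wb") as f:
        for v in values:
            f.write(struct.pack(">d", v))

def read_double(path, index):
    """Read record `index` directly: fixed size means a direct jump."""
    with open(path, "rb") as f:
        f.seek(index * 8)
        return struct.unpack(">d", f.read(8))[0]
```

No string parsing, no precision loss from formatting, and the seek cost is independent of how many records precede the one you want.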
06-16-2011 02:36 PM - edited 06-16-2011 02:38 PM
You can get "close" to a line jump.
Create a sampling distribution (say, 10 KB samples at 100 points) and count the lines in each sample. Then either:
a) fit those points to a model, and use the model to interpolate the byte location you want to jump to; or
b) compute a simple ratio (lines per byte) and use that.
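A Python sketch of option (b), the lines-per-byte ratio, combined with the original poster's knowledge that the first field counts 0, 1, 2, ... (all names here are illustrative assumptions, not LabVIEW code): jump to the interpolated byte offset, back off if the jump overshot, then scan forward to the exact line.

```python
def find_line(path, target, sample_bytes=10_000):
    """Locate the line whose first field equals `target`, assuming
    the first column counts 0, 1, 2, ... as in the poster's files."""
    with open(path, "rb") as f:
        chunk = f.read(sample_bytes)
        bytes_per_line = len(chunk) / max(chunk.count(b"\n"), 1)
        guess = int(target * bytes_per_line)
        while guess > 0:
            f.seek(guess)
            f.readline()                      # discard the partial line we landed in
            start = f.tell()
            raw = f.readline()
            if raw and int(raw.split()[0].decode()) <= target:
                f.seek(start)                 # at or before the target: scan from here
                break
            guess //= 2                       # overshot (or hit EOF): back off
        else:
            f.seek(0)                         # worst case: scan from the top
        for raw in iter(f.readline, b""):
            if int(raw.split()[0].decode()) == target:
                return raw.decode().rstrip()
    return None
```

The estimate is only approximate when line lengths drift (e.g. the numbers grow more digits deeper in the file), which is why option (a) fits a model instead of a single ratio; the forward scan absorbs the residual error either way.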
Similar to the algorithm I used when building "alc", an approximate line-counting tool, which uses a linear model only.
Or you can index the file once, storing line numbers and their byte offsets in a separate file, and seek from there.
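A minimal Python sketch of that indexing idea (stride, names, and paths are illustrative assumptions): record the byte offset of every Nth line in one pass, then any lookup seeks to the nearest checkpoint and scans at most N-1 lines.

```python
STRIDE = 1000  # index every 1000th line (assumption)

def build_index(data_path):
    """One pass over the file: index[k] = byte offset of line k*STRIDE."""
    index = []
    with open(data_path, "rb") as f:
        offset = 0
        for lineno, raw in enumerate(iter(f.readline, b"")):
            if lineno % STRIDE == 0:
                index.append(offset)
            offset += len(raw)
    return index

def read_line(data_path, index, line_number):
    """Seek to the nearest checkpoint, then scan at most STRIDE-1 lines."""
    with open(data_path, "rb") as f:
        f.seek(index[line_number // STRIDE])
        for _ in range(line_number % STRIDE):
            f.readline()
        return f.readline().decode().rstrip("\r\n")
```

In practice you would persist the index (e.g. in that separate file) so the one-time full scan is paid only once per data file.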