LabWindows/CVI


line extract and split from files

Hi,

 

Once more, a huge thanks for your time on this.

 

Looking at the initial results, I hope I can scale this up, add in the testing of the data required, and save the results to file as part of a test report.

 

Yes, we do get some failures along the line, with data getting corrupted.

 

Yes, each item (600 in this file) needs to be compared. For example, in my data set, line 1 element 4 compared to line 2 element 4 should show a difference of 601, while further along the line there are elements with a difference of 64935.
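The comparison Simon describes can be sketched as a small C helper. This is a minimal illustration, not code from the thread: the function name, the single fixed `expected` delta, and the mismatch-index return convention are all assumptions (in the real data the expected difference apparently varies by element position, so an array of expected deltas could be passed instead).

```c
#include <stddef.h>

/* Hypothetical helper: compare each element of the next line against
   the same element of the previous line, expecting a fixed difference.
   Returns the index of the first mismatch, or -1 if all match. */
long first_delta_mismatch(const long *prev, const long *next,
                          size_t count, long expected)
{
    for (size_t i = 0; i < count; i++)
        if (next[i] - prev[i] != expected)
            return (long)i;
    return -1;
}
```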

 

Once more, thanks for the help. I will work through the code to try to understand it and scale it to the needs of this data set. I hope then to be able to modify it to work on other data sets.

 

Thanks for the help

Simon

Learning and struggling with C!!!

0 Kudos
Message 11 of 17
(2,752 Views)

Hi,

 

I made the modifications to the given code to get the results I would expect from live files, which worked well.

 

One thing that did strike me was the time to perform the whole operation: it was about 10 minutes.

 

With the lessons learned from the help received, I simplified the whole operation to check all lines for a ‘*’ and record the line numbers to a result file.
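A line-by-line version of this check, as Simon describes it, might look like the sketch below. The function name, file paths, and the 4 KB line buffer are assumptions for illustration; this is the straightforward `fgets` approach whose per-line file I/O turns out to be the bottleneck discussed later in the thread.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative sketch: scan a text file line by line and write the
   (1-based) number of every line containing '*' to a result file.
   Returns the count of matching lines, or -1 on I/O error. */
long log_starred_lines(const char *inPath, const char *outPath)
{
    char line[4096];   /* assumes no line exceeds 4 KB */
    long lineNo = 0, hits = 0;
    FILE *in = fopen(inPath, "r");
    if (!in) return -1;
    FILE *out = fopen(outPath, "w");
    if (!out) { fclose(in); return -1; }
    while (fgets(line, sizeof line, in)) {
        lineNo++;
        if (strchr(line, '*')) {
            fprintf(out, "%ld\n", lineNo);
            hits++;
        }
    }
    fclose(in);
    fclose(out);
    return hits;
}
```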

 

The file has 74924 lines, which took 1.5 minutes to check.

 

I have the same task done in Python and it took 5 seconds.

 

Time was written at the start and end of the result file.

 

Can this be sped up? Any file operation will be compared to Python, as that is our main coding language and the benchmark for common operations.

 

Thanks for the help

Simon

0 Kudos
Message 12 of 17
(2,731 Views)

Hey Simon -

 

It should definitely be possible to get the execution time of your CVI code to the same order of magnitude as the Python code, if not faster.  However, it might take a bit of work.  The core Python libraries are written in C, much the same as your CVI program, but they have been optimized.  The code I posted was more to demonstrate an idea than to be an optimized solution.

 

A couple of things you might watch out for:

  • Make sure you build a release exe when you do your benchmarking
  • Traverse each line only once
  • Limit memory allocation to only what's necessary.  For example, in my sample, I create a new buffer each time I tokenize the string - you could instead allocate one buffer you know to be large enough and reuse it.
  • Limit file I/O.  Instead of reading the file line by line, you might try reading the file in one big chunk and storing it in memory.
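The last bullet, reading the whole file in one chunk, can be sketched with standard C file I/O. This is a minimal illustration under stated assumptions (the function name is invented, and `ftell`-based sizing assumes an ordinary seekable file opened in binary mode), not the code NickB attached.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: read an entire file into one heap buffer with a
   single fread call, NUL-terminating it so it can be treated as a string.
   Caller frees the buffer; returns NULL on error. */
char *read_whole_file(const char *path, long *sizeOut)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    rewind(f);
    char *buf = malloc((size_t)size + 1);
    if (buf && fread(buf, 1, (size_t)size, f) == (size_t)size) {
        buf[size] = '\0';
        if (sizeOut) *sizeOut = size;
    } else {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    return buf;
}
```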

 

If you are using LabWindows/CVI 2009 or later, you might also find some use for the Execution Profiler toolkit.  It is designed specifically for this type of task - finding performance bottlenecks in code that needs to run efficiently.

 

Finally, because I'm curious: if you could post the Python code, your code, and an example file (if you don't want to put it on the forum, you can post to ftp://ftp.ni.com/incoming , which is behind our firewalls), I'll take a look if I can find some time.

 

NickB

National Instruments

0 Kudos
Message 13 of 17
(2,712 Views)

NickB,

 

Thanks for that.

 

I have left 3 files on the FTP site:

Search_in_File.7z

Read Me line extract and split from files.doc

SplitLineIntoArray.7z

 

Thanks for the help

Simon

0 Kudos
Message 14 of 17
(2,693 Views)

Hey Simon -

 

I was able to get a couple of minutes today to take a look at this.  I'm also going to take a minute to market the Execution Profiler again... :)

 

I only looked at the example you made that searches for lines with a * in them, because I think the lessons learned will be applicable to the rest of your project as a whole.  As I had expected, the major performance culprit was the file I/O associated with reading each line.  This can be seen clearly (99.9% of the 534.6 seconds) by profiling the example you posted to the FTP site:

 

[Profiler screenshot: NickB12-21, 15_46_25.png]

 

To resolve this performance issue, I read the entire file into one large memory block.  I then split the file into lines, in much the same way as I split each line into items earlier.  This way, I only had to hit the disk once. Then it is simply a matter of checking each of these lines for a *.  Once this was done, I profiled the code again to get the following:
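The in-memory pass Nick describes can be sketched as follows. This is an illustrative reimplementation of the idea (one read, then a pure in-memory scan), not his actual attachment; the function name is invented, and `strchr`/`memchr` are used instead of tokenizing so the buffer is left intact.

```c
#include <string.h>

/* Illustrative sketch: walk an in-memory file buffer line by line
   (splitting on '\n') and count lines containing '*', touching the
   disk zero times. Handles a final line without a trailing newline. */
long count_starred_lines(const char *buf)
{
    long hits = 0;
    const char *line = buf;
    while (*line) {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        if (memchr(line, '*', len))
            hits++;
        line = nl ? nl + 1 : line + len;
    }
    return hits;
}
```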

 

[Profiler screenshot: NickB12-21, 15_57_21.png]

 

As you can see, the entire operation now takes only 0.62 seconds, and the time distribution is relatively uniform over the functions that were called.  In comparison, the example Python script that you posted took 1.39 seconds to execute on my computer.

 

I've attached my short example; hopefully you can find it useful.  As always, let me know if you have any questions.

 

NickB

National Instruments

Message 15 of 17
(2,678 Views)

Wow Nick! A multi-minute process reduced to less than 1 second! I will carefully study your code (that char *** parameter amazes me!) and re-use its principles in my applications ;)
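The `char ***` parameter mentioned above is the usual C idiom for returning a dynamically allocated array of strings through an out-parameter. The sketch below is illustrative only (not Nick's attachment); the function name and growth strategy are assumptions, and error checks on `malloc`/`realloc` are omitted for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch: split a buffer into lines, returning the array
   through a char *** out-parameter. Returns the number of lines; the
   caller frees each line and then the array itself.
   (malloc/realloc failures are not checked here, for brevity.) */
long split_lines(const char *buf, char ***linesOut)
{
    long count = 0, cap = 8;
    char **lines = malloc((size_t)cap * sizeof *lines);
    const char *p = buf;
    while (*p) {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        if (count == cap) {                 /* grow the pointer array */
            cap *= 2;
            lines = realloc(lines, (size_t)cap * sizeof *lines);
        }
        lines[count] = malloc(len + 1);     /* copy one line */
        memcpy(lines[count], p, len);
        lines[count][len] = '\0';
        count++;
        p = nl ? nl + 1 : p + len;
    }
    *linesOut = lines;
    return count;
}
```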



Proud to use LW/CVI from 3.1 on.

0 Kudos
Message 16 of 17
(2,664 Views)

Nick,

 

Thanks for the information.

 

I have been sidetracked onto another project, so it may be a while before I get back to this.

 

The information gained from this exercise has helped.

 

Thanks for all the help

Simon

0 Kudos
Message 17 of 17
(2,544 Views)