11-13-2008 02:17 AM
I have a project in which I am accessing a large file system, with files up to a few GB in size. For testing purposes it must be possible to read the data from some of these files into a Windows file, but currently I am only getting speeds of approximately 1 MB/s when performing this action. Are there quicker ways to write binary files? Currently I am looping through, reading the source file and writing the target file in chunks of 1MB, and there is very little processing within this loop other than the read and write. I suspect the issue may be to do with caching, but cannot be sure.
Any ideas would be appreciated.
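For reference, the read/write loop described above can be sketched in text form as follows. This is only a minimal Python illustration, not the LabVIEW VI: the function name `timed_copy` is hypothetical, and plain file reads stand in for the DLL read call. Timing the loop this way gives a figure that includes any OS caching, so a repeated run may look much faster than the first.

```python
import time

CHUNK_SIZE = 1024 * 1024  # 1 MiB per iteration, matching the chunk size in the post


def timed_copy(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src to dst in fixed-size chunks; return (bytes_copied, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            dst.write(chunk)
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Dividing the returned byte count by the elapsed seconds gives the throughput, which makes it easier to compare a first run against an immediately repeated one.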
11-13-2008 03:57 AM
Hello Frau,
I would like to try and provide you with some suggestions. Please post your VI on this forum so that I can have a look at what exactly you are doing. From your post it also sounds like you are working in a Windows OS environment. Could you please confirm which OS you are using, as well as the file type you are trying to read from? Once I have this information I can try and provide some steps to narrow down the write performance to software and programming, or to a limitation of your RAM and Hard disk. How did you measure the write speed of 1MByte/s?
While I review the VI, please investigate your PC manufacturer's specification.
Regards,
11-13-2008 04:11 AM
I would prefer to email my VI than post them online if that is possible.
I am using Windows XP, and am reading from a completely unformatted file on a RAID. This is only a temporary test scenario while I wait for a removable memory module and suitable card to make it possible to access this memory module. The data is in a series of blocks but is not in a recognised format so is treated as unformatted data.
I am using a DLL function to read the data out, as the data can only be accessed in a certain way.
The write speed is not always 1MByte/s. Once one write function has been performed, repeating it directly afterwards is about 10 times as fast.
The computer I am using is a Dell, with an Intel Xeon 2.4GHz quad-core processor and 2GB of RAM.
11-13-2008 04:42 AM
Hello Frau,
I understand if you would like to protect the IP in your entire VI. One of the benefits of the Forum is that many NI users can input their ideas and have us looking in different directions for solutions. Is it possible to isolate parts of your VI, such as your read write function, and post it here, preferably as an operational, separate VI, or as a screenshot?
Could you also please confirm your version of LabVIEW, and where in the world you are located? This is so that I can ensure that the most appropriate support branch can take your email, if you would prefer not to post your VI after all.
Regards,
11-13-2008 05:09 AM
Hope the screen shot helps, the dll function I am using simply reads the requested number of blocks of data starting at the requested block offset into a byte array.
I am in England and currently using LabVIEW 8.5.
Thanks,
Will
11-13-2008 11:21 AM
Hello Will,
The issue you have described relates to the read/write access time of your hard disk. This could be independent of your actual programming, which you have attached above. Although you have relatively high-performance hardware in your PC, the RAID array in particular, you may achieve greater throughput by changing the size of the block of data that you read and then write to your file. Your OS, BIOS, and hard disk communicate data according to the size of the clusters on disk. You should see increased throughput if you read from your large data file in multiples of the cluster size. In your post you mentioned that you see lower rates initially, and then increases in throughput. These instances are likely when your read/write function has reached a block size that is a multiple of the hard disk cluster size.
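As a rough illustration of sizing a block to a whole number of clusters, here is a minimal Python sketch. The 4096-byte cluster size is an assumption for the example only (the real value depends on how the volume was formatted), and `align_up` is a hypothetical helper name.

```python
ASSUMED_CLUSTER_SIZE = 4096  # assumption: bytes per cluster; check the actual volume


def align_up(size, cluster_size=ASSUMED_CLUSTER_SIZE):
    """Round size up to the next whole multiple of cluster_size."""
    if size <= 0 or cluster_size <= 0:
        raise ValueError("size and cluster_size must be positive")
    return ((size + cluster_size - 1) // cluster_size) * cluster_size


# A 1,000,000-byte request becomes 245 clusters of 4096 bytes.
print(align_up(1_000_000))    # 1003520
# 1 MiB is already an exact multiple (256 clusters), so it is unchanged.
print(align_up(1024 * 1024))  # 1048576
```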
Another forum post has dealt with a similar issue, and may provide you with further ideas.
http://forums.ni.com/ni/board/message?board.id=170&message.id=33375&requireLogin=False
In NI Support we use a similar method to improve access times on Real-Time systems. For confirmation, a Knowledgebase Article also exists in the Developer Zone which you can adapt to your application.
http://digital.ni.com/public.nsf/allkb/CBE7CCDF8FAC904F86256AF80059D176?OpenDocument
Please post back with your experience once you have taken this information into consideration, or if you require further help to implement this.
Regards,
11-13-2008 11:44 AM
frau wrote:I ...
The write speed is not always 1MByte/s, once one write function has been performed if it is repeated directly after it is about 10 times as fast.
...
Assuming that when you repeat the operation, you are writing to the same file...
That speed increase is most likely due to the file space having previously been allocated for the file. For high-speed logging I will generally "pre-write" the file with more data than I anticipate requiring during the collection.
This speeds things up since the OS does not have to:
stop and find space,
mark it as used,
update the directory info
The overhead required to maintain the file system integrity demands that the disk drive heads move from one track to another. So instead of just being able to write one sector after another, the heads have to move around. When the heads are moving you can no longer look at the stream-to-disk spec of the drive sub-system but must concern yourself with the seek time spec. Orders of magnitude difference.
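In text form, the pre-sizing idea might look like the sketch below. It is a Python illustration under stated assumptions, not LabVIEW code: `preallocate` and `write_into` are hypothetical names, and whether the OS physically reserves the clusters when the length is set depends on the file system.

```python
def preallocate(path, size_bytes):
    """Create (or overwrite) the file and set its length up front."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # extends the empty file to size_bytes


def write_into(path, offset, data):
    """Write data at offset inside an existing file without changing its length."""
    with open(path, "r+b") as f:  # "r+b" opens for update and does not truncate
        f.seek(offset)
        f.write(data)
```

After logging finishes, the file can be cut back to the number of bytes actually written by calling `truncate` again with the smaller length.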
Ben
11-13-2008 12:57 PM - last edited on 11-13-2008 01:38 PM by Support
frau wrote:
Notice that you have three value property nodes per loop iteration. Since each of these executes synchronously and forces a thread switch to the UI thread, they will place a huge extra load on the system and will impact fast loops. You even have two instances of "blocks read per loop" for no reason at all. What is the probability that they'll return different values? (And if they did, you'd get wrong behavior!) One instance is enough! Don't be afraid to branch a wire!
Most likely, these values will NOT change during the execution of the loop (else everything would probably blow up!), so you need to place them outside the loop. See if it makes any difference.
The global variable belongs outside the loop.
The array initialization must go outside and before the loop. All this needs to be done only once and not with every iteration of the loop. 🙂
11-14-2008 02:16 AM
Thanks for the many suggestions, they all make sense and seem very logical.
I first tried altenbach's ideas to try and increase the speed of the loop, but these made no noticeable difference, so I am confident that it is not the speed of the loop that is affecting the write speed but the speed of the read and write functions.
I also tried increasing the size of the block of data written per loop, from 1MB to 10MB, but again saw no noticeable difference in the speed. It still took nearly 2 minutes to write 100MB.
I don't think pre-writing a file will speed the write function either, as the only time the function seems to execute quicker is when it is reading the same data as it previously did, not when the target file is the same size or larger than the amount of data to be written. It would also be impractical to pre-write files when this project is actually released, as they may be writing a few GBs of data.
Any other ideas of why it is so slow?
11-14-2008 03:52 AM - edited 11-14-2008 03:57 AM
Pre-writing only means opening the file and setting the file size to, say, 5 GB. The file is then pre-written.
Just try it. It might also help to store the data on a dedicated disk on a separate IDE/SATA controller.
Also try to write full sectors at once, so instead of 1,000,000 bytes, write 1024*1024 = 1,048,576 bytes.
With these techniques we made a file copy tool that was just as fast as the Windows file copy function.
But we had benefits, we could read the progress of the copy, abandon on shut down and continue later.
Another thought: the data is passed as a pointer to the DLL call, right?
You could claim the data space outside the for-loop and reuse the buffer every time, since you aren't using the data anywhere else.
So the space is only allocated once and not every run.
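The same buffer-reuse idea, sketched in Python rather than LabVIEW: one buffer is allocated before the loop and every read fills it in place. The function name `copy_with_reused_buffer` is hypothetical, and an ordinary file read stands in for the DLL call.

```python
def copy_with_reused_buffer(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy a file using a single buffer allocated once, outside the loop."""
    buf = bytearray(chunk_size)  # allocated once and reused on every iteration
    view = memoryview(buf)       # lets us slice the buffer without copying it
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            n = src.readinto(buf)  # fills buf in place, returns the bytes read
            if not n:              # 0 means end of file
                break
            dst.write(view[:n])    # write only the part that was filled
            total += n
    return total
```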
Ton