
Using large datasets in python node 2


Just recently I posted the following question: https://forums.ni.com/t5/LabVIEW/Using-large-datasets-in-python-node/m-p/4343916#M1273928. I modified my LabVIEW program to save the 2D data to a file, and modified my Python program to load this file and save a new file after the UMAP processing. However, the same problem still occurs: when running the UMAP code in a Python Node in LabVIEW, error 1672 occurs. The occurrence again depends on the data size: the error does not occur when the data size is 100000 x 100, but it does occur when the data size is 200000 x 100. Is there a limit on the data size that can be handled in the Python Node? Or is there a timeout in the Python Node? I would appreciate any advice. I have attached my LabVIEW code to this post, and the following is the Python code that passes data between LabVIEW and Python (named functions4.py).

 

import umap
import numpy as np

def lv_umap(param0, param1, param2, input_tmp_file, output_tmp_file):
    # Load the 2D data that LabVIEW wrote to the temporary text file
    dat1 = np.loadtxt(input_tmp_file)
    # Run UMAP with the parameters passed in from LabVIEW
    dat2 = umap.UMAP(n_neighbors=param0, n_components=param1, min_dist=param2).fit_transform(dat1)
    # Write the embedding to a tab-delimited file for LabVIEW to read back
    np.savetxt(output_tmp_file, dat2, delimiter='\t')
    num_rows, num_columns = dat2.shape
    return num_rows, num_columns
Message 1 of 7

@ytan wrote:

... When running the UMAP code in a Python Node in LabVIEW, error 1672 occurs. The occurrence again depends on the data size: the error does not occur when the data size is 100000 x 100, but it does occur when the data size is 200000 x 100. Is there a limit on the data size that can be handled in the Python Node?


I have seen this before:

# Case 1: data exchange directly via the Python Node

https://forums.ni.com/t5/LabVIEW/Tensorflow-Python-Node-breaks-after-weights-load/m-p/4216894#M1222...

 

- Error 1672 can be caused by an unsupported (or not fully supported) numpy function or definition

- It works for a 2D array of double (8 bytes, 64-bit IEEE) with 1000 x 2048 elements and breaks at 10,000 x 2048 (I didn't bother to pin it down further...)

- size = 1000 * 2048 * 8 bytes = 16.384 MB (working)

- size = 10,000 * 2048 * 8 bytes = 163.84 MB (broken, error 1672)

 

# Case 2: data exchange via file

- (working) using pandas: https://forums.ni.com/t5/LabVIEW/Pandas/m-p/4167105#M1203513

- (broken, error 1672) using numpy.loadtxt: https://forums.ni.com/t5/LabVIEW/Pandas/m-p/4167105#M1203513

- I made no attempt to look for a limit.
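
To make the arithmetic behind those numbers explicit, here is a minimal sketch of the payload estimate (assuming a 2D array of float64, i.e. 8 bytes per element):

def payload_mb(rows, cols, bytes_per_element=8):
    # Rough size of the data passed through the Python Node, in megabytes
    return rows * cols * bytes_per_element / 1e6

print(payload_mb(1000, 2048))   # 16.384 MB -> worked
print(payload_mb(10000, 2048))  # 163.84 MB -> failed with error 1672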

 

 

 

Message 2 of 7

Thanks a lot. I understand that numpy and other packages are not fully supported. I will try not to use these packages.

Message 3 of 7

@ytan wrote:

Thanks a lot. I understand that numpy and other packages are not fully supported. I will try not to use these packages.



Here's the official NI statement regarding numpy: https://forums.ni.com/t5/LabVIEW/Tensorflow-Python-Node-breaks-after-weights-load/m-p/4217127#M12228...

 

 

Especially numpy's loadtxt and (obviously) savetxt are affected. In your original post, are you using double (float64, 8 bytes per number)?

 

(working) 100,000 * 100 * 8 bytes = 80 MB

(broken) 200,000 * 100 * 8 bytes = 160 MB

 

Can you use pandas instead of numpy.savetxt? Or do you also run into a limit around 160 MB there?

 

 

 

 

Message 4 of 7
Thanks for your suggestion. I tried using pandas, but I received the same error. Again, there is no error with 100000 x 100 data (the data format is float32), but the error occurs with 200000 x 100 data. numpy is used inside the UMAP package, so I suspect that could be the problem, but it seems too hard to replace all the numpy functions in the UMAP code with something else. The following is the code I wrote:
 
import umap
import pandas as pd

def lv_umap(param0, param1, param2, input_tmp_file, output_tmp_file):
    # Read the tab-delimited data that LabVIEW wrote to the temporary file
    df = pd.read_csv(input_tmp_file, delimiter='\t', header=None)
    dat1 = df.values
    # Run UMAP with the parameters passed in from LabVIEW
    dat2 = umap.UMAP(n_neighbors=param0, n_components=param1, min_dist=param2).fit_transform(dat1)
    # Write the embedding to a tab-delimited file for LabVIEW to read back
    pd.DataFrame(dat2).to_csv(output_tmp_file, sep='\t', index=False, header=False)

    num_rows, num_columns = dat2.shape
    return num_rows, num_columns
Message 5 of 7
Solution
Accepted by topic author ytan

@ytan wrote:
Thanks for your suggestion. I tried using pandas, but I received the same error. Again, there is no error with 100000 x 100 data (the data format is float32), but the error occurs with 200000 x 100 data.

 

(working) 100,000 * 100 * 4 bytes = 40 MB

(broken) 200,000 * 100 * 4 bytes = 80 MB

 

I don't know exactly how the Python Node shares data between LabVIEW and Python, but reportedly there is some translation between the two data spaces going on (marshalling).

 

You can still run your Python code from the command line: have your Python script write a file to disk, then read that file into LabVIEW.
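
As a minimal sketch of that approach (the script name run_umap.py and the argument handling are my own assumptions, not something tested here), a command-line wrapper around your existing lv_umap could look like this:

# run_umap.py -- hypothetical command-line wrapper around lv_umap,
# meant to be launched from LabVIEW (e.g. via System Exec.vi)
import argparse
from functions4 import lv_umap

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Run UMAP on a tab-delimited data file.')
    parser.add_argument('input_file', help='tab-delimited data written by LabVIEW')
    parser.add_argument('output_file', help='where the UMAP embedding is written')
    parser.add_argument('--n_neighbors', type=int, default=15)
    parser.add_argument('--n_components', type=int, default=2)
    parser.add_argument('--min_dist', type=float, default=0.1)
    args = parser.parse_args()

    rows, cols = lv_umap(args.n_neighbors, args.n_components, args.min_dist,
                         args.input_file, args.output_file)
    # Print the result shape so LabVIEW can capture it from standard output
    print(rows, cols)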

 

 

Message 6 of 7

Thanks again for your comments. I also think running the Python code from the command line could be a solution. I found System Exec.vi, which makes it possible to use the command line from LabVIEW, so I will try that.
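
As a rough sketch, the command string passed to System Exec.vi might look something like this (the script name follows the wrapper sketched above, and the paths are hypothetical):

python C:\path\to\run_umap.py C:\temp\input.txt C:\temp\output.txt --n_neighbors 15 --n_components 2 --min_dist 0.1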

Message 7 of 7