
Solve Linear Equations - Block Tridiagonal Matrix

Solved!

Hi,

 

I developed a VI that solves block tridiagonal systems of equations using the Thomas algorithm. Such systems commonly arise when the finite difference method is used to solve partial differential equations.

 

For 1,000 iterations, the elapsed time is ~500 ms. I would like to finish the job within 100 ms. Is there any way to improve the performance?

 

Thanks, Jong Hyun

 

 

For details of the Thomas algorithm, please refer to the Wikipedia page: http://en.wikipedia.org/wiki/Thomas_algorithm. Note that it describes how to solve tridiagonal linear equations, but the same logic can be applied to block tridiagonal equations.
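For readers unfamiliar with the algorithm, here is a minimal sketch of the scalar Thomas algorithm (in pure Python for illustration only; the actual implementation here is a LabVIEW VI). In the block variant, each coefficient becomes a submatrix and each division becomes a multiplication by an inverse block.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system by the Thomas algorithm.
    a: sub-diagonal (len n-1), b: main diagonal (len n),
    c: super-diagonal (len n-1), d: right-hand side (len n)."""
    n = len(b)
    cp = [0.0] * n                      # modified super-diagonal
    dp = [0.0] * n                      # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):               # forward sweep
        m = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / m
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):      # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```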

 

* If you use the Sparse SLE VI of the Multicore Analysis and Sparse Matrix Toolkit, the elapsed time is ~450 ms.

 

Message 1 of 11

I haven't studied your implementation at all, but as a first step you might want to disable debugging and inline the subVIs.

Message 2 of 11

Thanks, Altenbach,

  

I had already tried both of your suggestions: disabling debugging and inlining the subVIs had no effect on performance in this case.

 

Simply put, the Thomas algorithm for a block tridiagonal matrix (1) divides the big matrix into small blocks (submatrices) and (2) computes a series of submatrix inverses and multiplications. This is much faster than handling one big matrix.
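As a hedged illustration of that structure (Python with hard-coded 2x2 blocks, not the actual VI), the block forward sweep and back substitution look like this; every scalar division in the plain Thomas algorithm becomes a multiplication by an inverse block:

```python
def mm(X, Y):
    # 2x2 matrix * 2x2 matrix
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def mv(X, v):
    # 2x2 matrix * 2-vector
    return [sum(X[r][k] * v[k] for k in range(2)) for r in range(2)]

def msub(X, Y):
    return [[X[r][c] - Y[r][c] for c in range(2)] for r in range(2)]

def vsub(u, v):
    return [u[k] - v[k] for k in range(2)]

def inv2(X):
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def block_thomas(A, B, C, D):
    """Block Thomas solve.
    A: sub-diagonal blocks (n-1), B: diagonal blocks (n),
    C: super-diagonal blocks (n-1), D: right-hand-side vectors (n)."""
    n = len(B)
    Cp, Dp = [None] * n, [None] * n
    Binv = inv2(B[0])
    if n > 1:
        Cp[0] = mm(Binv, C[0])          # B_1^-1 C_1
    Dp[0] = mv(Binv, D[0])              # B_1^-1 d_1
    for i in range(1, n):               # forward sweep
        Minv = inv2(msub(B[i], mm(A[i - 1], Cp[i - 1])))
        if i < n - 1:
            Cp[i] = mm(Minv, C[i])
        Dp[i] = mv(Minv, vsub(D[i], mv(A[i - 1], Dp[i - 1])))
    X = [None] * n
    X[-1] = Dp[-1]
    for i in range(n - 2, -1, -1):      # back substitution
        X[i] = vsub(Dp[i], mv(Cp[i], X[i + 1]))
    return X
```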

 

Now, I think improvement can be done

(a) by using an alternative algorithm -> I'm not asking about this;

(b) by modifying the block diagram;

(c) or by using faster matrix operations (matrix inverse and multiplication, or solving linear equations).

 

Regarding (c), my question is why LabVIEW's built-in matrix operations ("Real Invert Matrix.vi" and "A x Vector.vi") are so slow. Here, "too slow" is just my impression; I don't actually know whether I could get a solution instantly with MATLAB or other numerical libraries (BLAS/LAPACK?). If so, I would like to know whether anyone has called external numerical libraries from LabVIEW, and how to do it.

 

Jong Hyun

 

Message 3 of 11

I think the MASM toolkit does use an external math library (Intel?) - in any case, it's probably close to the fastest general-purpose library you could use.

 

I tried a few things which gave a little improvement:

 

  1. Can you use SGL with the MASM toolkit?  That seems to give about a 20% improvement, at the cost of a little precision.
  2. Converting to a Sparse matrix, and using the PARDISO Solver gave about a 2-fold speedup, even without any tweaking of the Solver Parameters.
  3. If you have the same sparsity pattern, or the same matrix with different RHSs, you can use the PARDISO Advanced Solver, which seems to give >10x speedup.  Have a look at this help page: http://zone.ni.com/reference/en-XX/help/373601B-01/lvmasmt/solving_sparse_linear_eq_in_lv/

So it may be possible to get the speed you want, but it depends on the parameters of your inputs.

 

Message 4 of 11

Thank you, GregS

 

Just now, I tested the set of PARDISO Advanced VIs you suggested. It took just ~100 ms!

 

However, the problem is that the MASM toolkit is not suitable for my purpose: it's expensive and, as far as I know, requires another run-time engine. Anyway, if MASM is the only way, I will consider buying the toolkit.

 

In my application, the sparsity pattern (block tridiagonal) of matrix A is fixed and I have to solve the equations repeatedly with a sequence of vectors b. For each calculation, only two submatrices in the first row of matrix A change. In such a case, I want to know if there is a special mathematical treatment.

 

I think that if I can store each inverse submatrix in the first iteration and reuse them repeatedly, then the speed will be better.

Message 5 of 11
Solution
Accepted by diluculo

LabVIEW base/full/pro all use Intel's MKL library for linear algebra on MS Windows and Linux.  Intel has designed MKL to be very fast for large matrices, and breaks a given problem apart to take advantage of the different levels of cache. The linear algebra calls go through several layers of functions, including size checks where appropriate, and may allocate workspace memory each call, before finally computing the result.  This fixed overhead is readily apparent when the matrix is so small (3x3).

 

I coded versions of the matrix-matrix multiply, matrix-vector multiply, and matrix inverse for the 3x3 case explicitly in G using the in-place element structure and full indexing (no loops, all scalar operations).  This eliminates the overhead of checks, and allows better in-placeness.  Doing this brings your benchmark down to ~70ms on my machine.  Unfortunately this requires non-trivial code changes if you change the block size, but it does deliver good performance. 
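For reference, the same idea expressed in Python (the original is G code using the in-place element structure; this is only a sketch of the math): a fully unrolled 3x3 inverse via the adjugate and determinant, with no loops and no library calls, so there is no size-checking or workspace-allocation overhead on each call.

```python
def inv3(m):
    # Unrolled 3x3 inverse: adjugate / determinant, all scalar operations.
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) + b * (f * g - d * i) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]
```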

 

"I think that if I can store each inverse submatrix in the first iteration and reuse them repeatedly, then the speed will be better."

I agree.  You should be able to pre-compute most of the forward sweep portion of the algorithm, and just replace the changed sub-matrices and continue the forward solve.  This may mean re-arranging the variables to have the updated/changed sub-matrices at the lower right of the input matrix.
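In scalar form the idea can be sketched like this (Python, assuming for simplicity that only the right-hand side changes between solves; with changed sub-matrices you would redo the sweep only from the first changed block onward, and in the block case each division becomes a multiplication by a stored inverse block):

```python
def thomas_factor(a, b, c):
    """Forward sweep on the matrix alone; reusable across right-hand sides.
    a: sub-diagonal (len n-1), b: diagonal (len n), c: super-diagonal (len n-1)."""
    n = len(b)
    m = [0.0] * n                       # pivots
    cp = [0.0] * n                      # modified super-diagonal
    m[0] = b[0]
    cp[0] = c[0] / m[0]
    for i in range(1, n):
        m[i] = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / m[i]
    return m, cp

def thomas_apply(a, m, cp, d):
    """Solve for one right-hand side d using the precomputed factorization."""
    n = len(m)
    dp = [0.0] * n
    dp[0] = d[0] / m[0]
    for i in range(1, n):
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / m[i]
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Factoring once and applying many times moves the bulk of the arithmetic out of the per-solve loop, which is the same trade the PARDISO Advanced VIs exploit.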

 

-Jim

Message 6 of 11

Dear DSPGuy,

 

Many thanks for your kind efforts and reply.

 

In my application the size of the submatrices is not fixed; still, I want to learn your approach. A month ago, I wrote my own matrix-matrix multiply.vi and matrix-vector multiply.vi, but I didn't see a significant speedup. However, I did not use "the in-place element structure and full indexing".

 

Would you please attach "matrix inverse 3X3.vi, matrix matrix multiply 3X3.vi, matrix vector multiply 3X3.vi", which were not included in the Rev002.zip? 

 

- Jong Hyun

Message 7 of 11

Apologies for the omission.  Here they are.

 

-Jim

Message 8 of 11

Thanks Jim,

 

I can see that the in-place element structure and full indexing are important for performance. The subVIs you built reduce the calculation time dramatically.

 

- Jong Hyun

 

 

Message 9 of 11

I made an error in translating your code. It doesn't affect your numerical results much, but I mistakenly swapped the inputs of each matrix-matrix multiply in step 2. This is fixed in the attached code. I was also curious how much faster the code might run if all the linear algebra were combined in step 2 and in step 1. This saves some additional time, bringing the benchmark down to 50 ms. Of course this is even more work to update if the block size changes, but it is an interesting benchmark.

 

-Jim

Message 10 of 11