Parallelizing ASCEND

A breakdown of paths to improved parallel computing performance is:

1) OpenMP parallelism for computing functions and gradients using the binary code generator option. This gives the most bang for the most users.

2) Partial MPI parallelism: connecting an MPI-parallel nonlinear solver interface (probably PETSc or Octave) to allow MPI-based solves of problems for which the model fits on a single compute node. Not many people own clusters, but those who do are usually up to the most interesting things.

3) Full parallelism of the ASCEND compiler. This is well beyond the scope of a 2-3 month summer project and would only be possible in follow-on work related to a thesis. It should probably happen only after a New Compiler appears, and some review of the research literature is in order first.

It seems to me that 1 and 2 could each be done in about one month of full-time work by someone very skilled with C, compilers, and numerical libraries.

Background

The original idea posted on the wiki is to use OpenMP (now available in recent versions of GCC, the Intel compilers, and most other compilers) to parallelize the evaluation of gradients and residuals in the ASCEND solvers.

The highest goal, of course, is to make ASCEND capable of handling models so large that they don't fit in memory on a local machine and thus require distributed-memory (MPI or other) computing. Why? Because the technical problems of renewable energy usually end up requiring the solution of large equation systems when tackled with high-fidelity modeling approaches.

It's clear that MPI alone is not the answer, and it comes with many software problems in ASCEND that must be solved simultaneously. Even on parallel cluster machines, each node is multicore, and MPI is probably not the best way to use those cores; hence the interest in OpenMP.

OpenMP lets us parallelize (with minimal other code changes, I hope) the function/gradient evaluations, which currently dominate our time to solve. The evaluation engine is single-threaded right now. Once we can use a single multicore machine more efficiently, it becomes sensible to think about partial distributed parallelism with MPI and then full distributed parallelism with MPI. If the OpenMP work goes faster than expected, the rest of the project can move on to the higher levels of parallelism.
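To make the OpenMP idea concrete, here is a minimal sketch in C of a residual loop split across threads. It is not ASCEND code: the type residual_fn, the three toy equations, and the function eval_residuals are all hypothetical names invented for illustration, and the sketch assumes that each residual only reads the shared variable vector and writes its own output slot, so iterations are independent.

```c
/* Hypothetical residual signature (an assumption, not the ASCEND API):
   each equation reads the shared variable vector x and returns its residual. */
typedef double (*residual_fn)(const double *x);

/* Three toy equations standing in for compiled model equations. */
static double eq0(const double *x) { return x[0] + x[1] - 3.0; }
static double eq1(const double *x) { return x[0] * x[1] - 2.0; }
static double eq2(const double *x) { return x[2] * x[2] - x[0]; }

/* Evaluate all residuals. Iteration i writes only r[i] and only reads x,
   so the loop can be divided among threads without locking.
   Built without OpenMP support, the pragma is ignored and the loop runs serially. */
void eval_residuals(const residual_fn *eqs, int n_eqs,
                    const double *x, double *r)
{
    int i;
    #pragma omp parallel for schedule(static)
    for (i = 0; i < n_eqs; i++) {
        r[i] = eqs[i](x);
    }
}
```

With the equations {eq0, eq1, eq2} and x = {2, 2, 1}, eval_residuals fills r with 1, 2 and -1. With GCC the pragma takes effect when compiling with -fopenmp. A static schedule is a reasonable first guess when equations cost about the same; if evaluation costs vary widely, a dynamic schedule may balance better.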

Partial parallelism basically means adding parallel support to the ASCEND nonlinear solvers (or, more likely, finding an existing open-source parallel nonlinear solver and hooking it up to ASCEND). In this case, the compiled model fits in the memory of a single node, but we want access to more CPUs (on the other nodes) to speed up both evaluation and the solvers.
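Because every node holds the whole model in this partial scheme, one plausible way to share the evaluation work is to give each MPI rank a contiguous block of equations and gather the results afterwards (for example with MPI_Allgatherv). The sketch below shows only the index arithmetic for such a split, in plain C with no MPI calls. The names eq_range and owned_range are hypothetical, and this is an assumed layout, not something ASCEND or any particular solver prescribes.

```c
/* Hypothetical helper type (not part of ASCEND or MPI):
   half-open range [begin, end) of equation indices owned by one rank. */
typedef struct {
    int begin;
    int end;
} eq_range;

/* Split n_eqs equations over n_ranks ranks as evenly as possible.
   The first (n_eqs % n_ranks) ranks each take one extra equation.
   Assumes n_ranks > 0 and 0 <= rank < n_ranks. */
eq_range owned_range(int n_eqs, int n_ranks, int rank)
{
    int base = n_eqs / n_ranks;   /* equations every rank gets */
    int extra = n_eqs % n_ranks;  /* leftovers, handed to the lowest ranks */
    eq_range r;
    r.begin = rank * base + (rank < extra ? rank : extra);
    r.end = r.begin + base + (rank < extra ? 1 : 0);
    return r;
}
```

For 10 equations on 4 ranks this yields the ranges [0, 3), [3, 6), [6, 8) and [8, 10). Each rank would then evaluate only its own range, possibly using threads within the node.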

Full parallelism, to support models too big to compile on a single compute node, requires adding parallel support to the ASCEND compiler. This is a very complex task and not likely to get done in a single summer.

A totally distinct alternative to all of the above is to look at the streaming parallelism in NVIDIA graphics cards (CUDA). If you already have experience with it (and the hardware), we might be able to define a good project.