Reverse Automatic Differentiation Implementation
From ASCEND
This article documents the implementation details of Reverse Automatic Differentiation routines in ASCEND by Mahesh Narayanamurthi.
Automatic Differentiation (AD) is a numerical means of obtaining derivative of a function with respect to each of its independent variable at a given point. It differs from other types of differentiation:
- Symbolic differentiation - requires symbol manipulation and is slow and memory-intensive
- Numerical (finite difference) differentiation - fast, but inherently involves round-off errors
Automatic Differentiation or sometimes also called Algorithmic Differentiation requires neither symbol manipulation nor does it run into round-off errors. Its a means of obtaining exact derivatives of the function at a given operating point. Also, there are two types of AD:
- Forward AD
- Reverse AD
You can read more about AD on wikipedia.
Also, the book by Andreas Griewank Evaluating Derivatives gives a deep insight into AD and its implementation as a computer program. The same model proposed for Reverse AD in Chapter 5 is implemented here, with suitable modifications to work within existing structures in ASCEND. So, please refer to that book, alongside the following.
Contents |
Structures
Doublet_struct
The Doublet Structure is used to keep the value of an object (variable or its bar) and dotted value of the object.
typedef struct Doublet_struct{ double val; double dot; } Doublet;
Elements_struct
The Elements_struct structure is used to store temporary trace information which is later used during ReturnSweep to calculate adjoints. It is also a single node in the computational graph that is formed as a result of parsing expressions.
typedef struct Element_struct{ Doublet val; /**< Stores node value and tangent of node */ Doublet bar; /**< Stores adjoint and tangent of adjoints */ enum Expr_enum expr_type; /**< Information about the type of operation is to be recorded */ const struct Func *fxnptr; /**< Function Lookup */ unsigned long sindex; /**< Contains information to identify the variable */ struct Element_struct *arg1; /**< Points to the first argument of the operation */ struct Element_struct *arg2; /**< Points to the second argument of the operation */ struct Element_struct *next; /**< Variable length trace information */ struct Element_struct *prev; /**< Used for rewinding the tape */ } Element;
Redouble_struct
The redouble structure is used to maintain a pointer of a term (element) of the relation which is pushed and poped from the stack for function evaluation.
typedef struct Redouble_struct{ Element *ref; } Redouble;
TapesList_struct
The tapes list contains an array of MAX_TAPE_COUNT number of pointers to trace_elements. Active tape can be set by calling SetActiveTape, similarly GetActiveTape works to get the current tape on which recording is done.
typedef struct TapeList_struct{ Element *tape[MAX_TAPE_COUNT]; unsigned active; } TapeList;
Bridging With ASCEND Structures & CODE
Ideally, we would like to use the tree that ASCEND builds for storing the temporary information as this implementation may be duplicating the entire tree for each relation. During function evaluation, the Trace is gradually built on the Tape that is active in the tapes_list object. Following this, the ReturnSweep and AccumulateAdjoints use this trace information in much the same way as described in the book by ANDREAS GRIEWANK in Chapter 5. Also, a local Func* is maintained to do look-ups during ReturnSweep for derivative calculation (although,we believe this can be averted by pre-calculating the derivatives). The sindex field in the trace_elements is used during AccumulateAdjoints and the expr_type field is recorded to make branch predictions during ReturnSweep.
See also User:Mnm87

