A linear algebra implementation called lilLinAlg, built on PlinyCompute (PC), was developed. The complete listing of this application can be found in the GitHub repository TestLA. In lilLinAlg, a distributed matrix is stored as a set of PC Objects, where each object in the set is a MatrixBlock. lilLinAlg uses the MatrixBlock object to implement a set of common distributed matrix computations, including transpose, inverse, add, subtract, multiply, transposeMultiply, scaleMultiply, minElement, maxElement, rowSum, columnSum, duplicateRow, duplicateCol, and many more. However, lilLinAlg programmers do not call these operations directly; instead, lilLinAlg implements its own Matlab-like DSL.
Given a computation in the DSL, lilLinAlg first parses the computation into an abstract syntax tree (AST), and then uses the AST to build a graph of PC Computation objects that implements the distributed computation. For example, at a multiply node in the compiled AST, lilLinAlg executes PC code similar to the following:
    // Join the blocks produced by the left and right sub-expressions.
    Handle<Computation> query1 = makeObject<LAMultiplyJoin>();
    query1->setInput(0, leftChild->evaluate(instance));
    query1->setInput(1, rightChild->evaluate(instance));

    // Aggregate the output of the join.
    Handle<Computation> query2 = makeObject<LAMultiplyAggregate>();
    query2->setInput(query1);
Here, LAMultiplyJoin and LAMultiplyAggregate are both user-defined Computation classes that are derived from PC’s JoinComp class and AggregateComp class, respectively; these classes are chosen because distributed matrix multiplication is essentially a join followed by an aggregation. Internally, LAMultiplyJoin and LAMultiplyAggregate invoke the Eigen numerical processing library to manipulate MatrixBlock objects.
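To make the "join followed by an aggregation" view concrete, the following is a minimal single-machine sketch in plain C++. It is not lilLinAlg or PC code: the `Block` struct and the functions `multiplyTiles`, `joinStep`, and `aggregateStep` are hypothetical names introduced only for illustration, and a simple triple loop stands in for the Eigen call. The join pairs each left block with every right block whose block row equals the left block's block column; the aggregation sums the partial products that land on the same output block.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a MatrixBlock: a dense row-major tile
// together with its block coordinates in the overall matrix.
struct Block {
    int blockRow;
    int blockCol;
    int rows;
    int cols;
    std::vector<double> data;  // row-major, rows * cols entries
};

// Multiply two dense tiles; assumes a.cols == b.rows.
Block multiplyTiles(const Block& a, const Block& b) {
    Block out{a.blockRow, b.blockCol, a.rows, b.cols,
              std::vector<double>(static_cast<std::size_t>(a.rows) * b.cols, 0.0)};
    for (int i = 0; i < a.rows; ++i)
        for (int k = 0; k < a.cols; ++k)
            for (int j = 0; j < b.cols; ++j)
                out.data[i * b.cols + j] += a.data[i * a.cols + k] * b.data[k * b.cols + j];
    return out;
}

// Join step: match left and right blocks on (left.blockCol == right.blockRow)
// and emit one partial product per matching pair.
std::vector<Block> joinStep(const std::vector<Block>& left, const std::vector<Block>& right) {
    std::vector<Block> partials;
    for (const Block& a : left)
        for (const Block& b : right)
            if (a.blockCol == b.blockRow)
                partials.push_back(multiplyTiles(a, b));
    return partials;
}

// Aggregate step: sum all partial products that share output coordinates.
std::map<std::pair<int, int>, Block> aggregateStep(const std::vector<Block>& partials) {
    std::map<std::pair<int, int>, Block> result;
    for (const Block& p : partials) {
        auto key = std::make_pair(p.blockRow, p.blockCol);
        auto it = result.find(key);
        if (it == result.end()) {
            result.emplace(key, p);
        } else {
            for (std::size_t i = 0; i < p.data.size(); ++i)
                it->second.data[i] += p.data[i];
        }
    }
    return result;
}
```

Calling `aggregateStep(joinStep(A, B))` on two block sets yields the blocks of the product. In a distributed setting the nested loop in `joinStep` would be replaced by a keyed join and the map in `aggregateStep` by a grouped aggregation, which is the role the text assigns to JoinComp and AggregateComp.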
In lilLinAlg, a distributed matrix is stored as a set of PC Objects, where each object in the set is a MatrixBlock that stores a contiguous rectangular sub-block of the matrix. MatrixBlock is declared as follows:
    class MatrixBlock : public Object {
    private:
        MatrixMeta meta;
        MatrixData data;
    };
where MatrixMeta and MatrixData are defined as:
    class MatrixMeta : public Object {
    private:
        int blockRowIndex;  // row index of this block
        int blockColIndex;  // col index of this block
        int totalRows;      // total number of rows in matrix
        int totalCols;      // total number of cols in matrix
    };

    class MatrixData : public Object {
    private:
        Handle<Vector<double>> rawData;
        int rowNums;  // number of rows in this block
        int colNums;  // number of cols in this block
    };
MatrixMeta stores the location of the block in the overall matrix, and MatrixData stores the actual contents of the block. The data stored in a MatrixData object should be small enough to fit completely in a PC page (by default, PC’s page size is 256 MB). A typical MatrixData object stores a 1,000 by 1,000 sub-matrix of doubles, which is eight megabytes in size.
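The block arithmetic behind these fields can be sketched in a few lines of C++. This is an illustrative sketch only: `BlockLocation`, `locate`, `blocksNeeded`, and `payloadBytes` are hypothetical helpers that are not part of lilLinAlg, and the sketch assumes that every block has the same nominal extent, which the text does not state.

```cpp
#include <cstddef>

// Hypothetical result type: which block holds an element, and where inside it.
struct BlockLocation {
    int blockRowIndex;  // block row, as in MatrixMeta
    int blockColIndex;  // block column, as in MatrixMeta
    int localRow;       // row offset inside the block
    int localCol;       // column offset inside the block
};

// Locate global element (row, col), assuming each block nominally spans
// blockRows x blockCols elements (an assumption of this sketch).
BlockLocation locate(int row, int col, int blockRows, int blockCols) {
    return BlockLocation{row / blockRows, col / blockCols,
                         row % blockRows, col % blockCols};
}

// Number of blocks needed to cover `total` rows or columns (ceiling division).
int blocksNeeded(int total, int blockExtent) {
    return (total + blockExtent - 1) / blockExtent;
}

// Bytes occupied by the raw double payload of one block.
std::size_t payloadBytes(int rowNums, int colNums) {
    return static_cast<std::size_t>(rowNums) * static_cast<std::size_t>(colNums) * sizeof(double);
}
```

With 8-byte doubles, `payloadBytes(1000, 1000)` is 8,000,000 bytes, matching the eight-megabyte figure above and leaving ample room within a 256 MB page.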