Computations – PlinyCompute

PlinyCompute provides a convenient object-oriented and UDF-centric programming interface through C++. Users customize pre-defined Computation objects using PlinyCompute’s unique lambda calculus interface. They can then connect the customized computations to compose a query graph in the form of a directed acyclic graph (DAG).
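To make the idea of a query graph concrete, here is a minimal standalone sketch in plain C++. It does not use PlinyCompute's classes: `Node`, `makeNode`, and `executionOrder` are hypothetical names introduced only for illustration. Each node lists the nodes it reads from, and a post-order walk from the sink yields an order in which every computation comes after its inputs.

```cpp
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical query-graph node (NOT a PlinyCompute class).
struct Node {
    std::string name;
    std::vector<std::shared_ptr<Node>> inputs;  // upstream computations
};

// Convenience constructor for a node with zero or more inputs.
std::shared_ptr<Node> makeNode(std::string name,
                               std::vector<std::shared_ptr<Node>> inputs = {}) {
    return std::make_shared<Node>(Node{std::move(name), std::move(inputs)});
}

// Post-order walk: a node is emitted only after all of its inputs.
void visit(const std::shared_ptr<Node>& node,
           std::unordered_set<const Node*>& seen,
           std::vector<std::string>& order) {
    if (!seen.insert(node.get()).second) {
        return;  // already visited (a shared input in the DAG)
    }
    for (const auto& input : node->inputs) {
        visit(input, seen, order);
    }
    order.push_back(node->name);
}

// Returns the node names in an order that respects the graph's edges.
std::vector<std::string> executionOrder(const std::shared_ptr<Node>& sink) {
    std::unordered_set<const Node*> seen;
    std::vector<std::string> order;
    visit(sink, seen, order);
    return order;
}
```

Building two scan nodes, a join that reads both, an aggregate that reads the join, and a write node that reads the aggregate, then calling `executionOrder` on the write node, lists the scans first and the write last.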

PlinyCompute supports the following computations. The interfaces of these computation objects are defined here.

Computation Name	Explanation	Example
`SelectionComp`	A selection computation in which the user can use lambda calculus to customize the selection condition for filtering and the projection for transformation.	LDADocTopicProbSelection.h
`MultiSelectionComp`	A flattening selection computation in which the projection is a flatmap: it returns a Vector of objects for each input, and the results are flattened.	CustomerMultiSelection.h
`AggregateComp`	An aggregation computation in which the user can use lambda calculus to customize the key projection function and the value projection function, so that each input object is transformed into a key-value pair to be aggregated by key.	CustomerSupplierPartGroupBy.h
`JoinComp`	A join computation in which the user can use lambda calculus to customize the join selection condition and the projection. The number of join inputs is not limited, and multi-way joins are supported in a declarative way.	LDADocWordTopicJoin.h
`TopKComp`	A top-K computation in which the user can use lambda calculus to customize the value (ranking score) projection function, so that each input object is transformed into a score for top-K comparison.	TopJaccard.h
`PartitionComp`	A partition computation in which the user can use lambda calculus to customize the partition key projection function, so that each input object is dispatched to a hash partition based on its partition key.	LineItemPartitionComp.h
`ScanUserSet`	A data reading computation that the user customizes to specify the source set used as input by a source computation.	`Handle myTPCHCustomerScanner = makeObject<ScanUserSet>("tpch", "customer");`
`WriteUserSet`	A data writing computation that the user customizes to specify the sink set used as output by a sink computation.	`Handle myQ01Writer = makeObject <WriteUserSet> ("tpch", "q01_output_set");`
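The difference between a selection and a multi-selection can be sketched in plain C++ without any PlinyCompute types. The function names `selection` and `multiSelection` below are hypothetical stand-ins, and `std::vector` is assumed in place of PlinyCompute's Vector. A selection filters and then maps each input to one output, while a multi-selection maps each input to a vector of outputs and flattens the result.

```cpp
#include <type_traits>
#include <vector>

// Selection analog: keep items that satisfy `keep`, then transform each
// surviving item with `project` (one output per kept input).
template <typename In, typename Pred, typename Proj>
auto selection(const std::vector<In>& input, Pred keep, Proj project) {
    using Out = std::decay_t<decltype(project(input.front()))>;
    std::vector<Out> out;
    for (const In& item : input) {
        if (keep(item)) {
            out.push_back(project(item));
        }
    }
    return out;
}

// Multi-selection analog: `project` returns a vector per kept item, and
// all returned vectors are flattened into a single output vector.
template <typename In, typename Pred, typename FlatProj>
auto multiSelection(const std::vector<In>& input, Pred keep, FlatProj project) {
    using Out = typename std::decay_t<decltype(project(input.front()))>::value_type;
    std::vector<Out> out;
    for (const In& item : input) {
        if (!keep(item)) {
            continue;
        }
        for (const auto& produced : project(item)) {
            out.push_back(produced);
        }
    }
    return out;
}
```

In the real computations the two lambdas are built with PlinyCompute's lambda calculus interface rather than passed as ordinary C++ callables; this sketch only mirrors the data flow.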

Example.

A customized Computation object is also a PC object, so you need to follow all the rules for creating PC objects.

These are the steps to create a computation that aggregates a pdb::Vector of doubles:

Step 1: Define a class with the computation.

This class has to satisfy these requirements:

Derive from pdb::Object. In this example, this requirement is satisfied indirectly because the DoubleVectorAggregation class has the following hierarchy: AggregateComp->AggregateCompBase->AbstractAggregateComp->Computation->pdb::Object.
Include the ENABLE_DEEP_COPY macro in the public section of the class.

#include "AggregateComp.h"
#include "DoubleVector.h"
#include "DoubleVectorResult.h"
#include "LambdaCreationFunctions.h"


using namespace pdb;

class DoubleVectorAggregation
    : public AggregateComp<DoubleVectorResult, DoubleVector, int, DoubleVector> {

public:
    ENABLE_DEEP_COPY

    DoubleVectorAggregation() {}

    // the below constructor is NOT REQUIRED
    // user can also set output later by invoking the
    // setOutput(std::string dbName, std::string setName) method
    DoubleVectorAggregation(std::string dbName, std::string setName) {
        this->setOutput(dbName, setName);
    }


    // the key type must have == and size_t hash () defined
    // every input maps to the same key (0), so all inputs fall into one group
    Lambda<int> getKeyProjection(Handle<DoubleVector> aggMe) override {
        return makeLambda(aggMe, [](Handle<DoubleVector>& aggMe) { return 0; });
    }

    // the value type must have + defined
    // the value is the input vector itself, so the group is combined with +
    Lambda<DoubleVector> getValueProjection(Handle<DoubleVector> aggMe) override {
        return makeLambda(aggMe, [](Handle<DoubleVector>& aggMe) { return *aggMe; });
    }
    }
};

Give a name to this file (e.g. DoubleVectorAggregation.h) and save it in the sharedLibraries/headers/ folder.
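The comments in the class above state that the key type needs `==` and a hash, and that the value type needs `+`. The following standalone sketch suggests why, under the assumption that aggregation behaves like a hash-based group-by. It uses only the C++ standard library; `DoubleVec` and `aggregateByKey` are hypothetical names, not PlinyCompute's DoubleVector or AggregateComp, and the zero-padding rule in `operator+` is an assumption made for this example.

```cpp
#include <cstddef>
#include <type_traits>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a value type that defines +.
struct DoubleVec {
    std::vector<double> data;

    // Element-wise sum; the shorter operand is treated as zero-padded.
    DoubleVec operator+(const DoubleVec& other) const {
        const bool thisLonger = data.size() >= other.data.size();
        DoubleVec out = thisLonger ? *this : other;
        const std::vector<double>& shorter = thisLonger ? other.data : data;
        for (std::size_t i = 0; i < shorter.size(); ++i) {
            out.data[i] += shorter[i];
        }
        return out;
    }
};

// Group-by sketch: keyOf plays the role of the key projection and valueOf
// the role of the value projection. The hash map needs == and a hash on
// the key; combining two values in the same group needs + on the value.
template <typename In, typename KeyFn, typename ValueFn>
auto aggregateByKey(const std::vector<In>& input, KeyFn keyOf, ValueFn valueOf) {
    using Key = std::decay_t<decltype(keyOf(input.front()))>;
    using Value = std::decay_t<decltype(valueOf(input.front()))>;
    std::unordered_map<Key, Value> groups;
    for (const In& item : input) {
        Key key = keyOf(item);
        auto it = groups.find(key);
        if (it == groups.end()) {
            groups.emplace(key, valueOf(item));        // first value for this key
        } else {
            it->second = it->second + valueOf(item);   // merge into the group
        }
    }
    return groups;
}
```

Calling `aggregateByKey` with a key function that always returns 0 and a value function that returns the item itself mirrors the projections in DoubleVectorAggregation: all inputs land in a single group whose value is their element-wise sum.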

Step 2: Create the file DoubleVectorAggregation.cc, which has to include the following lines:

#include "DoubleVectorAggregation.h", which contains the code from the file created in the previous step.
#include "GetVTable.h", which contains code for properly handling classes defined in shared libraries.
The GET_V_TABLE macro.

#include "DoubleVectorAggregation.h"
#include "GetVTable.h"

GET_V_TABLE(DoubleVectorAggregation)

Save the file in the sharedLibraries/source/ folder.

Step 3: Compile and build.

Run the following make command:

$ make DoubleVectorAggregation

This will create the shared library named libDoubleVectorAggregation.so in the libraries/ folder.
At this point, a user-defined computation has been successfully created as a shared library. You can repeat the previous steps to create additional user-defined computations.

Step 4: Register the shared library in PlinyCompute’s catalog.

The final step makes this user-defined computation available to PlinyCompute by registering it in an instance of PlinyCompute’s catalog. To do this, include the following statement in your client code; it assumes that the shared library created in the previous step is located in the libraries folder.

   pdbClient.registerType("libraries/libDoubleVectorAggregation.so");