PlinyCompute is declarative in the large, native and tunable in the small. Particularly, it provides advanced users such as tool developers knobs to better control the memory management, resource utilization and so on. Here we describe those parameters in detail.

 

numThreads: Usually this value should be set to the number of CPU cores on a worker. We also find through our benchmarks that setting it to 1.5 times of number of CPU cores can often get the best results.

 

sharedMemSize: Usually this value should be set to half of the available memory on a worker. You can also try to increase the value to fit your dataset size.
For workloads that have join and aggregation computations, you need leave some memory for hashmaps, which are allocated on heap instead of in memory.

You can configure above parameters when using below scripts:

$PDB_HOME/scripts/startCluster.sh distributed masterIp keyFile numThreads sharedMemSize
$PDB_HOME/scripts/startWorkers masterIp keyFile numThreads sharedMemSize

 

number of partitions:  by default, it assumes the total number of partitions for join and aggregation is 1.5 times of number of CPU cores. You can change the ratio by setting numPartitionRatio parameter when starting the pdb-manager:

$PDB_HOME/bin/pdb-manager hostname port pseudoModeOrNot pem_file numPartitionRatio

You need also modify this parameter in the scripts, e.g. startCluster.sh, startWorkers.sh

 

There are also many parameters which you can configure through Compiler flags:

Environment Variable Meaning
DEFAULT_BATCH_SIZE The batch size of vectorized pipeline execution. (default: 100)
EVICT_STOP_THRESHOLD  The ratio of used shared memory to trigger or stop cache eviction. (default: 0.95)
JOIN_HASH_TABLE_SIZE_RATIO  The space inflation ratio for building Join hash table for broadcasting: the size of the Join hash table for broadcasting to the size of the data that is used to build the Join hash table. (default: 1.5)
AUTO_TUNING

When the system is running on Cluster Mode, it is recommended to add this flag, which will automatically detect worker nodes memory and hardware thread number, then compute required heap size for building hash maps.

When the system is running on Pseudo-Cluster Mode, it is recommended to turn off this flag, and manually configure following parameters: DEFAULT_MEM_SIZE, DEFAULT_NUM_CORES, DEFAULT_HASH_PAGE_SIZE.

(default: on)

DEFAULT_HASH_PAGE_SIZE  Memory size for building hashmaps on heap, it will only be used if AUTO_TUNING is not set. (default: 536870912)
DEFAULT_MEM_SIZE  Available memory size on worker, it will only be used if AUTO_TUNING is not set. (default: 73014444032)
DEFAULT_NUM_CORES Available number of hardware threads on worker, it will only be used if AUTO_TUNING is not set. (default: 8)
DEFAULT_PAGE_SIZE  Default page size for creating a set. (default: 268435456)
DEFAULT_MAX_PAGE_SIZE  Max page size allowed for creating a set, shuffling, broadcasting. (default: 268435456)
DEFAULT_SHUFFLE_PAGE_SIZE  Default page size for shuffling. (default: 268435456)
DEFAULT_BROADCAST_PAGE_SIZE  Default page size for broadcasting. (default: 268435456)
DEFAULT_MAX_CONNECTIONS  Max number of concurrent workers. (default: 200)