1. What are the recommendations for setting the number of threads and the shared memory pool size when starting a cluster or workers with the commands below?
$PDB_HOME/scripts/startCluster.sh distributed masterIp keyFile numThreads sharedMemSize
$PDB_HOME/scripts/startWorkers masterIp keyFile numThreads sharedMemSize
numThreads: Usually this value should be set to the number of CPU cores on a worker. In our benchmarks, setting it to 1.5 times the number of CPU cores often gives the best results.
sharedMemSize: Usually this value should be set to half of the available memory on a worker. You can increase it to fit your dataset size.
For workloads with join and aggregation computations, you need to leave some memory for hash maps, which are allocated on the heap rather than in the shared memory pool.
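The sizing rules above can be sketched as a small helper. This is only an illustration: the function name `recommend_settings`, its parameters, the example worker (8 cores, 65536 units of memory), and the 0.4 memory fraction for join-heavy workloads are assumptions, not part of PlinyCompute. The memory value is returned in whatever unit you pass in, so check which unit your start scripts expect.

```python
import math


def recommend_settings(cpu_cores, available_mem, thread_factor=1.0, mem_fraction=0.5):
    """Return (numThreads, sharedMemSize) following the rules of thumb above.

    available_mem and the returned sharedMemSize use the same unit.
    thread_factor=1.5 reflects the benchmark observation in the text;
    lower mem_fraction to leave heap space for join/aggregation hash maps.
    """
    if cpu_cores < 1 or available_mem <= 0:
        raise ValueError("cpu_cores must be >= 1 and available_mem must be > 0")
    num_threads = max(1, math.floor(cpu_cores * thread_factor))
    shared_mem_size = int(available_mem * mem_fraction)
    return num_threads, shared_mem_size


# Hypothetical worker: 8 cores, 65536 units of available memory.
baseline = recommend_settings(8, 65536)
tuned = recommend_settings(8, 65536, thread_factor=1.5)
# 0.4 is an illustrative fraction that leaves extra heap room for hash maps.
join_heavy = recommend_settings(8, 65536, thread_factor=1.5, mem_fraction=0.4)
print(baseline, tuned, join_heavy)
```

The two returned values map to the numThreads and sharedMemSize arguments of the start scripts. Treat them as a starting point and adjust after benchmarking your own workload.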
2. Can I have more control over the PlinyCompute system?
Yes. PlinyCompute is declarative in the large, and native and tunable in the small. In particular, it gives advanced users, such as tool developers, knobs to better control memory management, resource utilization, and so on. Here we describe those parameters in detail.
You can find more details on system configuration here.
3. My aggregation outputs very little data. How can I make sure that all the data is sent to one node for aggregation?
AggregateComp provides the following APIs to make sure all data is sent to one node for aggregation:
yourComputation->setCollectAsMap(true);
yourComputation->setNumNodesToCollect(1);
4. Why did I see this error: “ERROR in fixing VTableMap for objectTypeID=XXXX”?
Some PC object type is not recognized by the system. Check whether all your PC object types are registered with shared libraries.
5. Why did I see this error: “connection error”?
A worker died. Try restarting the cluster. If the cluster still doesn’t work after restarting, or can’t be restarted, some data may be corrupted. In that case, run the “cleanup.sh” script to format the storage.
6. Why do I see a lot of this message: “Storage server: >>>>>>>>>> finished cache eviction!”?
Your data size exceeds the shared memory pool size. Try increasing the shared memory pool size.
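To judge how often eviction happens before and after changing the pool size, you can count the message in a worker's log output. The sketch below is a generic helper, not a PlinyCompute tool: the function name `count_evictions` and the sample lines are made up for illustration, and where your workers write their logs depends on your setup.

```python
def count_evictions(log_lines, marker="finished cache eviction"):
    """Count the log lines that contain the cache-eviction message."""
    return sum(1 for line in log_lines if marker in line)


# Illustrative log excerpt; real logs will contain other lines as well.
sample = [
    "Storage server: >>>>>>>>>> finished cache eviction!",
    "some other log line",
    "Storage server: >>>>>>>>>> finished cache eviction!",
]
print(count_evictions(sample))
```

To use it on a real log, pass an open file object in place of `sample`. A count that drops after you raise sharedMemSize suggests the pool now fits more of your data.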