PlinyCompute (PC for short) is a system for developing high-performance, data-intensive, distributed computing codes, especially tools and libraries. PC is designed to fill the gap between HPC software such as OpenMP and MPI, which provide little direct support for managing very large data sets, and dataflow platforms such as Spark and Flink, which may give up significant performance by relying on a managed runtime (the JVM) to handle memory management (including data layout and allocation/deallocation) and key computational considerations such as virtual method/function dispatch.
Core design principle: Declarative in the large, high-performance in the small. PC is unique in that, in the large, it presents the programmer with a very high-level, declarative interface and relies on automatic, relational-database-style optimization to decide how to stage distributed computations. PC’s declarative interface is higher-level than those of other systems, in that decisions such as the join ordering and the choice of join algorithms are entirely under the control of the system. This is particularly important for tool and library development because the same tool should run well regardless of the data it is applied to, which is the classical ideal of data independence in database system design.
In contrast, in the small, PlinyCompute presents a capable programmer with a persistent object data model and API (the “PC object model”) and an associated memory management system designed from the ground up for high performance. All data processed by PC are managed by the PC object model, which is exclusively responsible for PC data layout and within-page memory management. The PC object model is tightly coupled with PC’s execution engine and has been specifically designed for efficient distributed computing. All dynamic PC Object allocation is in-place, directly on a page, obviating the need for PC Object serialization and de-serialization before data are transferred to/from storage or over a network. Further, PC gives a programmer fine-grained control of the system’s memory management and PC Object re-use policies.
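The idea of in-place allocation on a page can be illustrated with a minimal C++ sketch. This is not PlinyCompute's API: the names Page, Point, makePoint, loadPoint, shipPage, and kNoOffset are hypothetical and exist only for illustration. The sketch assumes that objects on a page refer to each other by byte offsets within the page rather than by raw pointers, so that a plain byte copy of the page (standing in for a network transfer or a write to storage) leaves every object usable without an encode or decode step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical sentinel meaning "no object"; not a PlinyCompute constant.
constexpr std::size_t kNoOffset = static_cast<std::size_t>(-1);

// Hypothetical fixed-size page with a simple bump allocator.
struct Page {
    std::vector<unsigned char> bytes;  // raw storage for all objects on the page
    std::size_t used = 0;              // number of bytes handed out so far
    explicit Page(std::size_t n) : bytes(n) {}
};

// Illustrative object type. It links to another object on the same page by
// offset, so it stays valid wherever the page's bytes are copied.
struct Point {
    double x;
    double y;
    std::size_t next;  // offset of the next Point on this page, or kNoOffset
};

// Constructs a Point directly inside the page and returns its offset,
// or kNoOffset if the page does not have enough room left.
std::size_t makePoint(Page& page, double x, double y, std::size_t next) {
    const std::size_t align = alignof(Point);
    const std::size_t start = (page.used + align - 1) / align * align;
    if (start + sizeof(Point) > page.bytes.size()) return kNoOffset;
    new (page.bytes.data() + start) Point{x, y, next};  // placement new: in-place
    page.used = start + sizeof(Point);
    return start;
}

// Reads the Point stored at `off` straight from the page's bytes.
Point loadPoint(const Page& page, std::size_t off) {
    Point p;
    std::memcpy(&p, page.bytes.data() + off, sizeof(Point));
    return p;
}

// Stands in for sending a page over a network or writing it to storage:
// the whole page is moved as raw bytes, with no per-object serialization.
Page shipPage(const Page& src) {
    Page dst(src.bytes.size());
    std::memcpy(dst.bytes.data(), src.bytes.data(), src.bytes.size());
    dst.used = src.used;
    return dst;
}
```

In this sketch, a caller would build a small chain of Point objects with makePoint, pass the page through shipPage, and then follow the next offsets on the copy with loadPoint. Because the links are offsets relative to the page, they remain meaningful after the copy, which is the property that lets a page be transferred as-is.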