PlinyCompute (PC) is built from several software components, described in the following paragraphs. The figure illustrates an installation of PlinyCompute deployed on a cluster of four machines (nodes): one manager node and three worker nodes.
A PC cluster consists of a manager node as well as one or more worker nodes.
The manager node runs four software components:
- The catalog manager, which serves system metadata (including the master copy of the mapping between type codes and PC Object types) and the compiled shared libraries used to perform computations over PC Objects.
- The distributed storage manager, the centralized server that manages PC’s storage subsystem.
- The TCAP optimizer, which is responsible for optimizing programs that have been compiled into PC’s domain-specific TCAP language.
- The distributed query scheduler, which accepts optimized TCAP computations and dynamically schedules them to run on workers.
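To make the catalog manager's role more concrete, here is a minimal sketch of a registry that maps type codes to PC Object types and to the shared library holding each type's compiled code. It is not PlinyCompute's actual API: the names `TypeEntry`, `MasterCatalog`, `register_type`, and `lookup`, the starting code value, and the library file names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeEntry:
    type_code: int      # numeric code identifying the type
    type_name: str      # name of the PC Object type
    library_path: str   # shared library with the type's compiled code


class MasterCatalog:
    """Hypothetical stand-in for the master type-code mapping."""

    def __init__(self, first_code=1):
        self._by_name = {}
        self._by_code = {}
        self._next_code = first_code  # arbitrary starting value for this sketch

    def register_type(self, type_name, library_path):
        """Assign a code to a new type; re-registering returns the old code."""
        existing = self._by_name.get(type_name)
        if existing is not None:
            return existing.type_code
        entry = TypeEntry(self._next_code, type_name, library_path)
        self._by_name[type_name] = entry
        self._by_code[entry.type_code] = entry
        self._next_code += 1
        return entry.type_code

    def lookup(self, type_code):
        """Return the entry for a code, or None if the code is unknown."""
        return self._by_code.get(type_code)


# Usage: register a (hypothetical) type, then resolve its code back to a name.
catalog = MasterCatalog()
code = catalog.register_type("Employee", "libEmployee.so")
print(code, catalog.lookup(code).type_name)
```

Keeping a single master copy means a type code resolves to the same type, and the same library, wherever in the cluster it is looked up.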
A worker front-end process runs the following components:
- The local catalog manager: handles requests and buffers data served from the master catalog manager. It is also responsible for fetching and dynamically loading code as needed when a virtual method call is made over a PC Object.
- The local storage server: manages a shared-memory buffer pool used for buffering and caching datasets. It also manages a user-level file system that is used to persist a portion of one or more datasets stored by PC (the partitioning of datasets to storage servers is managed by the distributed storage manager). The storage server also manages temporary data that must be spilled to secondary storage. The shared-memory buffer pool is created via an mmap system call so that data stored in it can be read by the back-end process (forked from the front-end process) via zero-copy access. A significant performance advantage of having a buffer pool is that data can be cached in memory across different applications, rather than having to be loaded from external storage (which often requires data movement) each time.
- The message proxy: communicates with the worker back-end process and acts as a bridge between the back-end process and the manager node, the local catalog, and the local storage server.
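The interplay between the shared-memory buffer pool and the forked back-end process can be sketched as follows. This is a simplified illustration under stated assumptions, not PlinyCompute's implementation: it assumes a Unix system, the function name `run_backend_sum` and the constant `PAGE_SIZE` are hypothetical, and a pair of pipes stands in for the message proxy. The front-end places data in an anonymous shared mapping, forks a back-end, and sends it a small request; the back-end reads the data in place rather than receiving a copy.

```python
import mmap
import os
import struct

PAGE_SIZE = 4096  # hypothetical minimum pool size for this sketch


def run_backend_sum(values):
    """Store `values` in a shared buffer, fork a back-end, and return the
    sum the back-end computes by reading that buffer in place."""
    n = len(values)
    # Anonymous mapping; on Unix, mmap.mmap(-1, ...) is MAP_SHARED by
    # default, so the forked child sees the same physical pages.
    pool = mmap.mmap(-1, max(PAGE_SIZE, 8 * n))
    pool[:8 * n] = struct.pack(f"{n}q", *values)

    req_r, req_w = os.pipe()  # front-end -> back-end requests
    res_r, res_w = os.pipe()  # back-end -> front-end results

    pid = os.fork()
    if pid == 0:
        # Back-end: receive the request, then read the shared pages directly.
        try:
            os.close(req_w)
            os.close(res_r)
            count = struct.unpack("q", os.read(req_r, 8))[0]
            view = memoryview(pool).cast("q")  # typed view, no buffer copy
            total = sum(view[:count])
            os.write(res_w, struct.pack("q", total))
        finally:
            os._exit(0)  # never fall through into the front-end code

    # Front-end: only the element count crosses the pipe, not the data.
    os.close(req_r)
    os.close(res_w)
    os.write(req_w, struct.pack("q", n))
    total = struct.unpack("q", os.read(res_r, 8))[0]
    os.waitpid(pid, 0)
    os.close(req_w)
    os.close(res_r)
    pool.close()
    return total


print(run_backend_sum([1, 2, 3]))
```

Only the 8-byte request and the 8-byte result travel over the pipes; the dataset itself stays in the mapped region, which is the point of creating the pool with mmap before forking.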
A worker back-end process serves as the query processing engine and executes computations that are forwarded from the front-end process.