Question: What exactly is a pdb :: Object?

Answer: pdb :: Object is an abstract base class. Instances of classes descended from pdb :: Object are allocated “in place” (that is, pre-serialized). Further, data structures constructed using pdb :: Objects are portable, in the sense that they can be moved from place to place (even across a network) and yet “pointers” inside of them still function correctly—even after they have been moved—and calls to virtual functions still work correctly, after objects have been moved across processes. All data stored in PDB is stored as pdb :: Objects.
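
To see why a pointer-like link can survive a move, here is a minimal sketch in plain C++. It is not PDB's implementation. OffsetLink, ListNode and sumAfterMove are hypothetical names invented for illustration, and the sketch assumes a link is stored as a byte offset relative to the link's own address rather than as an absolute address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical stand-in for a handle: it stores the distance in bytes from
// the link itself to its target, not an absolute address.
struct OffsetLink {
    std::ptrdiff_t offset = 0;  // 0 means "null"

    void pointTo(const void *target) {
        assert(target != nullptr);
        offset = static_cast<const char *>(target) -
                 reinterpret_cast<const char *>(this);
    }

    template <class T>
    T *get() {
        if (offset == 0) return nullptr;
        return reinterpret_cast<T *>(reinterpret_cast<char *>(this) + offset);
    }
};

struct ListNode {
    int value;
    OffsetLink next;
};

// Builds a two-node list inside one block, copies the raw bytes of the block
// somewhere else, wipes the original, and walks the list in the copy.
int sumAfterMove() {
    alignas(ListNode) unsigned char block[2 * sizeof(ListNode)];
    alignas(ListNode) unsigned char moved[2 * sizeof(ListNode)];

    ListNode *first = new (block) ListNode{1, {}};
    ListNode *second = new (block + sizeof(ListNode)) ListNode{2, {}};
    first->next.pointTo(second);

    std::memcpy(moved, block, sizeof(block));  // "send" the block
    std::memset(block, 0, sizeof(block));      // the original is gone

    int sum = 0;
    for (ListNode *n = reinterpret_cast<ListNode *>(moved); n != nullptr;
         n = n->next.get<ListNode>()) {
        sum += n->value;
    }
    return sum;
}
```

Calling sumAfterMove () walks both nodes in the copied block even though the original bytes were wiped. A raw C pointer stored in first would still have pointed into the wiped block.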

Question: What are the rules for using pdb :: Objects?

Answer: Here are the 11 commandments for using pdb :: Objects, from an application programmer’s point of view.

  1. All non-abstract classes descended from pdb :: Object shall have the ENABLE_DEEP_COPY macro in their public method/member section. This allows them to be correctly “deep copied” and allocated/deleted by the PDB software base, when this is necessary.
  2. Classes descended from pdb :: Object shall not have C/C++ pointers (or smart pointers) inside of them. Instead, use the pdb :: Handle <> template. This provides the same functionality as a pointer (including virtual method dispatch), plus it provides automatic reference counting (like a smart pointer) so that you never have to free anything and in theory, can never have pdb :: Object memory leaks.
  3. Thou shalt not use multiple inheritance for classes descended from pdb :: Object. The inheritance hierarchy must be a tree. The reason for this is that PDB does some weird stuff (messing with object vTables) to get virtual dispatch to work correctly when you move objects from process to process. This will break if you use multiple inheritance.
  4. Thou shalt not new or malloc a pdb :: Object. Always allocate pdb :: Objects on the heap using the makeObject <> () template function.
  5. Thou shalt not delete or free a pdb :: Object. pdb :: Objects allocated on the heap are fully reference counted, so memory is automatically freed for you when you are done with it.
  6. Thou shalt not define new template classes descending from pdb :: Object. You are, however, free to use template classes provided by pdb (pdb :: Map <>, pdb :: Vector <>, etc.). To allow movement of template classes between processes (and even across machines) PDB templates must utilize a form of type erasure. This is tricky. PDB application programmers should not attempt this!
  7. Before using a newly defined class that descends from pdb :: Object, thou shalt compile the type into a shared library, and register the type with the PDB catalog server. See, for example, Test17.cc. If you don’t do this, you will run into problems when PDB tries to mess with the object’s vTable. See src/sharedLibraries/source/SharedEmployee.cc for an example of how you write code that can be compiled into a shared library (you need to write a C++ code file with one line). Note that you don’t have to follow this commandment if you don’t attempt to use pdb :: Object classes across processes (that is, you are not using distributed PDB, and all pdb :: Objects were created by you and not loaded from disk). This is common during debugging and development, for example.
  8. When using a new class that descends from pdb :: Object, thou shalt have a catalog client in thy process, and that client shall have a connection to a running PDB catalog server. See, for example, Test17.cc; it is enough to have a line of code like “pdb :: StorageClient temp (8108, "localhost", make_shared <pdb :: PDBLogger> ("clientLog"));”. The reason is similar to the reason for the last commandment. As with #7, you don’t have to follow this commandment if you don’t attempt to use pdb :: Object classes across processes (that is, you are not using distributed PDB, and all pdb :: Objects were created by you and not loaded from disk). This is common during debugging and development, for example.
  9. Before thou callest makeObject <> () to allocate a pdb :: Object on the heap, thou shalt call makeObjectAllocatorBlock () or have a UseTemporaryAllocationBlock object in scope. This is how you control where the objects that you create are going to be allocated. After you make an “allocation block”, all objects created by makeObject <> () will be created within that block. In addition, all assignments (even pointer assignments) to objects will copy them over to the same block. Hence, you can rest assured that if you allocate an object, and then manipulate it, even performing pointer manipulations, the object (and everything it points to) will always be in the same block. This means that such objects can be sent off over the network with no problem. (As an aside, the last argument to makeObjectAllocatorBlock () should always be true; we need to remove the option of giving a false argument.)
  10. Thou shalt always contain code that manipulates pdb :: Objects within a try {…} catch (NotEnoughSpace &e) {…} block. This exception will be thrown when there is not enough memory to store those pdb :: Objects allocated in the block (as well as a copy of everything reachable by those objects). The only time you would not do this is if you are absolutely sure you have enough memory in the current allocation block, and that you can’t possibly run out.
  11. Thou shalt not have const member variables in a pdb :: Object. Doing so will cause a compilation error, because a const member makes it impossible for the compiler to provide a default copy assignment operator, which the framework uses to perform deep copies across allocation blocks. It may be possible to change this in the future, but for now, it’s the eleventh commandment.
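
The reasoning behind the eleventh commandment can be checked in plain C++, without PDB. The sketch below uses two hypothetical structs (EmployeeWithConst and EmployeeWithoutConst are made-up names, not PDB types) and the standard type trait std::is_copy_assignable to show that a const data member removes the compiler-generated copy assignment operator.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical record with a const member: the compiler cannot generate
// operator= for it, so "a = b" does not compile for this type.
struct EmployeeWithConst {
    const int id;
    double salary;
};

// The same record without const: copy assignment is generated as usual.
struct EmployeeWithoutConst {
    int id;
    double salary;
};

static_assert(!std::is_copy_assignable<EmployeeWithConst>::value,
              "a const member deletes the default copy assignment operator");
static_assert(std::is_copy_assignable<EmployeeWithoutConst>::value,
              "without const members, default copy assignment is available");

// Copies one record over another by plain assignment and returns the
// salary that ended up in the destination.
double copyByAssignment() {
    EmployeeWithoutConst src{7, 100.0};
    EmployeeWithoutConst dst{0, 0.0};
    dst = src;  // would be a compile error for EmployeeWithConst
    assert(dst.id == 7);
    return dst.salary;
}
```

Any code that copies an object by assigning over an existing one therefore cannot be instantiated for a type with const members.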

Question: What are appropriate uses of pdb :: Objects?

Answer: It depends who you are.

Application programmers should use pdb :: Objects to contain and manipulate data that they want to store in PDB.

PDB systems programmers need to use the pdb :: Object framework to write code that manipulates the pdb :: Objects that application programmers give them. They also use pdb :: Objects to implement communication protocols (using the PDBCommunicator class) because it is easy to send complex data structures encoded as pdb :: Objects. In fact, all communication in PDB (so far) has been implemented as sends/receives of pdb :: Objects.

One thing that PDB systems programmers should refrain from doing is using pdb :: Objects to store long-lived data in memory. This is particularly important when implementing ServerFunctionality subclasses. Having member variables that are descended from pdb :: Object should be avoided.

Why?

First off, regular C++ objects work fine for doing this.

Secondly, there is some overhead associated with keeping around pdb :: Objects. The allocation block that is used to store the object is kept around as long as the object is, and having a lot of those can slow things down.

Third, objects are not meant to be used across threads (PDBWorkers). Creating an object (via makeObject) in one thread (that is, within a pdb :: Worker) and then passing a pdb :: Handle to it to another thread that updates that pdb :: Object can cause problems, for various technical reasons. This is exactly what happens, for example, if you have a PDBWorker that receives some data as a pdb :: Object in a ServerFunctionality (perhaps the object is a pdb :: Vector), and then another PDBWorker (at a later time) accesses that pdb :: Vector and updates it.

Finally, even though the above scenario would be completely safe as long as the second worker didn’t update the pdb :: Vector (it is always safe to share pdb :: Objects across threads, as long as the thread that didn’t create the object does not manipulate it), one could still run into problems when the PDBServer that all of this is happening inside of is taken down. The PDBWorkerQueue stores the allocators responsible for managing pdb :: Object memory. If the PDBWorkerQueue inside the PDBServer is freed before a pdb :: Object that is stored in the ServerFunctionality is freed, then when the destructor of that pdb :: Object is called (this will happen at the time that the ServerFunctionality is taken down), the system will crash.

Again, keep in mind that pdb :: Objects are meant to store/send/receive data. They are not meant as an alternative to std :: vector, std :: map, std :: shared_ptr, etc., for use when writing PDB systems code.

Question: Why does the following crash?

Handle <Foo> myHandle = makeObject <Foo> ();

PDBWorkPtr myWork = make_shared <GenericWork> (
     [&] (PDBBuzzerPtr callerBuzzer) {
       myHandle = nullptr; 
       callerBuzzer->buzz (PDBAlarm :: WorkAllDone);
     });

PDBWorkerPtr myWorker = getWorker ();
myWorker->execute (myWork, myBuzzer);

Answer: The fundamental problem with this code is that you have modified a pdb :: Object in a different thread than where it was created, which, as described elsewhere in this FAQ, is not allowed. pdb :: Handle descends from pdb :: Object. Changing it to a nullptr causes all sorts of things to happen, which will break, since the PDB object model (by design) was never meant to function seamlessly cross-thread.

Even the following is very problematic:

Handle <Foo> myHandle = makeObject <Foo> ();

PDBWorkPtr myWork = make_shared <GenericWork> (
     [&] (PDBBuzzerPtr callerBuzzer) {
       getCommunicator ()->sendObject (myHandle, ...); 
       callerBuzzer->buzz (PDBAlarm :: WorkAllDone);
     });

PDBWorkerPtr myWorker = getWorker ();
myWorker->execute (myWork, myBuzzer);

The problem here is that the communicator won’t be able to determine the allocation block where the object pointed to by myHandle is located, since it was created on another thread.

So, what’s the solution to this? It’s actually quite simple: only pass references to pdb :: Objects across threads. That way you can be sure that you are not inadvertently doing cross-thread modifications.

If you need to do anything with those references other than simply reading them (this includes sending them across the network, writing them to disk) do a deep copy first. Here’s an example of some safe code:

Handle <Foo> myHandle = makeObject <Foo> ();

// note that this lambda is doing its capture by reference
PDBWorkPtr myWork = make_shared <GenericWork> (
     [&] (PDBBuzzerPtr callerBuzzer) {

       // if I want to send myHandle, be safe, and do a deep copy
       const UseTemporaryAllocationBlock myBlock {1024};
       Handle <Foo> myHandleLocal = makeObject <Foo> ();
       *myHandleLocal = *myHandle; // Deep copy! Safe, and pretty fast, too
       // I can now do whatever I want to myHandleLocal without concern!
       getCommunicator ()->sendObject (myHandleLocal); 
       callerBuzzer->buzz (PDBAlarm :: WorkAllDone);
     });

Question: When do I need to use the GET_V_TABLE () macro to create a shared library for a class derived from pdb :: Object that I intend to load into PDB?

Answer: Currently, there are three PDB container templates: Handle <>, Map <Key, Value>, and Vector <>. If you will ever use one of these to “store” a pdb :: Object (or in the case of Handle, to store a reference to a pdb :: Object) then the concrete type of that object must be registered as a shared library with PDB. Don’t worry about abstract classes or classes that will never be stored using such a template.

Question: Why does the following code crash when executed?

class DoStuff {
  void *memory;
  Handle <Vector <Handle <Employee>>> data;
  
  DoStuff () {
    memory = malloc (1024);
    makeObjectAllocationBlock (memory, 1024, true);
    data = makeObject <Vector <Handle <Employee>>> ();
    for (int i = 0; i < 10; i++) {
       Handle <Employee> tempEmp = makeObject <Employee> ();
       data->push_back (tempEmp);
    }
  }

  ~DoStuff () {
    free (memory);
  }
};

int main () {
  DoStuff temp;
}

Answer:

Like all reference-based memory management systems, PDB’s object model retains allocated objects in memory as long as you have a reachable reference to them somewhere in your program. The problem here is that the DoStuff class retains a reference to the pdb::Handle data, which itself points to a bunch of objects that reside inside of the memory region pointed to by memory. The destructor for the DoStuff class, when called, frees the memory where the objects referenced by data are located, but data is still out there, pointing to this memory. When the destructor for pdb::Handle is called (automatically, by the C++ compiler, after the free in ~DoStuff ()), it very reasonably attempts to clean up all of the objects pointed to by data (no memory leaks!). But those objects are now located in an already-freed region of memory, and the code crashes.

The fix is simple:

  ~DoStuff () {
    data = nullptr;
    free (memory);
  }

This ensures that there are no reachable objects inside of memory, before it is freed.

Another fix is:

class DoStuff {
  Handle <Vector <Handle <Employee>>> data;

  DoStuff () {
    makeObjectAllocationBlock (1024, true);
    data = makeObject <Vector <Handle <Employee>>> ();
    for (int i = 0; i < 10; i++) {
       Handle <Employee> tempEmp = makeObject <Employee> ();
       data->push_back (tempEmp);
    }
  }
};

This works because when you create an object allocation block without specifying the memory location, the object model allocates and manages the RAM for you, deallocating it when it is no longer reachable. Now sometimes this is not acceptable; sometimes you want objects to reside in a specific memory location, and you have to use something that looks like the original code. In that case, you need to be careful; you cannot free the memory location while there are still reachable objects inside of it!
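
The ordering that causes the crash is a general C++ rule: the body of a destructor runs first, and the destructors of the class's members run only after that body has finished. The following sketch shows this in plain C++ with no PDB involved. FakeHandle, Owner, events and destructionOrder are hypothetical names, and the "block" is only simulated by log messages.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Log of what happened, in order.
std::vector<std::string> &events() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a handle that cleans up in its destructor.
struct FakeHandle {
    bool cleared = false;
    ~FakeHandle() {
        events().push_back(cleared
                               ? "handle destroyed empty"
                               : "handle destroyed while still pointing into block");
    }
};

// Hypothetical stand-in for DoStuff: owns a block and a handle into it.
struct Owner {
    FakeHandle data;
    bool clearFirst;

    explicit Owner(bool clearHandleFirst) : clearFirst(clearHandleFirst) {}

    ~Owner() {
        if (clearFirst) {
            data.cleared = true;  // plays the role of "data = nullptr"
            events().push_back("handle cleared");
        }
        events().push_back("block freed");  // plays the role of "free (memory)"
        // ~FakeHandle runs only after this body returns.
    }
};

// Creates and destroys one Owner, returning the recorded order of events.
std::vector<std::string> destructionOrder(bool clearHandleFirst) {
    events().clear();
    {
        Owner temp(clearHandleFirst);
    }
    assert(!events().empty());
    return events();
}
```

With clearHandleFirst set to false the handle is destroyed after the block is freed, which is the crashing order. With true, the handle is already empty by the time its destructor runs.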

Question: How do I set up a test case that uses distributed vTable fixing? That seems scary!! And it seems like a lot of work.

Answer:

It is not scary! It’s not a lot of work. And it is pretty stable at this point.

And in fact, I would strongly suggest that we not add any more test types into builtInPDBObjects, unless it is clear that this test type will be used repeatedly, all over the place. In general, if you need to have a test that uses some new type, use distributed vTable fixing. It’s easy.

Not to mention, PDB is fundamentally built on this idea. So if you are scared of it or don’t know how to use it, that’s a problem.

Here’s what you need to do:

  1. Add a line to SConstruct that compiles your new type into a shared library (there is already an example of how to do this).
  2. Then start up a catalog server on your machine. This is as easy as running bin/test15. You can keep it running as long as you want. No reason to shut it down.
  3. In your test code, add three lines:
std :: string errMsg;
pdb :: StorageClient temp (8108, "localhost", make_shared <pdb :: PDBLogger> ("clientLog"));
temp.registerType ("libraries/mySharedLibrary.so", errMsg);

That’s as easy as it is. Now, you are using distributed vTable fixing.

Question: I have a graph structure (a DAG) built using pdb::Objects and pdb::Handles, but when I send it across the network, I get strange results after I update it on the other side. Is the object model broken?

Answer:

Likely, the object model has bugs, but that’s probably not why you get strange results when you send a graph across the network. You have to be really careful sending and then editing DAGs made of pdb::Objects and pdb::Handles.

What’s the reason for this? Well, in a nutshell, pdb::Object deep copy implicitly assumes that you are using a functional programming model. That is, it assumes that once you send something across the wire, you won’t update it. In fact, when you send a graph of pdb::Objects and pdb::Handles, the thing that comes out on the other side of the wire is only functionally equivalent to the thing that you sent; it’s not necessarily the same. This would be fine if we were using a pure functional language like Haskell, but in C++ it requires some care.

Let me be more specific. Consider the following:

class Node : public Object {
public:
    Handle <Node> left;
    Handle <Node> right;
    String name;
    Node () {}
    Node (char *in) {
        name = in;
    }
    ENABLE_DEEP_COPY;
};

const UseTemporaryAllocationBlock myBlock {1024};
Handle <Node> root = makeObject <Node> ("root");
root->left = makeObject <Node> ("left");
root->right = makeObject <Node> ("right");
root->left->left = root->right->right = makeObject <Node> ("sink");

After you send this across the network, what will the following print out?

std :: cout << root->name << "\n";
std :: cout << root->left->name << "\n";
std :: cout << root->right->name << "\n";
std :: cout << root->left->left->name << "\n";
std :: cout << root->right->right->name << "\n";

As expected, you’ll get:

root
left
right
sink
sink

Now let’s say you write the following code:

const UseTemporaryAllocationBlock myOtherBlock {1024};
Handle <Node> newRoot = makeObject <Node> ("newRoot");
newRoot->left = root;

The last line will cause a deep copy because newRoot is on a different allocator from root. Now, you try:

std :: cout << newRoot->left->name << "\n";
std :: cout << newRoot->left->left->name << "\n";
std :: cout << newRoot->left->right->name << "\n";
std :: cout << newRoot->left->left->left->name << "\n";
std :: cout << newRoot->left->right->right->name << "\n";

You will get the same results, as expected.

But now do this:

newRoot->left->left->left->name = "changed";

One would expect that the above print statements would give you:

root
left
right
changed
changed

After all, root->right->right and root->left->left pointed to the same object. But in fact, you’ll get:

root
left
right
changed
sink

Why???

Well, the reason is that the deep copy “treeified” the graph. It traversed the graph using a DFS and replicated it, but when it reached the node labeled “sink” from two different directions, it copied this node twice. So you get a graph that is functionally equivalent (assuming no cycles), but updates to the graph after the deep copy can cause some difficult-to-explain results.
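
The “treeifying” effect is easy to reproduce outside of PDB. The sketch below uses std::shared_ptr as a stand-in for pdb::Handle and a naive recursive copy (treeCopy, a hypothetical helper written for this example, not PDB's deep-copy code) that keeps no record of nodes it has already copied.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Node {
    std::string name;
    std::shared_ptr<Node> left;
    std::shared_ptr<Node> right;
    explicit Node(std::string n) : name(std::move(n)) {}
};
using NodePtr = std::shared_ptr<Node>;

// Naive depth-first deep copy: every time a node is reached it is copied
// again, so a node reachable along two paths ends up duplicated.
NodePtr treeCopy(const NodePtr &in) {
    if (!in) return nullptr;
    NodePtr out = std::make_shared<Node>(in->name);
    out->left = treeCopy(in->left);
    out->right = treeCopy(in->right);
    return out;
}

// Builds the DAG from the text: "sink" is shared by the left and right paths.
NodePtr makeDiamond() {
    NodePtr root = std::make_shared<Node>("root");
    root->left = std::make_shared<Node>("left");
    root->right = std::make_shared<Node>("right");
    NodePtr sink = std::make_shared<Node>("sink");
    assert(sink != nullptr);
    root->left->left = sink;
    root->right->right = sink;
    return root;
}
```

In the graph returned by makeDiamond (), root->left->left and root->right->right are the same node. In the result of treeCopy () they are two separate nodes that merely carry the same name, so renaming one leaves the other untouched.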

So how to handle this? Here are some suggestions:

  1. Avoid complicated pdb::Object graphs (that is, DAGs and not trees) altogether. Just don’t use them.
  2. If this is too restrictive, instead treat all complicated pdb::Object graphs as purely functional data structures. Once created, a node in a pdb::Object DAG is never updated. Then deep copies cannot cause problems. Fans of Haskell will like this.
  3. If this is too restrictive, just make sure all nodes are allocated on the same allocator. Then deep copies can never happen.

Question: Can I create object graphs that are cyclic?

That is, can I do:

class Node : public Object {
public:
    Handle <Node> left;
    Handle <Node> right;
    String name;
    Node () {}
    Node (char *in) {
        name = in;
    }
    ENABLE_DEEP_COPY
};

const UseTemporaryAllocationBlock myBlock {1024};
Handle <Node> root = makeObject <Node> ("root");
root->left = makeObject <Node> ("child");
root->left->right = root;

Answer:

No.

It will lead to a crash when such a graph is either deep-copied, or (inevitably) when it is traversed during garbage collection.
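
To get a feel for why a cycle is fatal for a recursive deep copy, here is a small sketch in plain C++. It assumes the copy is a simple depth-first recursion that does not remember visited nodes. std::shared_ptr stands in for pdb::Handle, and guardedCopy, copyTerminates and cyclicCopyTerminates are hypothetical helpers. A depth limit is added only so that the demonstration reports the problem instead of overflowing the stack.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

struct Node {
    std::shared_ptr<Node> left;
    std::shared_ptr<Node> right;
};
using NodePtr = std::shared_ptr<Node>;

// Depth-first copy with no memory of visited nodes. On a cyclic graph the
// recursion never bottoms out; the depth limit turns that into an exception.
NodePtr guardedCopy(const NodePtr &in, int depth = 0) {
    if (!in) return nullptr;
    if (depth > 1000) throw std::runtime_error("copy did not terminate");
    NodePtr out = std::make_shared<Node>();
    out->left = guardedCopy(in->left, depth + 1);
    out->right = guardedCopy(in->right, depth + 1);
    return out;
}

bool copyTerminates(const NodePtr &graph) {
    try {
        guardedCopy(graph);
        return true;
    } catch (const std::runtime_error &) {
        return false;
    }
}

// Builds the two-node cycle from the question and tries to copy it.
bool cyclicCopyTerminates() {
    NodePtr root = std::make_shared<Node>();
    root->left = std::make_shared<Node>();
    root->left->right = root;  // back edge: child points at its parent
    assert(root->left->right == root);
    bool ok = copyTerminates(root);
    root->left->right = nullptr;  // break the cycle so the nodes can be freed
    return ok;
}
```

Note the last step in cyclicCopyTerminates (): with plain reference counting, the two nodes would keep each other alive forever unless the cycle is broken by hand.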

Question: When do I use Handle <Foo> as opposed to just directly using an object of type Foo?

Answer: First, there’s the more basic question of What is a pdb :: Handle? A pdb :: Handle is a pointer-like object, sort of like PDB’s version of a smart pointer. A pdb :: Handle is the only way that you can move pointer-like objects into the cloud using PDB. The problem with regular C pointers or even C++ smart pointers in a system like PDB is that they contain a process-specific memory address that does not translate across machines, or even across processes on the same machine. So they generally don’t work as part of a data structure that you are loading into the cloud.

As to why you will want to use a pdb :: Handle, just like with a regular pointer, you should use it when you need it, but only use it when you need it. Don’t use it when you don’t need it (that is, avoid using it when you don’t have an explicit need that the pdb :: Handle addresses, and definitely don’t use it gratuitously).

So, why should you avoid using pdb :: Handle unless it is necessary? pdb :: Handles can be expensive, both in terms of space and compute. Each handle requires 8 bytes for a reference counter that is placed before the referred-to object, plus it requires another 16 bytes in the Handle object itself (which includes an offset, plus information on the type of the referenced object). So each time that you have a Handle <Foo> as opposed to just directly using an object of type Foo, you are using 24 additional bytes. Further, each time that you dereference a pdb :: Handle, the vTable of the referenced object is fixed (if the referenced object is descended from pdb :: Object). This costs CPU cycles. Also, pdb :: Handle objects are reference counted, which can be expensive (though we do plan to add the ability to turn that reference counting off). So there is a computational cost as well.
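
As a back-of-the-envelope check of the space cost, the sketch below simply multiplies out the figures quoted above (8 bytes for the reference counter, 16 bytes for the Handle itself). It assumes one Handle per referenced object. The constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Figures taken from the text above.
constexpr std::size_t kRefCountBytes = 8;   // counter stored before the object
constexpr std::size_t kHandleBytes = 16;    // offset plus type information

// Extra bytes used by numHandles handles, assuming each refers to its own object.
constexpr std::size_t handleOverheadBytes(std::size_t numHandles) {
    return numHandles * (kRefCountBytes + kHandleBytes);
}

static_assert(handleOverheadBytes(1) == 24, "one handle costs 24 extra bytes");
```

Under that assumption, a vector holding one million handles carries about 24 MB of overhead on top of the objects themselves.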

Why might it make sense to use pdb :: Handle? In summary, pdb :: Handle is the only way to get pointer-like functionality within pdb :: Objects. As such, there are at least six very good reasons to use it:

  • It allows you to build up complex data structures that feature links between objects. You can’t easily build a linked structure of pdb :: Objects that can be moved into the cloud without the pointer-like functionality provided by pdb :: Handle.
  • It allows you to use dynamic dispatch (runtime polymorphism). That is, pdb :: Handles are the only way that you can store pdb :: Objects with virtual methods/functions in the cloud, and have runtime polymorphism work correctly, where calls to virtual methods are redirected to the correct concrete implementation. For example, you can have a pdb :: Vector <Handle <Cat>>, where Cat has one or more virtual methods, and then insert a Handle <Lion> into the vector (assuming Lion descends from Cat). If Lion has its own implementation of one of Cat’s virtual methods, then when you invoke that method via the Lion’s pdb :: Handle, the correct method on Lion will automatically be called, even though what is stored in the vector is a Handle <Cat>.
  • It allows you to write programs that manage data that takes the form of pure virtual base classes (that is, classes that have one or more pure virtual functions). Since you can’t actually create instances of such classes, the only way that you can move them into a cloud is using pdb :: Handles.
  • It allows you to have very low-cost aliasing. You can create multiple Handles to the same object—that is, you can put three Handles referring to the same object into a vector—and the system can avoid a deep copy of that object.
  • A related benefit is that just like a pointer, it gives you a relatively low-cost way to “move” an object. Let’s say that type Foo is very large, or it has an expensive, user-defined copy operation. If I have myVec of type pdb :: Vector <Handle <Foo>> and hisVec of type pdb :: Vector <Foo>, then myVec.push_back () can be much less costly than hisVec.push_back ().
  • pdb :: Handle can take the value nullptr. This can be very useful if you are implementing an algorithm that would like to have some notion of an “empty” or unused object.
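
The dynamic-dispatch point in the list above has a direct analogue in standard C++, sketched here with std::vector and std::shared_ptr standing in for pdb :: Vector and pdb :: Handle (an analogy only, not PDB code). Cat and Lion follow the example in the text. The two function names are made up for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Cat {
    virtual ~Cat() = default;
    virtual std::string sound() const { return "meow"; }
};

struct Lion : Cat {
    std::string sound() const override { return "roar"; }
};

// Storing by value keeps only the Cat part of the Lion ("slicing"),
// so the Lion's override is lost.
std::string soundStoredByValue() {
    std::vector<Cat> cats;
    cats.push_back(Lion{});
    return cats[0].sound();
}

// Storing a pointer-like object keeps the Lion intact, so the call is
// dispatched to Lion::sound at run time.
std::string soundStoredByPointer() {
    std::vector<std::shared_ptr<Cat>> cats;
    cats.push_back(std::make_shared<Lion>());
    assert(cats[0] != nullptr);
    return cats[0]->sound();
}
```

The by-value container answers with the Cat implementation, while the pointer-based container answers with the Lion implementation.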