Object Serialization

Due Date: Wed, Feb 13th

Object serialization is the process of taking an object written in a particular language and turning it into data that can be written to a file or to the network. Deserialization is the reverse process: reading data from a file or network and turning it into objects. In particular, with the network, the byte order of integers and floating-point values has to be taken into account.

Goals of the Project

The primary goal of this project is to teach you how to implement object serialization and deserialization. Almost every game engine is written in C++, so as a network programmer it's important to know how to do this in a C++ fashion.

If we think about this problem from an object-oriented design perspective, we can guess that we need to allow objects to read from and write to a byte stream. This technique is necessary because C++ has no simple way to determine all the fields of a class. While the members of a class are laid out in the order declared within the object's own block of memory, we can't do a simple memcpy because pointers won't be followed. Instead, you would write the address stored in each pointer to the stream, and that address is meaningless to whoever reads the data back.
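To make the pointer problem concrete, here is a minimal sketch. NamedThing, rawCopy, and demoPointerProblem are hypothetical names invented for this illustration; they are not part of the project's required interface.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical type used only for this illustration.
struct NamedThing {
    int         id;
    const char* name;  // points at characters stored somewhere else
};

// Copies the raw bytes of a NamedThing, the way a naive memcpy-based
// serializer would, and rebuilds an object from those bytes.
inline NamedThing rawCopy(const NamedThing& src) {
    unsigned char wire[sizeof(NamedThing)];
    std::memcpy(wire, &src, sizeof src);   // what would be written to the stream
    NamedThing dst;
    std::memcpy(&dst, wire, sizeof dst);   // what the reader would reconstruct
    return dst;
}

// Shows that the copy carries the pointer value, not the characters.
inline void demoPointerProblem() {
    const char text[] = "orc";
    NamedThing original = { 1, text };
    NamedThing copy = rawCopy(original);
    assert(copy.id == 1);
    assert(copy.name == original.name);  // same address; no characters were copied
}
```

Within one process the copied pointer still happens to work. Once the bytes go to a file or to another machine, the address no longer refers to the string, which is why each object has to write out what its pointers refer to.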

To help deal with this problem, we want to use templates because they let us define member functions that work for any type. Probably the correct object-oriented way to do this is to define an abstract base class as follows:


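class Serializer; // forward declaration; the class itself is described below

class Serializable
{
public:
  virtual ~Serializable() {}  // allows deletion through a base-class pointer
  virtual bool writeTo(Serializer &s) const = 0;
  virtual bool readFrom(Serializer &s) = 0;
};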

Note that this technique is similar to what Java does for classes that are serialized. In Java, you implement the Serializable interface and most of the work is done magically. However, if you define the writeObject and readObject methods, you can manually write and read objects to and from output and input streams. In your C++ code, you have to call the write method of the passed-in Serializer object for each of your member variables, and call read for each of them when you're reading. The reads must happen in the same order as the writes.
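As a sketch of the ordering rule, the following shows a class that writes and reads its members in the same order. The Serializer here is only a stand-in that keeps raw host-order bytes in a std::vector and ignores byte order; the actual Serializer interface is described in the next paragraphs. PlayerState and demoRoundTrip are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in Serializer for illustration only: raw host-order bytes in memory.
class Serializer {
public:
    template <typename T> bool write(const T& obj) {
        const char* p = reinterpret_cast<const char*>(&obj);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
        return true;
    }
    template <typename T> bool read(T& obj) {
        if (sizeof(T) > bytes_.size() - readPos_) return false;  // not enough data
        std::memcpy(&obj, bytes_.data() + readPos_, sizeof(T));
        readPos_ += sizeof(T);
        return true;
    }
private:
    std::vector<char> bytes_;
    std::size_t readPos_ = 0;
};

class Serializable {
public:
    virtual ~Serializable() {}
    virtual bool writeTo(Serializer& s) const = 0;
    virtual bool readFrom(Serializer& s) = 0;
};

// Hypothetical game object with two plain members.
class PlayerState : public Serializable {
public:
    int   id = 0;
    float health = 0.0f;

    bool writeTo(Serializer& s) const override {
        return s.write(id) && s.write(health);  // order: id, then health
    }
    bool readFrom(Serializer& s) override {
        return s.read(id) && s.read(health);    // must match writeTo's order
    }
};

// Writes one object and reads it back into another.
inline void demoRoundTrip() {
    Serializer s;
    PlayerState sent;
    sent.id = 7;
    sent.health = 42.5f;
    assert(sent.writeTo(s));

    PlayerState received;
    assert(received.readFrom(s));
    assert(received.id == 7 && received.health == 42.5f);
}
```

If readFrom read health before id, the bytes of the integer would be interpreted as a float and vice versa, with no error reported.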

By now you must be wondering what Serializer does. Well, Serializer is a class that writes to and reads from an underlying buffer. It looks like the following:


class Serializer
{
public:
  template <typename T> bool write(const T& obj);
  template <typename T> bool read(T& obj);
};

This class contains what we call template methods (member function templates). Depending on the compiler, template methods must either be defined in the class itself (which has the drawback that they may be inlined) or may be defined outside the class. To be safe, we can define the bodies of the methods in the class body as follows:


template <typename T> bool write(const T& obj)
{
    return SerializerHelper::writeTo(*this, obj);
}

template <typename T> bool read(T& obj)
{
    return SerializerHelper::readFrom(*this, obj);
}

Now, using the SerializerHelper namespace, which we of course create, we can extend writeTo and readFrom as template functions that handle any type of object. Further, the use of a namespace lets us place these definitions either in a single file or where the object is created.
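One way this could fit together is sketched below, under several assumptions. The primary templates forward to the object's own writeTo and readFrom, and explicit specializations handle other types. The writeBytes, readBytes, and bytes members, the in-memory std::vector storage, the Vec2 type, and demoHelper are all hypothetical. Big-endian order is produced by shifting rather than by htonl.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

class Serializer;

namespace SerializerHelper {
// Default case: the type supplies its own writeTo/readFrom members.
template <typename T> bool writeTo(Serializer& s, const T& obj) { return obj.writeTo(s); }
template <typename T> bool readFrom(Serializer& s, T& obj) { return obj.readFrom(s); }
}

class Serializer {
public:
    template <typename T> bool write(const T& obj) { return SerializerHelper::writeTo(*this, obj); }
    template <typename T> bool read(T& obj) { return SerializerHelper::readFrom(*this, obj); }

    // Hypothetical raw-byte primitives over an in-memory stand-in buffer.
    bool writeBytes(const unsigned char* p, std::size_t n) {
        bytes_.insert(bytes_.end(), p, p + n);
        return true;
    }
    bool readBytes(unsigned char* p, std::size_t n) {
        if (n > bytes_.size() - pos_) return false;
        std::memcpy(p, bytes_.data() + pos_, n);
        pos_ += n;
        return true;
    }
    const std::vector<unsigned char>& bytes() const { return bytes_; }
private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;
};

namespace SerializerHelper {
// Base type: a 32-bit unsigned integer, most significant byte first.
template <> inline bool writeTo<std::uint32_t>(Serializer& s, const std::uint32_t& v) {
    const unsigned char be[4] = {
        static_cast<unsigned char>(v >> 24), static_cast<unsigned char>(v >> 16),
        static_cast<unsigned char>(v >> 8),  static_cast<unsigned char>(v) };
    return s.writeBytes(be, sizeof be);
}
template <> inline bool readFrom<std::uint32_t>(Serializer& s, std::uint32_t& v) {
    unsigned char be[4];
    if (!s.readBytes(be, sizeof be)) return false;
    v = (std::uint32_t(be[0]) << 24) | (std::uint32_t(be[1]) << 16) |
        (std::uint32_t(be[2]) << 8)  |  std::uint32_t(be[3]);
    return true;
}
}

// A user-defined type that does not derive from Serializable.
struct Vec2 { std::uint32_t x, y; };

namespace SerializerHelper {
// These could live in the same file as Vec2 rather than with Serializer.
template <> inline bool writeTo<Vec2>(Serializer& s, const Vec2& v) {
    return s.write(v.x) && s.write(v.y);
}
template <> inline bool readFrom<Vec2>(Serializer& s, Vec2& v) {
    return s.read(v.x) && s.read(v.y);
}
}

inline void demoHelper() {
    Serializer s;
    Vec2 sent = { 1, 0x01020304u };
    assert(s.write(sent));
    Vec2 received = { 0, 0 };
    assert(s.read(received));
    assert(received.x == 1 && received.y == 0x01020304u);
}
```

Note that each specialization has to be declared before the first call that would use it, so the base-type specializations come before the Vec2 ones.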

Note that not every type has to be defined with this method. In fact, it's probably more efficient to simply define the methods for writing the base types within Serializer itself, instead of having a few levels of indirection before the right function is called. However, for user-defined classes, it's helpful to use this method.

Last, we need to be concerned with the byte orders of the architecture we're working with and of the network. To do the conversions, we use htonl(), htons(), ntohl(), and ntohs(). These functions convert long (32-bit) and short (16-bit) integers from host byte order (i.e., that of the architecture you're working on) to network byte order and vice versa. You can find information on them on the web or through the man pages.
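As a small sketch of how these conversions might be used, the helpers below store and load integers in network order. The names putU32, getU32, putU16, getU16, and demoByteOrder are hypothetical, and the include assumes a POSIX system; on Windows the same functions come from <winsock2.h>.

```cpp
#include <arpa/inet.h>  // htonl, htons, ntohl, ntohs (POSIX)
#include <cassert>
#include <cstdint>
#include <cstring>

// Stores a 32-bit value at dst in network (big-endian) byte order.
inline void putU32(unsigned char* dst, std::uint32_t hostValue) {
    const std::uint32_t net = htonl(hostValue);
    std::memcpy(dst, &net, sizeof net);
}

// Loads a 32-bit network-order value from src and returns it in host order.
inline std::uint32_t getU32(const unsigned char* src) {
    std::uint32_t net;
    std::memcpy(&net, src, sizeof net);
    return ntohl(net);
}

// The 16-bit versions follow the same pattern with htons/ntohs.
inline void putU16(unsigned char* dst, std::uint16_t hostValue) {
    const std::uint16_t net = htons(hostValue);
    std::memcpy(dst, &net, sizeof net);
}

inline std::uint16_t getU16(const unsigned char* src) {
    std::uint16_t net;
    std::memcpy(&net, src, sizeof net);
    return ntohs(net);
}

inline void demoByteOrder() {
    unsigned char buf[4];
    putU32(buf, 0x0A0B0C0Du);
    assert(buf[0] == 0x0A && buf[3] == 0x0D);  // most significant byte first
    assert(getU32(buf) == 0x0A0B0C0Du);
}
```

There are no standard counterparts to these functions for 8-byte values or for floating-point types, so those need to be handled by hand.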

The point of Serializer is that it should be able to serialize to any kind of buffer, whether it be a file, network stream, or memory. Thus, it makes sense to pass in an object in its constructor that represents a buffer. We then know that whenever we try to read or write it goes to the given buffer:


class Buffer {
public:
  Buffer(size_t size);           // allocate a buffer of size bytes
  Buffer(const Buffer& b);
  virtual ~Buffer();
  
  operator const char *() const; // typecast to a const char *
  operator char *();             // typecast to a char * (to allow you to write to the array)
  
  char *begin();        // returns the beginning of the buffer
  char *end();          // returns a pointer to element n, where n=count()

  size_t size() const;  // maximum size of the buffer
  size_t count() const; // number of elements filled in the buffer

  void write(const char *data, size_t len); // writes data to the buffer
  void read(char *data, size_t len); // reads data from the buffer

  void reset(); // resets the read head to the beginning of the buffer
  void clear(); // erases the buffer
};

Note that even though we're passing in pointers to chars, all memory can be addressed at the byte level, so we can correctly refer to any memory through a char *. So, for example, if you wanted to convert an 8-byte double to network order, you could create a char[8] on the stack, swap the bytes from little-endian to big-endian, and then pass a pointer to this array to be written to the buffer. However, to reduce copying, you could get a pointer via end() and write directly into the buffer. The main difference between the two methods is that the write method lets you add boundary-checking code, whereas dealing with the raw pointer is faster but makes it easier to introduce bugs.
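The char[8] approach for a double could look roughly like the sketch below. Rather than testing the host's byte order and swapping, it copies the bits into a 64-bit integer and emits the most significant byte first. That amounts to a swap on a little-endian host and a plain copy on a big-endian one. The function names are hypothetical, and the sketch assumes an 8-byte IEEE-754 double on both ends.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "this sketch assumes an 8-byte double");

// Fills out[0..7] with the bits of value, most significant byte first.
inline void doubleToNetwork(double value, char out[8]) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<char>((bits >> (56 - 8 * i)) & 0xFF);
}

// Rebuilds a double from 8 bytes stored most significant byte first.
inline double doubleFromNetwork(const char in[8]) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits = (bits << 8) | static_cast<unsigned char>(in[i]);
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

inline void demoDouble() {
    char wire[8];
    doubleToNetwork(1.0, wire);  // 1.0 has the bit pattern 0x3FF0000000000000
    assert(static_cast<unsigned char>(wire[0]) == 0x3F);
    assert(static_cast<unsigned char>(wire[1]) == 0xF0);
    assert(doubleFromNetwork(wire) == 1.0);
}
```

The filled array can then be handed to the buffer's write(wire, 8), which is the variant that leaves room for boundary checks.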

What to Do

In this project, you get the fun of defining and creating Serializable, Serializer, SerializerHelper functions, and the Buffer class that allow one to write the base C++ types, arrays, and objects to a stream. Therefore you must:

Implement the Serializable abstract class.
Implement the Serializer class.
Implement the functions in the SerializerHelper namespace to support Serializer.
Implement the Buffer class.
Test your code to make sure it works! To do this, you need to be able to write your Buffer to a file and read it back to see if you get the correct results back. You'll want to serialize the base types to make sure they work correctly. In addition, you should wrap these in a class and write the object to the buffer.
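For the file round trip in the last step, a minimal sketch using standard file streams might look like this. saveBytes, loadBytes, and demoFileRoundTrip are hypothetical helpers that work on a std::vector<char>. With your own Buffer you would pass begin() and count() to the stream's write call instead.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Writes all bytes to the file at path, replacing any existing contents.
inline bool saveBytes(const std::string& path, const std::vector<char>& bytes) {
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}

// Reads the whole file at path into bytes.
inline bool loadBytes(const std::string& path, std::vector<char>& bytes) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;
    bytes.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}

// Saves a few bytes, loads them back, and checks that nothing changed.
inline void demoFileRoundTrip(const std::string& path) {
    const std::vector<char> sent = { '\x00', '\x01', '\x7f', '\n' };
    assert(saveBytes(path, sent));
    std::vector<char> received;
    assert(loadBytes(path, received));
    assert(received == sent);
}
```

Opening both streams in binary mode matters: in text mode some platforms translate newline bytes, which would corrupt serialized data.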

Grading

Your project will be scored on the correctness of your implementation. I will have a class that extends your Serializable class and calls write with the members of my class. I'll read it back, and your grade will be based on how well this works. I will test this on machines with different endianness, so, for example, I may create the file on an Intel machine and read it back on a PowerPC machine. Your objects should serialize and deserialize correctly.

Extra Credit for Undergraduates, Regular Credit for Graduates

Undergraduates can do any of the following for extra credit. Graduates must pick one of these for full credit for their project.

Extend the classes to allow you to handle class hierarchies. As it stands, when you're reading from the stream, you can't tell what class is being read, so you don't know from the data stream which class to construct. By default, I'm assuming you know what to expect over the stream, but let's say that the networking code just deals with the base class and calls writeTo on a networked serializer (technically, a networked buffer). Your networking code will need to figure out what class to construct.
Add the serialization of the STL containers. To do this effectively, you should use iterators. You'll need separate write and read functions for the STL containers, along with SerializerHelper functions.
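For the class hierarchy option, one common approach (not necessarily the one expected here) is to write a small type tag ahead of each object's data and have the reader map that tag back to a class. The sketch below assumes a stand-in Serializer that stores raw host-order bytes. The typeId method, the Bullet and Pickup classes, and the createByTypeId, writeTagged, and readTagged functions are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Stand-in Serializer for illustration only: raw host-order bytes in memory.
class Serializer {
public:
    template <typename T> bool write(const T& obj) {
        const char* p = reinterpret_cast<const char*>(&obj);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
        return true;
    }
    template <typename T> bool read(T& obj) {
        if (sizeof(T) > bytes_.size() - pos_) return false;
        std::memcpy(&obj, bytes_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
        return true;
    }
private:
    std::vector<char> bytes_;
    std::size_t pos_ = 0;
};

class Serializable {
public:
    virtual ~Serializable() {}
    virtual std::uint8_t typeId() const = 0;  // hypothetical addition: one tag per class
    virtual bool writeTo(Serializer& s) const = 0;
    virtual bool readFrom(Serializer& s) = 0;
};

class Bullet : public Serializable {
public:
    int damage = 0;
    std::uint8_t typeId() const override { return 1; }
    bool writeTo(Serializer& s) const override { return s.write(damage); }
    bool readFrom(Serializer& s) override { return s.read(damage); }
};

class Pickup : public Serializable {
public:
    int amount = 0;
    std::uint8_t typeId() const override { return 2; }
    bool writeTo(Serializer& s) const override { return s.write(amount); }
    bool readFrom(Serializer& s) override { return s.read(amount); }
};

// Maps a tag back to a default-constructed object of the matching class.
inline std::unique_ptr<Serializable> createByTypeId(std::uint8_t id) {
    switch (id) {
        case 1:  return std::unique_ptr<Serializable>(new Bullet);
        case 2:  return std::unique_ptr<Serializable>(new Pickup);
        default: return std::unique_ptr<Serializable>();
    }
}

// Writes the tag first, then the object's own data.
inline bool writeTagged(Serializer& s, const Serializable& obj) {
    return s.write(obj.typeId()) && obj.writeTo(s);
}

// Reads a tag, constructs the matching class, then lets it read its data.
inline std::unique_ptr<Serializable> readTagged(Serializer& s) {
    std::uint8_t id = 0;
    if (!s.read(id)) return std::unique_ptr<Serializable>();
    std::unique_ptr<Serializable> obj = createByTypeId(id);
    if (!obj || !obj->readFrom(s)) return std::unique_ptr<Serializable>();
    return obj;
}
```

A switch is the simplest form of the lookup. A table that classes register themselves into would avoid editing one central function each time a class is added.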

References

Jason Beardsley, "Template-Based Object Serialization," Game Programming Gems 3, pp. 534-545.