C++ programs are built around functions: objects and values are passed to functions, which then return further objects and values based on the input data. One of the benefits of C++ is that it allows very fine-grained control over these function interfaces, as well as over how the passed objects are stored and manipulated. In this article I will discuss the different methods of passing data and objects to a function, and how the choice affects the efficiency and behaviour of your quant algorithms.
In C++ there are three different ways to pass data to a function: by value, by reference and by pointer. Each has different characteristics when it comes to efficiency, storage and behaviour. We won't dwell on the last method, passing by pointer, as it is a legacy method used by C-style programs (as well as function pointers). Instead we will concentrate on the first and second methods.
Passing by Value
When an object (or built-in type) is passed by value to a function, the underlying object is copied using its copy constructor. The new object has additional memory allocated to it and all values and sub-objects are copied and stored separately. If the object passed is a built-in type such as a double value then the copying process is cheap and will often not impact performance. However, if the passed object contains a lot of stored values, such as a vector or matrix, then the copying process will be expensive in terms of both storage and CPU cycles.
Passing an object by value also means that any modifications the function makes to its parameter are applied to the copy, not to the object that the caller passed in. This often confuses beginning C++ programmers and is a common source of bugs!
Example: Consider a norm function which calculates the Euclidean norm of a vector of double values. The function returns one double precision value ("the norm") and takes the vector as a parameter. The following code shows the vector being passed by value:
double euclid_norm(vector<double> my_vector);
This will copy the vector and will make any underlying changes to the copy of that vector, rather than the vector itself. A norm function should not modify a vector, only read its data. This implementation is not ideal. It is expensive and the interface does not imply correct usage.
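To make the copy semantics concrete, here is a minimal sketch of a by-value norm function. The function body, the name euclid_norm_by_value and the helper demo_pass_by_value are my own assumptions for illustration; the original only gives the declaration. The function deliberately squares the elements in place to show that only the copy is affected.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Pass by value: my_vector is a separate copy of the caller's vector.
// (Illustrative body and name; only the declaration appears in the article.)
double euclid_norm_by_value(std::vector<double> my_vector) {
    double sum_of_squares = 0.0;
    for (double& x : my_vector) {
        x *= x;                 // overwrites an element of the COPY only
        sum_of_squares += x;
    }
    return std::sqrt(sum_of_squares);
}

// Hypothetical helper showing what the caller observes.
void demo_pass_by_value() {
    std::vector<double> v{3.0, 4.0};
    double norm = euclid_norm_by_value(v);  // the whole vector is copied here
    assert(norm == 5.0);
    assert(v[0] == 3.0 && v[1] == 4.0);     // the caller's vector is untouched
}
```

The caller's data is safe, but only because every call pays for a full copy of the vector.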
Passing by Reference
When an object (or built-in type) is passed by reference to a function, the underlying object is not copied. The function is given the memory address of the object itself. This saves both memory and CPU cycles as no new memory is allocated and no (expensive) copy constructors are being called. It is a much more efficient operation.
If the function being passed the object now modifies the object in any way, the original object will reflect those modifications, rather than a copy of the object. In some instances this is exactly what is intended. In other situations this may not be the desired behaviour. Once again there is the possibility of bugs! In particular the vector norm function as described above should not be able to modify the passed vector object.
Here is the same euclid_norm function modified to pass the vector by reference. Note the added reference symbol (&):
double euclid_norm(vector<double>& my_vector);
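The following sketch shows how a non-const reference parameter can silently overwrite the caller's data. As before, the function body and the names euclid_norm_by_ref and demo_pass_by_reference are assumptions made for illustration. The body is the same careless implementation as a by-value version might use, squaring the elements in place, but here the squares end up in the original vector.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Pass by (non-const) reference: my_vector is another name for the
// caller's vector, so no copy is made.
// (Illustrative body and name; only the declaration appears in the article.)
double euclid_norm_by_ref(std::vector<double>& my_vector) {
    double sum_of_squares = 0.0;
    for (double& x : my_vector) {
        x *= x;                 // overwrites the CALLER'S element
        sum_of_squares += x;
    }
    return std::sqrt(sum_of_squares);
}

// Hypothetical helper showing what the caller observes.
void demo_pass_by_reference() {
    std::vector<double> v{3.0, 4.0};
    double norm = euclid_norm_by_ref(v);    // no copy is made
    assert(norm == 5.0);
    assert(v[0] == 9.0 && v[1] == 16.0);    // the caller's data has changed
}
```

The norm is still computed correctly, which is what makes this kind of bug easy to miss: the damage only shows up when the caller uses the vector again.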
Passing by Reference to Const
To avoid both copying and modifying the underlying data, a technique known as passing by reference to const is used. This is similar to passing by reference except that we mark the my_vector parameter as a const object. This tells the compiler that euclid_norm must not modify my_vector within its own scope, so any attempt to do so is rejected at compile time:
double euclid_norm(const vector<double>& my_vector);
Our interface is now much more precise about its intent. The const keyword marks my_vector as non-modifiable, while the reference symbol (&) states that it should not be copied. This is exactly the behaviour we would want out of a norm function.
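A possible implementation behind this interface is sketched below. The function body and the helper demo_pass_by_const_reference are assumptions of mine, since the article only specifies the declaration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Pass by reference to const: no copy is made and the function
// can only read from my_vector.
// (Illustrative body; only the declaration appears in the article.)
double euclid_norm(const std::vector<double>& my_vector) {
    double sum_of_squares = 0.0;
    for (double x : my_vector) {
        sum_of_squares += x * x;
    }
    // my_vector[0] = 0.0;  // would not compile: my_vector is const here
    return std::sqrt(sum_of_squares);
}

// Hypothetical helper showing what the caller observes.
void demo_pass_by_const_reference() {
    std::vector<double> v{3.0, 4.0};
    assert(euclid_norm(v) == 5.0);
    assert(v[0] == 3.0 && v[1] == 4.0);     // the caller's data is unchanged

    // A reference to const can also bind to a temporary object,
    // which a non-const reference parameter would not accept.
    assert(euclid_norm(std::vector<double>{6.0, 8.0}) == 10.0);
}
```

The commented-out assignment marks the kind of accidental write that the compiler now catches for us.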
Nearly all mathematical "read only" functions should take their object parameters in this manner. Not only is the client interface clearer, but it is highly efficient and will not introduce data overwriting bugs. The only exception to this is when passing built-in types. The copying of such data is inexpensive and thus you can simply pass by value.
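As a brief sketch of both rules in one signature, consider a function that multiplies every element of a vector by a scalar. The function scale is hypothetical and not part of the original article: the vector is taken by reference to const, while the double factor is simply passed by value.

```cpp
#include <vector>

// Hypothetical helper: the vector is passed by reference to const (no copy,
// read only), while the built-in double is cheap enough to pass by value.
std::vector<double> scale(const std::vector<double>& my_vector, double factor) {
    std::vector<double> result;
    result.reserve(my_vector.size());
    for (double x : my_vector) {
        result.push_back(factor * x);
    }
    return result;
}
```

The input vector is left untouched and the scaled values are returned as a new vector.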