# A quadtree in C++
Quad trees are search trees for objects arranged in a two-dimensional coordinate system. The nodes of the tree can host elements up to a particular capacity; once the capacity limit is reached, the node splits into five nodes (the original one plus four children). Wikipedia has a nice article covering this data structure.
# Implementation and usage
A tree carries entries (`Entry<T>`) which accept a point (spatial information) and a payload of an arbitrary type. This way, the tree can carry arbitrary objects, like measurements with a spatial reference, vehicle objects and so on.
To create a tree, you need to create a `Node`:
```cpp
const auto root = std::make_unique<Node<double>>(0, 100, 0, 100, 1000);
```
The parameters are the bounding box of the tree and the capacity for each node. Lowering the capacity of the nodes increases the size of the tree, while increasing the capacity takes away the advantages of the spatial indexing. The payload type is required here as well; in this example it's a `double`. Now you can start inserting values and making queries:
```cpp
root->insert(Entry<double>{
    3.14, // x
    1.41, // y
    0.    // the payload
});

const auto points_in_range = root->query(x0, x1, y0, y1);
```
I decided to implement the split mechanism like this:
- When there is capacity left, the node will get an additional child leaf.
- When there is no capacity left, the node will receive four child nodes and the leaf will be inserted in one of them.
- So in particular: when a split happens, the existing leafs will not be re-distributed across the new children. A node therefore does not host either leafs or child nodes exclusively; it can host both child types at the same time.
This is not incorrect, since my implementation checks each point before adding it to the response, but it may trade some query performance for construction performance. I have not examined this yet; I only want to make this decision explicit in case you wonder.
# Profiling
# A simple timer
A simple `time ./quadtree` indicates that the performance is rather bad. To find out more, I put this little timer in place:
```cpp
#include <chrono>
#include <iostream>
#include <string>

struct Timer
{
    const std::chrono::steady_clock::time_point begin;
    const std::string what;

    Timer(const std::string &what) : begin{std::chrono::steady_clock::now()}, what{what}
    {
    }

    ~Timer()
    {
        const auto end = std::chrono::steady_clock::now();
        const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - this->begin).count();
        std::cout << this->what << " took: " << ms << " ms" << std::endl;
    }
};
```
I use the fact that the destructor is called once the timer goes out of scope. Now all I have to do is put an additional pair of `{}` around the block I want to profile and instantiate the struct at the top of that block. Like so:
```cpp
do_bar();
{
    Timer t("running this block");
    do_foo();
}
do_ham();
```
The output of a particular version looked like this:

```
construction took: 8397 ms
6028
query tree took: 0 ms
6028
query by iterating the list took: 128 ms
```
By increasing the query region, the number for the tree search goes up and easily exceeds the number for the list search. This is not unexpected, though: a large region short-circuits the spatial index and in the worst case leads to checking each and every point, plus the overhead of the tree.
# gprof
In my last job I used `valgrind` (`cachegrind` in particular) to profile calls in my code, but I recently discovered `gprof`, which I am trying to get used to. In a nutshell:

- compile the code with the `-g -pg` flags (GCC and clang at least),
- run the application (which now magically spits out a file `gmon.out`), and then
- run `gprof ./quadtree gmon.out | less` to view the call statistics.
Here is the GNU documentation.
In this example, the output looks like this (excerpt):

```
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 15.16      0.94     0.94 281131684     0.00     0.00  overlap(double const&, double const&, double const&, double const&)
 11.51      1.65     0.71 166818859     0.00     0.00  Node::can_host(Entry const&)
  8.59      2.18     0.53 166818859     0.00     0.00  Node::insert(Entry const&)
  6.00      2.55     0.37  73608585     0.00     0.00  std::vector<Entry, std::allocator<Entry> >::size() const
```
Of course the output of the profiler does not distinguish between tree construction and tree traversal (querying). It may be completely okay for the construction to be expensive as long as the traversal stays performant. On the other hand, the table above indicates that calls to `overlap` are responsible for a noticeable portion of the run time, and this function is called both during construction and traversal.