Here’s a different approach for dealing with trees. It works well in cases where nested sets and traditional trees won’t. Specifically, if you have a tree with lots of new nodes, both leaf and non-leaf, and you frequently need to get all descendants, this approach may be exactly what you are looking for. Persistence frameworks like Active Record allow you to express complex relationships without a whole lot of extra code. Sometimes, there’s basically one way to get what you want. With trees, the answer is not always so simple. You may consider repeatedly banging your head into one when you think about the tradeoffs involved.
- You can establish a relationship by expressing a parent_id with [acts as tree](http://railsmanual.org/module/ActiveRecord::Acts::Tree::ClassMethods/acts_as_tree).
- You can establish a relationship using the [nested set](http://dev.mysql.com/tech-resources/articles/hierarchical-data.html) model with acts as nested set.
- You can roll your own.
acts_as_tree
====
Acts as tree is really shorthand for building a one-to-many relationship between an active record (as parent) and itself (as children). You also specify the corresponding many-to-one relationship between a child and its corresponding parent. In your class, you would specify only acts as tree:
In your migration or schema, you’d have to add a column called parent_id that’s just a foreign key into your parent. In your models, to get children, you could say
root = Tree.find_by_parent_id(nil) root.childrenGetting all ancestors is a little tougher. You’d have to recurse. On the Active Record class, to get a node plus all descendants you’d add something like this:
class Tree < ActiveRecord::Base acts_as_tree def descendants (self.children.each {|child| child.descendants}.flatten + [self]) end endTradeoffs
————-
Let’s tackle the plus side first. The model is easy to understand, and relatively easy to code. Getting the children for any individual node is relatively efficient: you just need to do one join. Adding a child is inexpensive: just one insert.
The problem is that such a method is expensive to process. You’ll have to add a join for each non-leaf node in the tree. Deleting a subtree is relatively expensive for the same reasons because you need to find all descendants first.
The core problem is that relational databases may express hierarchies well, but without extensions, processing them is expensive. If you want to deal with nested sets, such as all products in a category, you’d often be better off with a nested set. Here’s how it works.
Nested Sets
===
A nested set is fundamentally a tree expressed as a list. You do a depth-first traversal of a tree, recursively numbering nodes on your way in, and on your way out. Depth-first means for a given node, you visit children before siblings. Since you number each node twice (for in and out), each node has two values we’ll call left and right.

Another way to get the numbers is simply to draw a nested set, and number all of the boundaries between the elements, like [you see here.](http://dev.mysql.com/tech-resources/articles/hierarchical-data.html)
It turns out that Rails already has an acts_as_nested_set relationship predefined:
class Tree < ActiveRecord::Base acts_as_nested_set endIn your schema or migration, you need to add parent_id, lft, and rgt for the id of the parent node, left, and right values respectively.
Tradeoffs
————-
It turns out that you can get very good performance by using single queries with the left and right values. To get all descendants, you can just query for all nodes having values between left and right. To get leaf nodes you just look for nodes where left + 1 = right. Count descendants with (left + right)/2 + 1. Other queries can get you aggregated counts, depth, and so on. So read performance is very good.
The first down side is the clarity of the model. The lft and rgt values are not too descriptive. You simply have to know how nested sets work to understand them.
The other big negative is update performance: any insert, delete, or move will force you to lock the table to renumber the whole tree. If your categories are relatively stable, that’s not a big deal. If you are working with a large volatile tree, you’re not going to like acts-as-nested-set nearly as much.
A compromise: Dotted ids
========
Here’s an approach I thought I’d share with you based on some work with a local vendor. The ideas aren’t new, but the implementation is proprietary, so I can’t share all of the code with you.
But the premise is that you can keep a regular id, which is an integer, best implemented with a sequence. You also keep a dotted id, which concatenates the ids of all of the ancestors together. You must append a trailing dot for reasons that will become clear later. So a parent would have a dotted-id of 1., and its first child would have a dotted-id of 1.2. You can actually automate the assignment of a dotted id with a after-create filter:
after_create :assign_dotted_id def assign_dotted_id self.dotted_id = (self.parent.dotted_id rescule ‘’) + self.id.to_s + ’.’ save endTradeoffs
————-
The nice thing about this approach is that you can find all descendants very easily, with
You can also use the existing parent id with acts-as-tree with no penalty. Creating a new node is slightly more expensive, with three or four queries and one more save, but it’s not that big a deal. Getting the depth is as easy as counting dots in the dotted-id.
Here’s the down side. Moving a whole subtree is very expensive. You have to update the dotted addresses in all of the sub-nodes. And you can’t do it with a single query (at least, not with all databases,) because the parent of each node must be already set before you assign the dotted address. With a little imagination, you don’t have to do a whole recursive update. But by far, the biggest downside of this approach is moving subtrees.
But even if you have a huge tree with lots of new nodes, you can get some of the benefits of a nested set without having to pay such a big penalty for new nodes. I like this approach. Do you think it’s worth submitting a patch?