# Sugiyama-Algorithm

Recently I found an interesting article on the sugiyama layouting algorithm on the internet and decided to do some research for fun. Graph visualization with sugiyama is in fact pretty straightforward. The ‘hard part’ is to minimize the crossings between the layers. This is a NP-hard problem but it can be solved pretty good by a hillclimbing or a genetic algorithm. I will explain the algorithm step by step.

## Determining layers

The first step in sugiyama is to create a graph (of course). You must be able to traverse the graph from left-to-right (first node to last node) and vice versa. I did that using three lists: n() represents a single node, np() is a list of all predecessors to that node and nf() is a list of all followers. To explain the algorithm in detail, I created an example diagram:

To traverse the diagram from left to right, simply start at the first node ‘Start’ and recursively visit all its followers (Start → A, Start → B, A → I, A → F, A → C and so on).

As you can imagine from the above diagram, one can assign a virtual ‘layer’ to every node. For example, node ‘Start’ is on layer 0. It is followed by nodes A and B on layer 1:

This, a layer is simply determined by incrementing the predecessor nodes layer by one. But there is a special case – what if a node has predecessors on different layers? For example, take node J: It has three predecessors (G, H, E) on two different layers. In this case, assign the ‘highest’ layer of all its predecessors plus one (node E is on layer 2, node G and H are on layer 3 – so node J becomes layer 3 + 1 = 4).

## Insert fake nodes

After assigning a layer to every node, we insert fake nodes in layers without a node. Without fake nodes, drawing the graph would end up in nodes painted over by lines or lines painted over nodes. A fake node is inserted every time a node has no direct follower in the following layer. Fake node insertion is also done left-to-right.

By now, every node is assigned to a layer and we know all the follower(s) of it. We’re now ready to draw a ‘clean’ diagram without painting over any lines and/or nodes within the diagram (the following diagram was made using the sugiyama algorithm, I apologize for the simple style, I'm a programmer and not an artist ):

By replacing the fake nodes with lines (red) the diagram looks somewhat nicer:

## Minimize crossings

At this point we’ve got a correctly drawn diagram. But the nodes between the layers are in random order, resulting in a lot of line crossings (red):

The next step is to minimize these crossings using an algorithm called *hillclimbing.*

Hillclimbing changes the order of the nodes within a layer until the number of crossings to/from it is minimal. But there is one problem: If you change the node order within one layer (e. g. reduce the crossings between layer three and four), you probably destroy the order of the predecessor layer (between two and three). So the main goal of our crossing minimization algorithm is to reduce the overall crossings in a diagram and not just the crossings betweens certain layers. Since this cannot be done in linear time (as far as I know) this is called a *NP-hard problem.*

In my program I did crossing minimization with a hillclimbing algorithm: Randomly rearrange the nodes in one layer, undo if this increments the total number of crossings in the diagram. The following (final) diagram was optimized with hillclimbing:

Hillclimbing has another major drawback: It sometimes gets stuck in a local minima: The crossings between two certain layers are minimized, but it still exists a better overall solution. See for example the following diagram (I did a lot of them for testing purposes):

In this example the crossings between layer 6 and 7 (T14 → T15) are optimal (crossing count between these layers is zero!). The algorithm did not find a better solution since it is allowed to change only the nodes of one layer per iteration. Rearranging node T14 (T9 → T14) would end up in a crossing on the following layers (nodes T15/T16). The result is a local minima.

It is clear – from the above explanation – that a genetic algorithm would do far better (create a population of n different diagrams, pick the best ones and optimize them further). The problem of a local minima is disabled by allowing the algorithm to change the nodes of all layers at the same time. This – in most cases – produces a perfect or nearly perfect result (and is faster as well). In our former example, the genetic algorithm would rearrange nodes T14, T15 and T16 at the same time – this leads to a perfect solution:

## Calculate crossing count (edges)

In next part of this tutorial, I'll go a little deeper into the details of crossing minimization. Calculating the crossings in a diagram is straightforward. It is done by counting the edge crossings between two adjacent layers and adding up the results. For example, in a four layer diagram, count the crossings between the layers 0 – 1, 1 – 2, 2 – 3. For example, let’s count the edge crossings in the following diagram between layers 2 – 3 (red):

Obviously, there are five connections between both layers (as always, I assume working from left to right): 3 to 13, 4 to 14, 5 to 8, 6 to 9 and 7 to 15. But how do we check if these five connections cross? If you think about it, you can break down the problem to four simple cases:

a1/a2/b1/b2 are the position of a node within a layer (think of a layer as a list of nodes!), for example node 3 has pos. 0 in the first layer and node 13 has pos. 2 in the second layer):

Using the four rules above, we can check the crossings for the first connection (node 3 to 13):

- a1 = 0 (node 3 in 1st layer), a2 = 2 (node 13 in 2nd layer)
- b1 = 1, b2 = 3 (node 4 to 14) → false
- b1 = 2, b2 = 0 (node 5 to 8) → true (crossing)
- b1 = 3, b2 = 1 (node 6 to 9) → true (crossing)
- b1 = 4, b2 = 4 (node 7 to 15) → false

Thinking further, you can shorten these four rules into two rules using a simple function (which returns true in case of a crossing between a1/a2 with b1/b2), in pseudocode:

bool linesCrossing(int a1, int a2, int b1, int b2) { return ((a1 < b1 And a2 > b2) Or (a1 > b1 And a2 < b2)) }

When checking the second connection (nodes 4 to 14), you can leave out the first connection, since we already counted its edge crossings with the other connections. In my homegrown pseudocode, this looks like:

int crossingCount(layer1, layer2) { define two int arrays l1[], l2[] visit every node in first layer, fill arrays with connections (a1/a2/b1/b2): l1[0] = 0, l2[0] = 2 l1[1] = 1, l2[1] = 3 l1[2] = 2, l2[2] = 0 l1[3] = 3, l2[3] = 1 l1[4] = 4, l2[4] = 4 connectionsBetweenLayers = 0; for (a = 0; sizeof(l1) - 1; a++) { for (b = a + 1; sizeof(l1) - 1; b++) { if (linesCrossing(l1[a], l2[a], l1[b], l2[b])) connectionsBetweenLayers++; } } return connectionsBetweenLayers; }

This returns the crossings between two layers. Do this for all layers and voila!

This is the last in the sugiyama algorithm series. Putting it all together, I made up a simple visualization program and recorded it with a screen recorder. It shows the sugiyama algorithm with a medium-sized precedence graph during the optimization phase. The program uses a simple hillclimbing algorithm, which is by far not the optimal solution.

I apologize for the bad resolution: