There is a friendly rivalry between R programmers and SAS programmers within the Department of Health Sciences Research at the Mayo Clinic. This results in periodic "anything you can do I can do better" style internal presentations, given in good spirits for the purpose of furthering education. The most recent topic was current graphics, where a coworker (an R programmer) and I (a SAS programmer) challenged each other with unconventional graphs to replicate in our respective programming languages. The challenge graph that interested me most was the CIRCOS graph, a type of graph I had not previously heard of. These graphs are commonly used in genomics and are built primarily from different types of curved rectangles that go around and through the center of a circle. Creating these kinds of shapes is not currently a strength of SAS graphics, so I had to get creative to meet the challenge. I polished my final code into a macro and am presenting the methods and results at the PharmaSUG 2018 meeting in Seattle!
An internal graphics challenge between SAS and R programmers within Mayo Clinic led to the creation of a CIRCOS plot using SAS 9.4M3 features. A CIRCOS plot is a circular visualization of the relationship between objects, positions and other time-point related data. The CIRCOS plot features a curved rectangle for each node or object around the perimeter of the circle, and a Bézier curve for each path between nodes that is pulled towards the center of the circle. The size of each rectangle and curve is determined by its proportion of the total population. The primary challenge of creating this plot with the current SAS SG procedures is that these procedures do not include polar axes to simplify plotting circular or curved objects. The macro %CIRCOS is an example of creatively overcoming these limitations with trigonometry, proving that these types of graphs are still possible without polar axes.
A CIRCOS graph is a visual representation of the flow or relationships between groups. The example below is built by the %CIRCOS macro being presented at PharmaSUG (attached to the article) using the following code:
data random;
   call streaminit(123);
   do i = 1 to 100;
      u = rand("Uniform");
      u2 = rand("Uniform");
      before = 1 + floor(14*u);
      after = 1 + floor(14*u2);
      output;
   end;
   drop i u u2;
run;

%circos(data=random,before=before,after=after,points=100);
There are several components to understand:
The current features in the SAS SG graphics engine offer few options for creating curved shapes. There are methods to make lines, curves, splines, and circles, but these do not suffice for the precisely sized and spaced curved rectangles needed for the CIRCOS plot. The Outer Rectangles could potentially be made with a series plot using spline fitting, and the Inner Rectangles with the same method, but relying on line attributes to set the rectangle width makes it difficult to space the two shapes accurately. The Bézier Curves cannot be made this way because the width of the shape varies along its length.
The current SAS SG graphics also do not have access to a polar axis, which could make plotting circular shapes much easier than with Cartesian coordinates by allowing the user to specify arcs and rotation around a center point.
The current SAS SG graphics also do not have an option for a curved axis which is helpful in quantifying the groups.
The CIRCOS macro overcomes the challenge for this graph by making use of trigonometry and the POLYGON plot.
The POLYGON plot is a new type added with SAS 9.4 maintenance 1 that allows the user to provide a dataset of x and y coordinates which are then plotted, connected by lines, and then filled in to create a polygon. An ID variable is provided to allow multiple shapes to be drawn in one POLYGON plot.
The following is a snapshot from the paper:
Table 1 displays an example dataset used for the POLYGON plot, and the figure shows how two separate triangles are drawn thanks to the ID variable. The x and y coordinates are used to draw the outside borders of the triangles which are then filled in.
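As a rough illustration of the same idea, here is a minimal sketch of a POLYGON plot that draws two triangles from one dataset. The dataset name and the coordinates are made up for this example and are not the values from Table 1 in the paper.

```sas
/* Hypothetical coordinates: two triangles told apart by the ID variable */
data triangles;
   input id x y;
   datalines;
1 0 0
1 2 0
1 1 2
2 3 0
2 5 0
2 4 2
;
run;

proc sgplot data=triangles noautolegend;
   /* Rows sharing an ID value are joined, closed, and filled as one shape */
   polygon x=x y=y id=id / fill outline group=id;
run;
```

The rows for each shape need to be listed in the order the border should be traced, since the points are connected in data order.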
To draw the shapes that I needed, I went back to trigonometric equations and the unit circle. The unit circle is a circle of radius 1 centered at the (0,0) coordinate on a Cartesian axis.
The trigonometry functions SINE and COSINE can then be used to find the y and x coordinates for a given rotation around the circle using (1,0) as the starting point.
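Written out, with the symbols θ (rotation angle, measured counterclockwise from (1,0)) and r (radius) introduced here only for illustration, the relationship is:

```latex
\[
x = r\cos\theta, \qquad y = r\sin\theta, \qquad \text{with } r = 1 \text{ on the unit circle.}
\]
```

For example, a quarter turn (θ = π/2) lands on (cos π/2, sin π/2) = (0, 1), and scaling r moves the point to a larger or smaller concentric circle.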
The %CIRCOS macro uses these trigonometry equations to plot the coordinates for the borders of the rectangles and uses the POLYGON plot to create the shapes. The macro allows the user to determine how many points are used to create each border. Using more points will make the curves look smoother, but will also result in larger file sizes and longer processing time. This again reiterates that the option for a polar axis would be beneficial in this situation. Within a polar axis each edge of the rectangles would only need two coordinates to be plotted. The default number of points the %CIRCOS macro uses is 10 points for each edge for Cartesian coordinates.
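The approach described above can be sketched as follows. This is not the macro's actual code: the macro variable name, the radii, and the angles are assumptions chosen for illustration. It traces the outer edge of one curved rectangle counterclockwise, then the inner edge clockwise, so that the POLYGON plot closes the outline.

```sas
/* Sketch: one curved rectangle between two radii, traced as a closed polygon */
%let points = 10;                       /* points per curved edge (assumed name) */

data arc;
   id    = 1;
   r_in  = 0.90;                        /* inner radius (illustrative) */
   r_out = 1.00;                        /* outer radius (illustrative) */
   start = 0;                           /* starting angle in radians */
   stop  = constant('pi') / 2;          /* ending angle in radians */
   /* outer edge, counterclockwise from start to stop */
   do i = 0 to &points - 1;
      theta = start + (stop - start) * i / (&points - 1);
      x = r_out * cos(theta);
      y = r_out * sin(theta);
      output;
   end;
   /* inner edge, clockwise from stop back to start */
   do i = &points - 1 to 0 by -1;
      theta = start + (stop - start) * i / (&points - 1);
      x = r_in * cos(theta);
      y = r_in * sin(theta);
      output;
   end;
   keep id x y;
run;

ods graphics / width=5in height=5in;    /* square image so the arc is not distorted */
proc sgplot data=arc noautolegend;
   polygon x=x y=y id=id / fill outline;
   xaxis min=-1.1 max=1.1 display=none;
   yaxis min=-1.1 max=1.1 display=none;
run;
```

Raising the number of points per edge smooths the curve at the cost of more rows in the plot dataset, which is the trade-off described above.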
The Bézier curves used within the CIRCOS plot are quadratic curves based on three points.
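For reference, the standard form of a quadratic Bézier curve with starting point P0, control (middle) point P1, and ending point P2 is shown below. The paper's own notation may differ slightly.

```latex
\[
B(t) = (1-t)^2 P_0 + 2(1-t)\,t\,P_1 + t^2 P_2, \qquad 0 \le t \le 1 .
\]
```

At t = 0 the curve sits on P0 and at t = 1 on P2, while P1 pulls the middle of the curve towards itself without the curve passing through it.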
The starting point is the group the path begins at, the ending point is the group the path ends at, and the mid-point is the center of the circle. Because the center of the circle has coordinates (0,0), that entire section of the equation is removed (P1=0). Each of the polygons requires two Bézier curves to be drawn (one for the inside and one for the outside of the curve).
The same method is used to draw these curves, but the coordinates are computed from the function B(t) above. The starting points and ending points are computed using the trigonometric functions.
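A minimal sketch of one such shape is below, assuming the middle control point is (0,0) so that only the start and end terms of the quadratic Bézier function remain. The angles, radius, and variable names are invented for illustration and are not taken from the macro.

```sas
/* Sketch: a filled band bounded by two quadratic Bezier curves pulled to (0,0) */
%let points = 10;                       /* points per curve (assumed name) */

data bezier;
   id = 1;
   r  = 0.85;                           /* radius where the band meets the groups */
   /* angles in radians of the band's two edges at the start and end groups */
   a0_lo = 0.10;  a0_hi = 0.30;         /* start group (illustrative) */
   a2_lo = 2.00;  a2_hi = 2.20;         /* end group (illustrative) */
   /* first curve: from the low edge of the start to the high edge of the end */
   do i = 0 to &points - 1;
      t = i / (&points - 1);
      x = (1-t)**2 * r*cos(a0_lo) + t**2 * r*cos(a2_hi);
      y = (1-t)**2 * r*sin(a0_lo) + t**2 * r*sin(a2_hi);
      output;
   end;
   /* second curve, traced backwards from the end group to the start group */
   do i = &points - 1 to 0 by -1;
      t = i / (&points - 1);
      x = (1-t)**2 * r*cos(a0_hi) + t**2 * r*cos(a2_lo);
      y = (1-t)**2 * r*sin(a0_hi) + t**2 * r*sin(a2_lo);
      output;
   end;
   keep id x y;
run;

ods graphics / width=5in height=5in;
proc sgplot data=bezier noautolegend;
   polygon x=x y=y id=id / fill outline transparency=0.3;
   xaxis min=-1.1 max=1.1 display=none;
   yaxis min=-1.1 max=1.1 display=none;
run;
```

In this sketch the two ends of the band are closed with straight segments; they could instead follow the circle using the same arc calculation as the rectangles.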
The axes, tick marks, and subgroup dividers are all drawn with Series plots. The coordinates for the axes are computed the same way as those for the Outer Rectangles. Each tick mark is drawn from two points, as is each subgroup divider line.
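The two-points-per-tick idea can be sketched like this. The tick spacing, radii, and variable names are assumptions for illustration; the GROUP= variable keeps each tick as its own short line rather than joining them all together.

```sas
/* Sketch: tick marks around a circle, two points per tick */
data ticks;
   do tick = 0 to 11;
      theta = tick * constant('pi') / 6;   /* one tick every 30 degrees */
      /* end of the tick on the circle */
      x = 1.00 * cos(theta);  y = 1.00 * sin(theta);  output;
      /* end of the tick just outside the circle */
      x = 1.03 * cos(theta);  y = 1.03 * sin(theta);  output;
   end;
   keep tick x y;
run;

ods graphics / width=5in height=5in;
proc sgplot data=ticks noautolegend;
   series x=x y=y / group=tick lineattrs=(color=black pattern=solid);
   xaxis min=-1.1 max=1.1 display=none;
   yaxis min=-1.1 max=1.1 display=none;
run;
```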
The labels are drawn with Text plots. The Text plot allows text to be placed at a particular coordinate and rotated based on a numeric variable. The rotation necessary to face the label towards the center of the circle is calculated within the macro based on the quadrant the text will appear in.
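A small sketch of rotated labels is below. The rotation rule shown (align the text with the radius, and flip it by 180 degrees on the left half of the circle so it is not upside down) is an assumed rule for illustration and may differ from the quadrant logic inside the macro.

```sas
/* Sketch: labels placed around a circle and rotated with a numeric variable */
data labels;
   length label $8;
   do k = 0 to 7;
      theta = k * 45;                                /* degrees from (1,0) */
      x = 1.1 * cos(theta * constant('pi') / 180);
      y = 1.1 * sin(theta * constant('pi') / 180);
      /* assumed rule: flip text on the left half so it stays readable */
      if 90 < theta < 270 then rot = theta - 180;
      else rot = theta;
      label = cats('Group', k + 1);
      output;
   end;
   keep x y rot label;
run;

ods graphics / width=5in height=5in;
proc sgplot data=labels noautolegend;
   text x=x y=y text=label / rotate=rot;
   xaxis min=-1.5 max=1.5 display=none;
   yaxis min=-1.5 max=1.5 display=none;
run;
```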
The %CIRCOS macro automatically performs the following calculations:
A final plot dataset is put together containing the coordinates for the polygon plots, the series plots (axis lines and tick marks), and the text plots (labels). A graph template is created using the Graph Template Language and a final plot image is created. The final plot dataset is available to be output.
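To give a feel for how one stacked dataset can feed several plot types in a single template, here is a heavily simplified sketch. The template name, dataset, and variable names are all hypothetical, and the real macro's template is certainly more involved.

```sas
/* Sketch: one stacked dataset feeding polygon, series, and text plots */
data plotdata;
   length label $8;
   /* polygon rows (a simple triangle) */
   pid = 1;
   px = 0; py = 0; output;
   px = 1; py = 0; output;
   px = 0; py = 1; output;
   call missing(pid, px, py);
   /* series rows (a single line) */
   sid = 1;
   sx = -1; sy = -1; output;
   sx =  1; sy = -1; output;
   call missing(sid, sx, sy);
   /* text row (one label) */
   tx = 0; ty = -0.8; label = 'Label'; rot = 0; output;
run;

proc template;
   define statgraph circos_sketch;      /* hypothetical template name */
      begingraph;
         /* equated axes keep circles round instead of stretched */
         layout overlayequated / equatetype=square
               xaxisopts=(display=none) yaxisopts=(display=none);
            polygonplot x=px y=py id=pid / display=(fill outline);
            seriesplot  x=sx y=sy / group=sid;
            textplot    x=tx y=ty text=label / rotate=rot;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=plotdata template=circos_sketch;
run;
```

Each plot statement reads only its own columns, so rows belonging to the other plot types are simply missing for that statement.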
The plot itself can be modified in the following ways:
The final file containing the plot can be modified in the following ways:
Data originally from a Graphically Speaking blog post written by Sanjay Matange
Sanjay creates his own version of a CIRCOS plot there, so I used the same data with my macro to create my version of it. The graph shows soccer players leaving teams in one country to enter another, although some players join another team in the same country (see Austria). This example also shows how much rotated text can improve the plot visually.
One idea I had for using CIRCOS plots in the clinical oncology setting was to compare values collected at multiple time-points, such as quality-of-life survey answers and adverse events. The example below is from a study comparing the maximum grade of a particular adverse event up to 6 weeks to the maximum grade of that same adverse event up to treatment completion. The idea is to see how often adverse events became worse after 6 weeks. The image shows that the population of patients experiencing a grade 3+ adverse event increases after 6 weeks, but a majority of patients do not seem to experience a worse grade for this particular adverse event after 6 weeks.
The image also displays another feature of the %CIRCOS macro: the ability to have subgroups.
This example takes the same data from the previous example, but instead of subgrouping by time-point this pooled analysis is subgrouped by study. This not only shows the population differences between the studies (Study 3 is much larger than Study 6), but gives a quick overview of which study's treatment may have been more toxic.
I will not claim to be the foremost expert on CIRCOS plots or the best ways to use them, but I really enjoyed meeting the challenge from my R-using coworkers to create this kind of graph. I believe there is room for SAS graphics to evolve further when it comes to making circular graphs. I am including the macro for anyone who wants to give it a try, and I hope this article and the attached paper can help inspire other programmers out there.