
%CIRCOS: A SAS Macro to Create CIRCOS Plots

Started 04-27-2018
Modified 05-04-2018

There is a friendly rivalry between R programmers and SAS programmers within the Department of Health Sciences Research at the Mayo Clinic.  This results in periodic "anything you can do I can do better" style internal presentations, held in good spirits for the purpose of furthering education.  The most recent topic was current graphics: a coworker (an R programmer) and I (a SAS programmer) challenged each other with unconventional graphics to replicate in our respective programming languages.  The challenge graph that interested me most was the CIRCOS graph, a type of graph I had not previously heard of.  These graphs are commonly used in genomics and are constructed primarily from different types of curved rectangles going around and through the center of a circle.  Creating these kinds of shapes is not a current strength of SAS graphics, so I had to get creative to meet the challenge.  I polished my final code into a macro, and am presenting the methods and results at the PharmaSUG 2018 meeting in Seattle!

 

Abstract

An internal graphics challenge between SAS and R programmers within Mayo Clinic led to the creation of a CIRCOS plot using SAS 9.4M3 features.  A CIRCOS plot is a circular visualization of the relationship between objects, positions and other time-point related data.  The CIRCOS plot features a curved rectangle for each node or object around the perimeter of the circle, and a Bézier curve for each path between nodes that is pulled towards the center of the circle.  The size of each rectangle and curve is determined by its proportion of the total population.  The primary challenge of creating this plot with the current SAS SG procedures is that they do not include polar axes, which would simplify plotting circular or curved objects.  The %CIRCOS macro overcomes these limitations with trigonometry, showing that these types of graphs are still possible without polar axes.

 

What is a CIRCOS Graph?

A CIRCOS graph is a visual representation of the flow or relationships between groups. The example below is built by the %CIRCOS macro being presented at PharmaSUG (attached to the article) using the following code:

 

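data random;
    call streaminit(123);            /* fix the seed so the example is reproducible */
    do i = 1 to 100;                 /* one observation per path */
        u = rand("Uniform");
        u2 = rand("Uniform");
        before = 1 + floor(14*u);    /* starting group, 1 to 14 */
        after = 1 + floor(14*u2);    /* ending group, 1 to 14 */
        output;
    end;
    drop i u u2;
run;

%circos(data=random,before=before,after=after,points=100);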

 

example1_100.png

There are several components to understand:

  • A Path: An observation (could be a patient) starting in one group and leaving to another group.  The starting and ending groups can be the same.
  • The Population: The total population consists of all the paths leaving a group and all of the paths entering a group.  In the above example there are 100 paths leaving groups and 100 paths entering groups making the total population 200.
  • Group: A group is the starting or ending point around the circle.
  • Outer Rectangle (OR): The outermost rectangle for each group represents that group's proportion of the full population.  For example, the rectangle for Group 3 is approximately twice as long as the one for Group 4, which indicates that Group 3's population is approximately twice as large.
  • Inner Rectangle (IR): The rectangle just inside the OR represents what proportion of that group's population is leaving for another group.  For example, Group 5's IR is approximately half the length of its OR which would indicate approximately half of its population is going to another group and half of its population is coming from another group.
  • Labels and Axis: Each group has its own label that faces the center of the circle and its own axis where each tick mark represents one percent of the total population.  Each fifth tick mark is twice as long.  Using Group 5 as an example, there are approximately 8.5 ticks which indicates that the total population of paths leaving Group 5 and entering Group 5 is 8.5% of the total population of paths.
  • Bézier Curves (BC): These are the curves that show the paths connecting one group to another inside the circle.  The widths at the start and end of each curve are proportional to the number of paths leaving the starting group and entering the ending group, and as such the starting and ending widths are generally different.  The farther apart the starting and ending groups are, the more the curve is pulled towards the center of the circle.
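The population and rectangle definitions above can be checked by hand with a short sketch.  This is not the macro's internal code; the dataset names (paths, ends) and the variables group and role are made up for illustration, and the four toy paths are hypothetical.

```sas
/* Hypothetical toy data: one row per path */
data paths;
    input before after;
    datalines;
1 2
1 3
2 3
3 3
;
run;

/* Stack both ends of every path, so 4 paths give a population of 8 */
data ends;
    set paths;
    length role $8;
    group = before; role = "leaving";  output;
    group = after;  role = "entering"; output;
    keep group role;
run;

/* First table: each group's share of the total population (Outer Rectangle).
   Second table: the leaving/entering split within each group (Inner Rectangle). */
proc freq data=ends;
    tables group / nocum;
    tables group*role / nopercent nocol;
run;
```

Note that the path from group 3 to group 3 is counted twice for group 3, once as leaving and once as entering, which matches the definition of the population given above.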

What Made This Difficult?

The current features in the SAS SG graphics engine do not have many options to create curved shapes.  There are methods to make lines, curves, splines, and circles, but these would not suffice to make the precisely sized and spaced curved rectangles needed for the CIRCOS plot.  The Outer Rectangles could potentially be made with a series plot using spline fitting, and the Inner Rectangles could potentially be made the same way, but using line attributes to set the rectangle width makes it difficult to space the two shapes accurately.  The Bézier curves cannot be made this way because the width of the shape varies along its length.

The current SAS SG graphics also do not offer a polar axis, which could make plotting circular shapes much easier than with Cartesian coordinates by allowing the user to specify arcs and rotation around a center point.

The current SAS SG graphics also do not have an option for a curved axis, which is helpful in quantifying the groups.

 

General Methods

The %CIRCOS macro overcomes these challenges by making use of trigonometry and the POLYGON plot.

The POLYGON Plot

The POLYGON plot is a plot type added in SAS 9.4 maintenance 1 that lets the user provide a dataset of x and y coordinates, which are plotted, connected by lines, and filled in to create a polygon.  An ID variable allows multiple shapes to be drawn in one POLYGON plot.

The following is a snapshot from the paper:

polygon_example.png

Table 1 displays an example dataset used for the POLYGON plot, and the figure shows how two separate triangles are drawn thanks to the ID variable.  The x and y coordinates are used to draw the outside borders of the triangles which are then filled in.
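A minimal sketch of the same idea, assuming PROC SGPLOT and its POLYGON statement.  The coordinates here are hypothetical and are not the values from Table 1 in the paper; the dataset name triangles is also made up.

```sas
/* Two triangles: the ID variable tells POLYGON which points belong together */
data triangles;
    input id x y;
    datalines;
1 0   0
1 1   0
1 0.5 1
2 2   0
2 3   0
2 2.5 1
;
run;

proc sgplot data=triangles noautolegend;
    /* Points are connected in data order within each ID, then closed and filled */
    polygon x=x y=y id=id / fill outline;
run;
```

Because the points are connected in data order, the order of the rows within each ID determines the outline of the shape.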

Trigonometry

To draw the shapes I needed, I fell back on trigonometric equations and the unit circle.  The unit circle is a circle of radius 1 centered at the (0,0) coordinate on a Cartesian axis.

unit_circle.png

The trigonometry functions SINE and COSINE can then be used to find the y and x coordinates for a given rotation around the circle using (1,0) as the starting point.

  • θ: Rotation in degrees or radians counter-clockwise from the positive x-axis.
  • D: Distance from the center of the circle (0,0) that the arc is drawn.
  • x coordinate: calculated from the equation x = D * COS(θ)
  • y coordinate: calculated from the equation y = D * SIN(θ)
  • NOTE: the sin and cos SAS functions use radians as the unit of measure.

The %CIRCOS macro uses these trigonometry equations to plot the coordinates for the borders of the rectangles and uses the POLYGON plot to create the shapes.  The macro allows the user to determine how many points are used to create each border.  Using more points makes the curves look smoother, but also results in larger file sizes and longer processing time.  This is another situation where a polar axis would be beneficial: with a polar axis, each edge of a rectangle would need only two coordinates.  With Cartesian coordinates, the %CIRCOS macro defaults to 10 points for each edge.
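To make the approach concrete, here is a rough sketch of one curved rectangle built from x = D * COS(θ) and y = D * SIN(θ).  It is an illustration under assumed values, not the macro's actual code: the angles, the radii r_in and r_out, and the dataset name arc_rect are all hypothetical.

```sas
data arc_rect;
    id     = 1;
    points = 10;                    /* points per curved edge */
    theta1 = 0;                     /* start angle in radians (assumed) */
    theta2 = constant('pi') / 2;    /* end angle in radians (assumed) */
    r_in   = 0.9;                   /* inner radius (assumed) */
    r_out  = 1.0;                   /* outer radius (assumed) */

    /* Outer edge, walking counter-clockwise from theta1 to theta2 */
    do i = 0 to points - 1;
        theta = theta1 + (theta2 - theta1) * i / (points - 1);
        x = r_out * cos(theta);
        y = r_out * sin(theta);
        output;
    end;

    /* Inner edge, walking back from theta2 to theta1 so the outline closes */
    do i = points - 1 to 0 by -1;
        theta = theta1 + (theta2 - theta1) * i / (points - 1);
        x = r_in * cos(theta);
        y = r_in * sin(theta);
        output;
    end;
    keep id x y;
run;

proc sgplot data=arc_rect aspect=1 noautolegend;
    polygon x=x y=y id=id / fill outline;
    xaxis min=-1.1 max=1.1;
    yaxis min=-1.1 max=1.1;
run;
```

Raising points in this sketch smooths the two arcs at the cost of more rows in the plot dataset, which is the trade-off described above.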

Bézier Curve

The Bézier curves used within the CIRCOS plot are quadratic curves based on three points.

bezier_formula.png

  • P0: Starting point
  • P1: Mid-point
  • P2: End-point
  • t: represents the point between the start and end of the curve as a proportion

The starting point is the group the path begins at, the end-point is the group the path ends at, and the mid-point is the center of the circle.  Because the center of the circle has coordinates (0,0), the entire P1 term of the equation drops out (P1=0).  Each polygon requires two Bézier curves to be drawn (one for the inside and one for the outside of the curve).
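For reference, the standard quadratic Bézier curve in the notation of the list above, followed by the reduced form that results when the mid-point is the origin:

```latex
\[
B(t) = (1-t)^2 P_0 + 2(1-t)\,t\,P_1 + t^2 P_2, \qquad 0 \le t \le 1
\]
\[
P_1 = (0,0) \;\Longrightarrow\; B(t) = (1-t)^2 P_0 + t^2 P_2
\]
```

The same expression is applied separately to the x and y coordinates of P0 and P2.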

The same method is used to draw these curves, but the coordinates are computed from the function B(t) above.  The starting points and ending points are computed using the trigonometric functions.
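A sketch of how one such polygon could be assembled from two quadratic Bézier curves whose mid-point is (0,0).  The angles s1, s2, e1, e2, the radius r, and the way the two curves are paired are assumptions for illustration; the macro's own pairing and ordering may differ.

```sas
data ribbon;
    id     = 1;
    points = 10;            /* points per curve */
    r      = 0.8;           /* radius where the curves meet the groups (assumed) */
    s1 = 0.10; s2 = 0.30;   /* angular extent at the starting group, radians (assumed) */
    e1 = 2.00; e2 = 2.40;   /* angular extent at the ending group, radians (assumed) */

    /* First curve: B(t) = (1-t)**2 * P0 + t**2 * P2, from angle s1 to angle e2 */
    do i = 0 to points - 1;
        t = i / (points - 1);
        x = (1 - t)**2 * r * cos(s1) + t**2 * r * cos(e2);
        y = (1 - t)**2 * r * sin(s1) + t**2 * r * sin(e2);
        output;
    end;

    /* Second curve: back from angle e1 to angle s2 to close the shape */
    do i = 0 to points - 1;
        t = i / (points - 1);
        x = (1 - t)**2 * r * cos(e1) + t**2 * r * cos(s2);
        y = (1 - t)**2 * r * sin(e1) + t**2 * r * sin(s2);
        output;
    end;
    keep id x y;
run;

proc sgplot data=ribbon aspect=1 noautolegend;
    polygon x=x y=y id=id / fill outline;
    xaxis min=-1 max=1;
    yaxis min=-1 max=1;
run;
```

In this sketch the two ends of the ribbon are closed with straight segments; they sit at the radius where the ribbon meets the group rectangles.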

Series Plot

The axes, tick marks, and subgroup dividers are all drawn with Series plots.  The coordinates for the axes are drawn the same way as the Outer Rectangles.  The coordinates for the tick marks are drawn with two points for each tick mark as are the lines for the subgroup dividers.
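The two-points-per-tick idea can be sketched as follows.  The radius, tick lengths, and the choice of one tick per 1% of a full circle (ignoring the gaps between groups) are simplifying assumptions, and the dataset name ticks is made up.

```sas
data ticks;
    r  = 1.05;                              /* radius of the axis line (assumed) */
    pi = constant('pi');
    do tick = 0 to 10;
        theta = tick * 0.01 * 2 * pi;       /* 1% of a full circle per tick, gaps ignored */
        len   = ifn(mod(tick, 5) = 0, 0.04, 0.02);   /* every fifth tick is twice as long */

        /* Inner end of the tick */
        x = r * cos(theta);
        y = r * sin(theta);
        output;

        /* Outer end of the tick */
        x = (r + len) * cos(theta);
        y = (r + len) * sin(theta);
        output;
    end;
    keep tick x y;
run;

proc sgplot data=ticks aspect=1 noautolegend;
    /* GROUP= keeps each tick as its own two-point line segment */
    series x=x y=y / group=tick lineattrs=(color=black pattern=solid);
run;
```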

Text Plot

The labels are drawn with Text plots.  The Text plot allows text to be placed at a particular coordinate and rotated based on a numeric variable.  The rotation necessary to face the label towards the center of the circle is calculated within the macro based on the quadrant the text will appear in.
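One possible rotation convention, shown only as a sketch: rotate each label by its angle minus 90 degrees so that the bottom of the text points at the center.  This single formula is an assumption and is not necessarily the quadrant-based logic the macro uses; the label radius, positions, and dataset name are hypothetical.

```sas
data labels;
    length label $10;
    r  = 1.2;                               /* label radius (assumed) */
    pi = constant('pi');
    do k = 0 to 7;
        deg = 45 * k;                       /* label position in degrees (assumed) */
        x   = r * cos(deg * pi / 180);      /* COS and SIN expect radians */
        y   = r * sin(deg * pi / 180);
        rot = mod(deg - 90 + 360, 360);     /* text rotation in degrees, kept in 0 to 359 */
        label = catx(" ", "Group", k + 1);
        output;
    end;
    keep x y rot label;
run;

proc sgplot data=labels aspect=1 noautolegend;
    text x=x y=y text=label / rotate=rot position=center;
    xaxis min=-1.5 max=1.5 display=none;
    yaxis min=-1.5 max=1.5 display=none;
run;
```

With this convention the labels on the lower half of the circle are drawn upside down, which is one reason a quadrant-based adjustment can be preferable.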

 

What Can the Macro Do?

Automation

The %CIRCOS macro automatically performs the following calculations:

  • Total population and the proportion of each group
  • Frequency and proportions of each path between groups
  • Starting and ending positions for each component of the CIRCOS plot
  • Location and rotation of the tick marks for each axis for each group
  • Location and rotation of each of the labels for each group
  • The best order to draw the Bézier curves to connect closer groups first
  • Perform these calculations within subgroups
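As an illustration of the path frequency calculation in the list above, the following sketch counts each before/after pair with PROC FREQ.  It is not taken from the macro; the toy data and the dataset names flows and path_counts are hypothetical.

```sas
/* Hypothetical toy data: one row per path */
data flows;
    input before after;
    datalines;
1 2
1 2
1 3
2 1
3 3
;
run;

/* One row per distinct path, with its COUNT and its PERCENT of all paths */
proc freq data=flows noprint;
    tables before*after / out=path_counts;
run;

proc print data=path_counts noobs;
run;
```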

A final plot dataset is put together containing the coordinates for the polygon plots, the series plots (axis lines and tick marks), and the text plots (labels).  A graph template is created using the Graph Template Language and a final plot image is created.  The final plot dataset is available to be output.

Appearance Modifications

The plot itself can be modified in the following ways:

  • Color of each group
  • Order of the groups
  • Starting point for the first group around the circle (Default is 270 degrees or 3pi/2 radians)
  • Widths of the Outer and Inner Rectangles
  • Gaps between Outer and Inner Rectangles
  • Gap between Groups
  • Direction of rotation (clockwise or counter-clockwise)
  • Number of points used to draw each edge (essentially resolution of the curves)
  • Axis color
  • Background color
  • Font Color

File Modifications

The final file containing the plot can be modified in the following ways:

  • Height and Width
  • Anti-aliasing
  • Background transparency
  • Image border
  • Image type and DPI
  • Image location

Examples

Soccer Players Changing Teams

Data originally from a Graphically Speaking blog post written by Sanjay Matange

10-27-2016 Graphically Speaking

Sanjay creates his own version of a CIRCOS plot there, so I used the same data with my macro to create my version of it.  The graph shows soccer players leaving teams in one country to enter another, although some players join another team in the same country (see Austria).  This example also shows how much rotated text can improve the plot visually.

soccer.png

 

Comparing AE Grades Between Time-Points

One idea I had for using CIRCOS plots in the clinical oncology setting was to compare values collected at multiple time-points such as quality-of-life survey answers and adverse events.  The below example is from a study comparing the maximum grade of a particular adverse event up to 6-weeks to the maximum grade of that same adverse event up to treatment completion.  The idea is to see how often adverse events became worse after 6-weeks.  The image shows that the population of patients experiencing a grade 3+ adverse event increases post 6-weeks, but a majority of patients don't seem to experience a worse grade for this particular adverse event post 6-weeks.

The image also displays another feature of the %CIRCOS macro: the ability to have subgroups. 

ae1.png

 

Comparing AE Grades Between Time-Points by Study

This example takes the same data from the previous example, but instead of subgrouping by time-point this pooled analysis is subgrouped by study.  This not only shows the population differences between the studies (Study 3 is much larger than Study 6), but gives a quick overview of which study's treatment may have been more toxic.

ae2.png

Conclusion

I will not claim to be the foremost expert on CIRCOS plots or the best ways to use them, but I did really enjoy meeting the challenge from my R-using coworkers to create this kind of graph.  I believe there is room for SAS graphics to evolve further when it comes to making circular graphs.  I am including the macro for anyone who wants to give it a try, and I hope this article and attached paper can help inspire other programmers out there.

Comments

 

Hi Jeff, in addition to the current input dataset, which has the two variables BEFORE and AFTER, it would be convenient for users to create the circle graph by adding a third variable like LinkCount in Sanjay's example. If the third variable is missing, then assume it is 1 by default.

data soccer;
  input From $1-10 To $11-20 LinkCount;
  datalines;
USA       Spain       2
USA       Germany     3
England   Germany     2
England   Italy       3
England   Norway      1
England   Spain       2
England   Belgium     2
England   Brazil      1
France    Belgium     1
Italy     Argentina   1
Argentina Argentina   9
Austria   Austria     6
Belgium   Belgium     8
Brazil    Brazil      8
Bulgaria  Bulgaria    7
Denmark   Denmark     8
England   England     6
France    France      9
Germany   Germany    12
Greece    Greece      6
Italy     Italy       6
Norway    Norway      6
Spain     Spain       8
;
run;
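Until such an option exists, one possible workaround is to expand the count data into one row per path so that it matches the BEFORE/AFTER input shown in the article.  This is only a sketch: it reuses the soccer dataset defined above, the name soccer_long is made up, and it assumes the macro accepts character group variables.

```sas
data soccer_long;
    set soccer;
    if missing(LinkCount) then LinkCount = 1;   /* treat a missing count as 1 */
    do i = 1 to LinkCount;                      /* write one row per path */
        output;
    end;
    drop i LinkCount;
run;

%circos(data=soccer_long, before=From, after=To);
```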

 

 

Hi, does anyone know how to perform the 2nd case (comparing AE grades between time-points)? What kind of data structure does it require?

Hello @Arisq, you would need a before variable (AE grade up to 6 weeks), an after variable (AE grade over all time), a before subgroup variable (6 weeks), and an after subgroup variable (all times).  Below is the code I used to make the graph.  The before_group and after_group parameters are what separate the beginning and ending points of the curves (they group the before and after variables into subgroups).

 

data ae;
    set ae.final;
    where nmiss(diarrhea_6week,diarrhea)=0;
    grp1='6-weeks';grp2='End of Treatment';
run;

%circos(data=ae,before=diarrhea_6week, after=diarrhea, before_group=grp1, after_group=grp2, border=0);

Thank you for your help, Jeff. 😄

