
How Can I Create Graphs Using SAS®? Q&A, Slides, and On-Demand Recording

Started 03-17-2021 by
Modified 09-17-2021 by


Learn to harness the power of SAS to create meaningful graphs using Statistical Graphics procedures and Graph Template Language (GTL). 

 

Watch the webinar

 

SAS customers Richann Watson and Kriss Harris share their expertise in creating SAS graphs and show you how to customize graphs for your audience. During this webinar you will learn:

  • How to use Statistical Graphics procedures.
  • Ways to adjust your graph font, color, and more.
  • Ways to use ODS output objects to get data from another SAS procedure.
  • Techniques using Graph Template Language (GTL) to create a custom graph.

The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.

 

Will you cover how to get the resolution needed for publication?

Higher-resolution PNG output can be obtained by using the LISTING destination and setting a high DPI, such as 300 or 600, with the IMAGE_DPI= option.

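ods _all_ close;                               /* close all open ODS destinations */
%let gpath=C:\;                                /* folder where the PNG files are written */
%let w=5in;                                    /* graph width */
%let h=3in;                                    /* graph height */
%let dpi=300;                                  /* 300 or 600 for publication quality */
ods listing gpath="&gpath" image_dpi=&dpi;
ods graphics / reset width=&w height=&h;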
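A minimal sketch of the settings in action follows. It repeats the settings so that it runs on its own, writes to the WORK library's folder instead of C:\ (an assumption so the example runs anywhere), and uses a hypothetical image name, HeightWeight, with the SASHELP.CLASS sample data.

```sas
/* Write a 300-DPI PNG to the WORK folder (chosen so the sketch runs anywhere) */
ods _all_ close;
ods listing gpath="%sysfunc(pathname(work))" image_dpi=300;

/* IMAGENAME= sets the file name stem (hypothetical); IMAGEFMT= sets the format */
ods graphics / reset width=5in height=3in imagename="HeightWeight" imagefmt=png;

proc sgplot data=sashelp.class;
  scatter x=height y=weight;
run;

ods graphics / reset;   /* restore the default ODS Graphics options */
```

The PNG file should appear as HeightWeight.png in the WORK folder; point GPATH= at your own folder for real output.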

 

Thank you for breaking the concept of GTL into bite-sized chunks, but how much of a time investment is it to learn GTL for a novice who has only used basic SG procedures?

Both the SG procedures and GTL have their advantages. A lot can be done with the SG procedures; however, graphs that are truly custom, or that cannot be produced with the SG procedures, require GTL.

 

We believe you would need approximately 35 hours to reach a novice level in GTL. You can also learn GTL by using the TMPLOUT= option in PROC SGPLOT to see how an SGPLOT step translates into GTL: TMPLOUT= writes the STATGRAPH template, that is, the GTL code that creates the same plot as your SGPLOT step, to a file. Over time, you can see the effect that changing various plot statements and options in your code has on the plot.
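For example, a sketch using SASHELP.CLASS; the output file name sgplot_template.sas and its location in the WORK folder are assumptions made for illustration.

```sas
/* Hypothetical output file for the generated GTL code */
%let tmplfile=%sysfunc(pathname(work))/sgplot_template.sas;

/* TMPLOUT= writes the PROC TEMPLATE code (DEFINE STATGRAPH ...) behind this plot */
proc sgplot data=sashelp.class tmplout="&tmplfile";
  reg x=height y=weight / group=sex;
run;
```

Opening the file should show a PROC TEMPLATE step with a DEFINE STATGRAPH block that you can study, edit, and render with PROC SGRENDER.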

 

Will that form of TITLE statement work in a Macro? I know that title '#byval(xxx)' doesn't work?

I only use #BYVAL when I'm doing BY-group processing. If you are creating a graph in a macro, you can pass a parameter that populates the title.
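A minimal sketch of passing the title text as a macro parameter, assuming SASHELP.CLASS and a hypothetical macro name, plot_by_sex. Double quotes are needed so that &sex resolves inside the title and the WHERE= clause.

```sas
/* Hypothetical macro: SEX= filters the data and also fills in the title */
%macro plot_by_sex(sex=);
  title "Weight by Height for Sex = &sex";
  proc sgplot data=sashelp.class(where=(sex="&sex"));
    scatter x=height y=weight;
  run;
  title;   /* clear the title so it does not carry over to later output */
%mend plot_by_sex;

%plot_by_sex(sex=F)
%plot_by_sex(sex=M)
```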

 

Do the whiskers look different to suit the client's request? (Changing the calculation, as you mentioned?)

In the box plot statement, there is the concept of "fences". The fences are locations above and below the box: the lower and upper fences are 1.5 times the interquartile range (IQR = Q3 – Q1) below Q1 and above Q3, and the lower far and upper far fences are 3 times the IQR below Q1 and above Q3. By default, the whiskers are drawn from the edge of the box to the most extreme data values that are still within the fences. If you want the whiskers to extend to the minimum and maximum values, even if they are outside the fences, specify the EXTREME option. If you want the whiskers drawn to a specific percentile, you can use WHISKERPCT=number, where number is the percentile. Note that using this option changes what is considered an outlier, and the concept of "fences" no longer applies.
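To compare the three whisker definitions side by side, here is a sketch using the SASHELP.CARS sample data. It assumes a SAS 9.4 release whose VBOX statement supports WHISKERPCT=; with WHISKERPCT=5 the whiskers should end at the 5th and 95th percentiles.

```sas
/* Default: whiskers end at the most extreme values inside the fences */
title "Default whiskers";
proc sgplot data=sashelp.cars;
  vbox mpg_city / category=origin;
run;

/* EXTREME: whiskers run to the minimum and maximum values */
title "EXTREME option";
proc sgplot data=sashelp.cars;
  vbox mpg_city / category=origin extreme;
run;

/* WHISKERPCT=5: whiskers at the 5th and 95th percentiles */
title "WHISKERPCT=5";
proc sgplot data=sashelp.cars;
  vbox mpg_city / category=origin whiskerpct=5;
run;
title;
```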

 

How do you create the blue circles for the N_count within the Boxplot?

The blue circles within the box plots are produced by overlaying a scatter plot of the individual results. The JITTER option on the SCATTER statement offsets the circles so that they are not displayed on top of each other.
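A minimal sketch of the overlay with SASHELP.CLASS (not the webinar data), assuming SAS 9.4M1 or later, where the JITTER option is available and a VBOX can be overlaid with a SCATTER plot on the same category variable.

```sas
proc sgplot data=sashelp.class noautolegend;
  /* Box plot of WEIGHT for each SEX; NOFILL keeps the points behind it visible */
  vbox weight / category=sex nofill;
  /* Individual values on the same discrete axis; JITTER offsets tied markers */
  scatter x=sex y=weight / jitter markerattrs=(symbol=circle color=blue);
run;
```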

 

Can either display ranges on X axis or Y axis change ‘on demand’ depending on the read values? For example, if one customer has 100K members and the other has 10K members; I would like to show values displayed by 20K for a customer with 100k members, but I would like to show values displayed by 1k for a customer with 10k members? Could I determine such ‘by’ value ‘on the fly’ (calculate it and be dependable on the size of the customer) as opposed to have it constant/static? If so, how can this be done under proc sgplot statement?

To display a specific set of values on the X or Y axis, use the VALUES=(value-list) option on the axis statement. The value-list can be specific values (e.g., VALUES=(2 4 10 20)) or a range in the form start-value TO end-value BY increment (e.g., VALUES=(1 to 100 by 20)). To change these values 'on the fly' so that they differ based on the data for each parameter, generate the graph in a macro that receives (or computes) the start, end, and increment values for each parameter.
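A minimal sketch of computing the increment from the data inside a macro. The macro name and the increment rule (one-fifth of the largest power of 10 not exceeding the maximum) are hypothetical; substitute whatever rule your reports need. It assumes the Y variable has a positive maximum.

```sas
%macro plot_with_dynamic_axis(data=, x=, y=);
  %local incr upper;

  /* Hypothetical rule: increment = largest power of 10 <= max, divided by 5 */
  data _null_;
    set &data end=last;
    retain maxval .;
    maxval = max(maxval, &y);            /* running maximum, ignores missing */
    if last then do;
      incr  = 10**floor(log10(maxval)) / 5;
      upper = ceil(maxval / incr) * incr; /* round the axis end up to a tick */
      call symputx('incr', incr);
      call symputx('upper', upper);
    end;
  run;

  proc sgplot data=&data;
    scatter x=&x y=&y;
    yaxis values=(0 to &upper by &incr);
  run;
%mend plot_with_dynamic_axis;

%plot_with_dynamic_axis(data=sashelp.class, x=height, y=weight)
```

With this rule, a maximum of 100,000 gives ticks every 20,000 and a maximum of 10,000 gives ticks every 2,000; a lookup table or a different formula could be used instead.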

 

In Vbox, we specified the legend with n. max, min etc. But then we created values in char to use as the legend info. Did we have to do both steps? To me it seems that we are not using the first specification that we did?

In the webinar, we illustrated that we could get close to the desired graph using the basic DISPLAYSTATS=(xxx) option. If the default display of the summary statistics is acceptable, the second step is not necessary. The second step was meant to show you how to customize the table at the bottom of the graph. If the summary statistics have to be in a specific format, we have found it easier to store the statistics in the data set in the desired format and use an AXISTABLE statement to display them.

 

What characters identify a whisker on the box plot?

The whiskers are the lines that extend from the box, and you can change their thickness, color, and line pattern. Use the WHISKERATTRS=(options) option to specify the line attributes that you would like to use for your whiskers.
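For example, a sketch with SASHELP.CARS that draws red, dashed, 2-pixel whiskers; the color, pattern, and thickness values are arbitrary choices for illustration.

```sas
proc sgplot data=sashelp.cars;
  /* WHISKERATTRS= accepts the usual line attributes */
  vbox mpg_city / category=origin
       whiskerattrs=(color=red pattern=dash thickness=2);
run;
```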

 

You might have mentioned it already, but how far do the default whiskers of the boxplot extend: 3 times the IQR? or 1.5 times IQR?

By default, the whiskers extend at most 1.5 times the IQR beyond the box. In the box plot statement, there is the concept of "fences": the lower and upper fences are located 1.5 times the interquartile range (IQR = Q3 – Q1) below Q1 and above Q3, and the lower far and upper far fences are located 3 times the IQR below Q1 and above Q3. The whiskers are drawn from the edge of the box to the most extreme data values that are still within the (1.5 × IQR) fences. If you want the whiskers to extend to the minimum and maximum values, even if they are outside the fences, specify the EXTREME option.
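To see where the fences fall for a given variable, here is a sketch that computes them with PROC MEANS on SASHELP.CLASS. The values may differ slightly from the plotted box because PROC MEANS and the box plot can use different percentile definitions.

```sas
/* Quartiles and IQR (QRANGE = Q3 - Q1) of WEIGHT */
proc means data=sashelp.class noprint;
  var weight;
  output out=weight_quartiles q1=q1 q3=q3 qrange=iqr;
run;

/* Inner fences at 1.5 x IQR and far fences at 3 x IQR beyond the box */
data fences;
  set weight_quartiles;
  lower_fence     = q1 - 1.5*iqr;
  upper_fence     = q3 + 1.5*iqr;
  lower_far_fence = q1 - 3*iqr;
  upper_far_fence = q3 + 3*iqr;
run;

proc print data=fences noobs;
  var q1 q3 iqr lower_fence upper_fence lower_far_fence upper_far_fence;
run;
```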

 

Normally when creating the boxplot there are not blue circles behind the actual box. How did you create them?

The blue circles within the box plots are produced by overlaying a scatter plot of the individual results. The JITTER option on the SCATTER statement offsets the circles so that they are not displayed on top of each other.

 

Is there a way to get a superscript into an axis label, e.g. kg/m2 with a superscripted '2'? I often derive a coefficient that needs to be superscripted.

This resource should help you. Note that for the axis label you need to use a Unicode character.

https://communities.sas.com/t5/Graphics-Programming/How-to-get-the-subscript-and-superscript-in-proc...
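As a sketch, assuming SAS 9.4 or later, where SGPLOT axis labels accept the (*ESC*){unicode ...} function: the code below computes BMI from SASHELP.CLASS (pounds and inches) and labels the axis kg/m with a superscript two (U+00B2). In a UTF-8 session you could also type the ² character directly into the label.

```sas
data bmi;
  set sashelp.class;
  bmi = 703 * weight / height**2;   /* BMI from pounds and inches */
run;

proc sgplot data=bmi;
  scatter x=age y=bmi;
  /* (*ESC*){unicode '00B2'x} inserts the superscript-two character */
  yaxis label="BMI (kg/m(*ESC*){unicode '00B2'x})";
run;
```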

 

Hello, I have a question on the proc gchart procedure to create a bar graph with a continuous variable on the x axis that shows cumulative percent on the y axis. My continuous variable has a big range. How can I control the x axis? I am currently using midpoints to specify the x axis. Is there any other way to do this?

GCHART is part of SAS/GRAPH and not part of ODS Graphics. SAS/GRAPH is an add-on product, so it is typically used only at sites that license it. Our expertise is focused on ODS Graphics. Perhaps you could create a HISTOGRAM or BARCHART for this data.

 

The star outlier (green star) should be in another color. By using the green color as the mean and the outlier, it's showing as if there is a relation when it is not... just a thought 🙂

The colors had no meaning in the webinar; they were chosen for illustration only. The purpose was to show that you can change the symbol, size, and color to suit specific needs. The colors, symbols, line patterns, and sizes in the graph have no meaning and are only meant to illustrate concepts.

 

What would the output look like if you didn't know to use ods trace on?

Without ODS TRACE ON, the log looks the same as usual, and the output, including the SURVIVALPLOT graph, is unchanged. ODS TRACE ON only writes the names of the ODS output objects that the procedure creates to the log, so that you know which names to use, for example, in an ODS OUTPUT statement.
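A minimal sketch with the SASHELP.BMT sample data (not the webinar data): with ODS TRACE ON, the log lists each output object, including the SurvivalPlot graph object, and the ODS OUTPUT statement saves the data behind that graph to a data set. The ATRISK= values are chosen for BMT, whose times are in days.

```sas
ods graphics on;
ods trace on;                          /* list output object names in the log */
ods output SurvivalPlot=SurvivalPlot;  /* save the data behind the survival graph */

proc lifetest data=sashelp.bmt plots=survival(atrisk=0 to 2500 by 500);
  time T * Status(0);                  /* Status=0 means censored */
  strata Group;
run;

ods trace off;
```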

 

To show the cumulative percentage on the Y axis, what does the first midpoint of 0 represent? How are the cumulative frequencies calculated at each midpoint? At the midpoint of 0, I get a very high cumulative percentage, which is not consistent with my data, so it would be helpful to know what the midpoints represent.

The graphs shown were Kaplan-Meier graphs. Chapter 5 in our book goes into detail regarding the probability calculations.

 

The part survival (atrisk=0 to 210 by 30), is it random numbers or they are chosen from the data?

They are not random; we chose them based on the data. We wanted to show the number of patients at risk at each month, and because the time units were days, we used increments of 30.

 

Using the Graph Wizard in SAS, will this produce the same results?

There is no graph wizard in PC-SAS that will produce similar graphs. The SAS Enterprise Guide (EG) menus can create graphs, but they will not be the same as the ones produced in the webinar, because the graph menus in SAS EG use GPLOT, whereas the graphs in the webinar used the SGPLOT, TEMPLATE, and SGRENDER procedures.

 

Is "kmtemplate" the name for the statgraph template that you use in your sg proc?

In this example, yes: Kriss created "kmtemplate" as his STATGRAPH template.

 

Where can one get the define statgraph notes? I have always wondered how to begin to define the structure?

Here is the SAS documentation; if you scroll down, it has a starter template like the one Kriss is using. https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=grstatgraph&docsetTarg...
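The basic structure can also be seen in a minimal sketch: PROC TEMPLATE defines a STATGRAPH template (the name scatter_template is hypothetical), and PROC SGRENDER renders it with a data set, here SASHELP.CLASS.

```sas
proc template;
  define statgraph scatter_template;     /* hypothetical template name */
    begingraph;
      entrytitle "Weight by Height";
      layout overlay;                    /* single-cell graph */
        scatterplot x=height y=weight;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.class template=scatter_template;
run;
```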

 

Is the PROC TEMPLATE always needed for Survival Plots when using this procedure?

No, you can also use PROC SGPLOT, for example:

title1 'Product-Limit Survival Estimates';
title2 'With Number of Patients at-Risk';

proc sgplot data=SurvivalPlot;
   step x = time y = survival / group = stratum name = 'survival';
   /* black plus markers, used only for the 'censored' legend entry */
   scatter x = time y = censored / markerattrs = (symbol = plus color = black) name = 'censored';
   /* censored markers colored by stratum */
   scatter x = time y = censored / markerattrs = (symbol = plus) group = stratum;

   /* number at risk at each TATRISK time point, inside the plot area */
   xaxistable atrisk / x = tatrisk class = stratum location = inside
                       colorgroup = stratum separator;
   keylegend 'censored' / location = inside position = topright;
   keylegend 'survival';

   yaxis min = 0;
   xaxis values = (0 to 210 by 30) label = "Days from Randomisation";
run;

 

If you are not familiar with proc template, how did you know to use the statgraph template?

A GTL graph template is always defined in PROC TEMPLATE with a DEFINE STATGRAPH <your template name> statement. You can use the TMPLOUT= option in PROC SGPLOT to write out the GTL code that creates your SG plot; when you look at that code, you will see the DEFINE STATGRAPH statement.

 

In stepplot, x=time  y=survival, are these variables from the data?

Yes, TIME and SURVIVAL are variables from the SURVIVALPLOT dataset.

 

Is there an html style method available?

Yes, you could use the HTMLBLUE style.

 

Can you access the underlying GTL code for automatically created graphics (e.g. the plot created in proc lifetest)?

Yes, you can, although this method is quite advanced. Chapter 2 of our book shows you all the relevant information.
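As a starting point, ODS TRACE ON reports the template used by each graph. For the PROC LIFETEST Kaplan-Meier plot this is typically Stat.Lifetest.Graphics.ProductLimitSurvival, although the exact name can differ between releases, so check the trace output. The SOURCE statement in PROC TEMPLATE then writes that template's GTL code to the log; a minimal sketch:

```sas
/* Write the GTL source of the LIFETEST survival-plot template to the log */
proc template;
  source Stat.Lifetest.Graphics.ProductLimitSurvival;
run;
```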

 

I sometimes have a difficult time figuring out which GTL element needs to be edited to produce the effect I desire? Is there any easy way to identify these GTL elements/variable names (similar to the popup style that works for table elements)?

The statements and options in a template have descriptive names that help you identify which one to edit. You can also search the SAS documentation on sas.com for the procedure or statement you are using. It helps to understand the different areas of the graph that you want to change (for example, the plot area, axes, and legend) so that you know where in the template definition to make the changes.

 

When do use layout overlay and layout gridded, apart from the example shown? Please throw some more light on it.

Think of using the LAYOUT OVERLAY for any graphs that you would have produced using SGPLOT because LAYOUT OVERLAY is for single-cell graphs. So if you would have created a scatter plot or a box plot in SGPLOT, then alternatively, you could use GTL and the LAYOUT OVERLAY to produce those graphs. If you want to add summary statistics to your graphs and want the rows and columns of the summary statistics lined up, you could use LAYOUT GRIDDED (nested within LAYOUT OVERLAY).
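A minimal sketch of a LAYOUT GRIDDED nested inside LAYOUT OVERLAY, used as an inset of summary statistics on SASHELP.CLASS. The template and macro variable names are hypothetical; the statistics are passed in through an MVAR statement, the same mechanism mentioned in the next question.

```sas
/* Compute the statistics to display (hypothetical macro variable names) */
proc sql noprint;
  select count(weight), put(mean(weight), 8.1)
    into :n_obs trimmed, :mean_wt trimmed
    from sashelp.class;
quit;

proc template;
  define statgraph scatter_with_stats;
    mvar n_obs mean_wt;                  /* resolved when SGRENDER runs */
    begingraph;
      layout overlay;
        scatterplot x=height y=weight;
        /* Two-column grid keeps the labels and values lined up */
        layout gridded / columns=2 autoalign=(topleft) border=true;
          entry halign=left "N";    entry halign=right n_obs;
          entry halign=left "Mean"; entry halign=right mean_wt;
        endlayout;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.class template=scatter_with_stats;
run;
```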

 

Hi, the variables or tokens you mentioned in mvar and nvar statements are those the name of macro variables we created earlier?

Yes, that is correct.

 

How easy is it to combine the Graph code with a SAS Viya report? Is it just Develop in SAS Studio then open in Viya? Is there any Reference material available?

Sorry, SAS Viya is out of the scope of this webinar. We are not sure.

 

Can you alter text size in MATRIX in SGSCATTER so that long variable names fit without being truncated?

I could not see a way. Perhaps make the variable name text shorter if possible.

 

Kriss, is it possible to provide ALL your code? I cannot visualize how all the sections go together.

Sure, the code can be found here.

 

Which proc can be used to create a Venn Diagram?

You can use PROC TEMPLATE and PROC SGRENDER to create a Venn Diagram. Please see this paper here, where I have created a macro that you can use to produce a Venn Diagram - https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/1965-2018.pdf

 

Also, you can use the ELLIPSEPARM and TEXT statements within SGPLOT to create a Venn diagram. Please see the example below.

data data_for_plot_layout;
   do x = 1 to 100;
      y = x;
      output;
   end;
run;

data data_for_plot_layout2;
  set data_for_plot_layout;

  if _n_ = 1 then do;
    A_x = 33;
    A_y = 50;
    A_text = "61";

    B_x = 66;
    B_y = 50;
    B_text = "66";

    AB_x = 50;
    AB_y = 50;
    AB_text = "63";
  end;
run;

proc sgplot data=data_for_plot_layout2 noautolegend;
  scatter x = x y = y / markerattrs=(size = 0);
  ellipseparm semimajor=22.5 semiminor=30 /
     slope=0 xorigin=37 yorigin=50 fill clip
     lineattrs=(color=red) fillattrs=(color=red transparency = 0.75);
  ellipseparm semimajor=22.5 semiminor=30 /
     slope=0 xorigin=63 yorigin=50 fill clip
     lineattrs=(color=black) fillattrs = (color = green transparency = 0.75);

  /* Numbers */
  text x=A_x y=A_y text = A_text / textattrs=(size = 10);
  text x=B_x y=B_y text = B_text / textattrs=(size = 10);
  text x=AB_x y=AB_y text = AB_text / textattrs=(size = 10);

  xaxis display = none;
  yaxis display = none;
run;

 

 

To do the time to event, we need just export the failure curve?

Yes, you just need to export and use the survival dataset (time-to-event dataset) to produce the survival curve.

 

How do you put the survival, say at the end of 6 months (180 days) as you did the median in the plot?

In the DROPLINE statement, specify X=180 and set Y= to the survival estimate at that time point.
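A sketch of one way to do this with SASHELP.BMT (times in days). It assumes the SurvivalPlot data set saved from PROC LIFETEST contains the Time, Survival, and StratumNum variables (check with PROC CONTENTS) and, for illustration, uses only the first stratum. Because the Kaplan-Meier curve is a step function, the estimate in effect at day 180 is the one at the last time point at or before day 180.

```sas
ods graphics on;
ods output SurvivalPlot=SurvivalPlot;
proc lifetest data=sashelp.bmt plots=survival;
  time T * Status(0);
  strata Group;
run;

/* Estimate at the last time <= 180 in stratum 1 (NOT MISSING(Time) matters: */
/* a missing value compares as less than 180)                                */
proc sql noprint;
  select Survival, put(Survival, 5.3)
    into :surv180 trimmed, :surv180_c trimmed
    from SurvivalPlot
    where StratumNum = 1 and not missing(Time) and Time <= 180
          and not missing(Survival)
    having Time = max(Time);
quit;

proc sgplot data=SurvivalPlot;
  step x=Time y=Survival / group=Stratum;
  dropline x=180 y=&surv180 / dropto=both label="&surv180_c";
  xaxis label="Days";
run;
```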

 

With statement: scatter/jitter in box plot, when there are many points, like 5000 and 10,000 for each position, the graph will not show the different length of the points.

How can we show the difference of length?

When there are many points on the graph, such as 5000, the points are likely to overlap with each other. Seeing individual points would be very difficult in these circumstances.

 

WARNING: DISCRETELEGEND statement with DISPLAYCLIPPED=FALSE is getting clipped. The legend will not be drawn. How to get rid of this warning?

You can specify DISPLAYCLIPPED=TRUE to draw the legend even if it is clipped. This means that part of the legend may not be displayed, which may not be ideal. Alternatively, you can reduce the font size of the legend so that it fits.
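A minimal GTL sketch showing both options on a DISCRETELEGEND statement, using SASHELP.CLASS and a hypothetical template name; the 7pt font size is an arbitrary choice.

```sas
proc template;
  define statgraph legend_demo;           /* hypothetical template name */
    begingraph;
      layout overlay;
        scatterplot x=height y=weight / group=sex name="pts";
        /* Draw the legend even if clipped, and shrink its text */
        discretelegend "pts" / displayclipped=true valueattrs=(size=7pt);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.class template=legend_demo;
run;
```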

 

What should I do to get my basic statistics clear using SAS?

You could format your basic statistics to 1 or 2 decimal places so that they are easier to digest when you display them. Also, display only the essential statistics. If you want to learn how to compute basic statistics in SAS, look up PROC UNIVARIATE or PROC MEANS.
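For example, a sketch with SASHELP.CLASS that requests only a few statistics and rounds them to one decimal place with the MAXDEC= option:

```sas
proc means data=sashelp.class n mean std min max maxdec=1;
  var height weight;
run;
```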

 

Do you use ODS Graphics Designer to help create GTL code for graphs you are customizing? Are there any limitations to using ODS Graphics Designer to create GTL code?

We did not use ODS Graphics Designer in the webinar, and we do not use it to help create GTL code.

 

Kriss, I see you have used %do macro iteration loop inside the proc, which is cool. I am not quite clear how the values of those median residual times are assigned to the indexed macro variables? Are they coming from the MVAR, NVAR (or so) statements above? How that does work?

Recall that six macro variables were previously created to represent the median survival values: MedianSurvival1, MedianSurvival2, MedianSurvival3, CMedianSurvival1, CMedianSurvival2, and CMedianSurvival3. MedianSurvival1 holds the median survival time of the Placebo group, and CMedianSurvival1 holds the same median survival time formatted to 1 decimal place.

 

The following %DO loop was used. On each iteration, &i resolves to 3, 2, and then 1, so MedianSurvival&i resolves to MedianSurvival3, MedianSurvival2, and MedianSurvival1, and the median survival values for the three treatments are used. &i is also used with GRAPHDATA to obtain a different line style and color for each of the three treatments. Because %DO loops can only run inside a macro definition, this code sits within a macro.

%do i = 3 %to 1 %by -1;
   dropline y = 0.50 x = MedianSurvival&i /
      dropto = both
      lineattrs = (thickness = 1px
                   color = graphdata&i:color
                   pattern = graphdata&i:linestyle)
      label = CMedianSurvival&i;
%end;
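One way such macro variables could be created is sketched below with SASHELP.BMT rather than the webinar data. It assumes the Quartiles output table from PROC LIFETEST has a Stratum variable numbering the groups 1 to 3, plus Percent and Estimate; verify the variable names with PROC CONTENTS before relying on it.

```sas
ods output Quartiles=Quartiles;
proc lifetest data=sashelp.bmt;
  time T * Status(0);
  strata Group;
run;

/* One pair of global macro variables per stratum: raw and 1-decimal text */
data _null_;
  set Quartiles(where=(Percent=50));
  call symputx(cats('MedianSurvival', Stratum), Estimate, 'G');
  call symputx(cats('CMedianSurvival', Stratum), put(Estimate, 8.1), 'G');
run;

%put &=MedianSurvival1 &=CMedianSurvival1;
```

A median that is not reached is stored as a missing value (a period).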

 

Any suggestions for fresher Biostatistician or statistical programmer where they should focus on?

You could start with SAS® Programming 1: Essentials and Statistics 1: Introduction to ANOVA, Regression, and Logistic Regression. Both are free e-learning courses.

 

How can I put the text 'Number of Subjects at Risk' on top of its table?

You can use the TITLE= option in the AXISTABLE statement, for example:

 

axistable value=atrisk x=tatrisk / class=stratum colorgroup=stratum
          title="Number of Subjects at Risk";

 

When you are trying to match a color to the color used in the specs, what is your strategy for trying to match the "spec color"? Just matching it thru "playing around with it"?

I am assuming you are trying to use the colors that are defined within your organization. If your company has predefined colors, they typically have RGB or HEX codes associated with them. If there are no predefined company colors but you want to create a color, I suggest using a program such as Paint to identify the RGB or HEX codes. If you have the RGB codes, there are color-utility macros that help convert them to HEX codes. You can refer to this paper for details on coloring your graph: https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4660-2020.pdf
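If you only have RGB values, a DATA step can also do the conversion. In this sketch, the RGB values (31, 119, 180) and the macro variable name brandcolor are hypothetical stand-ins for your organization's color. The HEX2. format turns each 0-255 component into two hex digits, and the CX prefix makes the result a SAS color name.

```sas
data _null_;
  r = 31; g = 119; b = 180;                 /* hypothetical RGB components */
  hexcolor = cats('CX', put(r, hex2.), put(g, hex2.), put(b, hex2.));
  call symputx('brandcolor', hexcolor);
  put hexcolor=;                            /* writes hexcolor=CX1F77B4 to the log */
run;

proc sgplot data=sashelp.class;
  scatter x=height y=weight / markerattrs=(symbol=circlefilled color=&brandcolor);
run;
```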

 

After the workshop, could you provide a complete SAS code or a reference paper for creating the KM curve?

Sure, the code can be found here.

 

Is there anywhere in the resource lists that would have practice problems or examples that you can use to become more familiar with creating the graphs.

Graphically Speaking is a good resource for learning more about creating ODS Graphics.

https://blogs.sas.com/content/graphicallyspeaking/

 

Can we use markerattrs as any symbol? Here you used plus.

Yes, you could use other symbols with MARKERATTRS, such as a circle, diamond, square, star, triangle, etc.

 

What is the music that played while we waited for the webinar to start?😊

A fellow attendee said it was the overture to Mozart's The Marriage of Figaro.

 

 

Recommended Resources

SAS® Graphics for Clinical Trials by Example e-book (Save 25% with code: GRAPHIC25)

Interactive Graphs

Animate Your Data!

SAS® 9.3 and 9.4 SG Procedures Tip Sheet

Gartner Magic Quadrant for Data Science and Machine Learning Platforms

 

Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow up Q&A, slides and recordings from other SAS Ask the Expert webinars.  
