Greetings! This is my first time posting. My version of SAS is 9.4 TS Level 1M7 on the X64_10PRO platform.
I'd like to change the way my x axis appears. This syntax produces a line graph showing differences between the treatment and control groups.
My syntax is:
proc sgplot data= dataset;
where base_days_sober<90;
title "Negative Affect Among Participants with <90 Days of Recovery by Treatment Group";
      vline ttpbf /  response=na   group=tx_group  stat=mean  limits=both  limitstat=stderr  markers;
xaxis display=(nolabel) offsetmin=0.05 offsetmax=0.05;
yaxis label="Negative Affect" values=(10 to 50) ;
run;
The x axis of my graph is a categorical variable called "ttpbf" that really represents mean values of the outcome "negative affect" over time where 0=baseline, 1=after group (this takes place 2 weeks after baseline), 2=after the independent phase (this takes place 4 weeks after baseline), 3=no study activity (this is the gap between 4 and 8 weeks, but I'd like to see the label "no study activity" appear in the place where 6 weeks after baseline would go), and 4=one month follow up (8 weeks after baseline).
Right now, with the syntax I'm using, the x axis value for #3 above does not print because there is no data for that category. But I'd like to print that label to show the gap in study activity between weeks 4 and 8, and also so that the x axis is evenly spaced for every 2 week interval. As it appears now, the 4 week gap takes up the same distance as each of the 2 week intervals. This is how it appears now (see picture below).
I've learned a lot searching within the communities website but am still stuck with this and a few other questions.
I'd also like to:
1. fix the legend so that the "Maroon/tx=1, Gold/co=0" is removed
2. make the part of the line from exit interview to 1 month follow up dashed instead of solid
3. change the colors into grayscale patterns to make the graphs more accessible to colorblind readers
4. change the labels in the legend where it says Control/gold and Treatment/maroon to other labels.
Thank you!
You force the appearance of values on an axis by using the VALUES= option on the XAXIS statement. From your description I cannot tell what values you actually have, or may have, in the data. If the values of ttpbf are actually 1, 2, 3, then add a VALUES= option listing those values to your XAXIS statement.
When you state something like this: "But I'd like to print that label to show the gap in study activity between weeks 4 and 8 and also so that the x axis is evenly spaced for every 2 week interval," you really can't use discrete values. You must use a variable whose numeric intervals correspond to the intervals that you want on the axis. So it is time to show some actual data.
1. fix the legend so that the "Maroon/tx=1, Gold/co=0" is removed.
Without actual data I have to assume that the offending text is the label of the group variable. Add a KEYLEGEND statement and use Title=" " to suppress the name or label of the group variable from appearing.
2. make the part of the line from exit interview to 1 month follow up dashed instead of solid.
A group value has one appearance in a graph. If you want a line to change type, from solid to dashed, then you need to overlay a second plot using the x and y values with a different group variable. Overlay means a second VLINE statement using those values. Additional plots mean that your legend management gets a bit more complicated: it would now have 4 group values to display. With different line types you will likely also need to explain what the difference means.
3. change the colors into grayscale patterns to make the graphs more accessible to colorblind.
Simplest would be to use an ODS grayscale style. The ones supplied by SAS change by version. You may have a style named GrayscalePrinter that would use grayscale by default. There are likely several Journal styles as well.
Another option, if changing the style messes with other output, is to use a DATTRMAP data set, which lets you set display properties for items controlled by group variables. You would need an entry for each FORMATTED value of the group variable; each entry can set marker type, color and size; text font, color and size; and line pattern, color and thickness. Options also let you control whether values of a group variable are displayed when they are not actually in a plot.
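As a rough sketch of the DATTRMAP approach described above (not tested against your data): the attribute map is an ordinary data set with reserved variable names such as ID, VALUE, LINECOLOR, LINEPATTERN, MARKERCOLOR and MARKERSYMBOL. The data set names demo and grp_attrs, the ID value txmap, and the demo values are all made up for illustration; the VALUE entries have to match the formatted group values in your own data.

```sas
/* Hypothetical practice data: time point, outcome, group */
data demo;
   input ttpbf na tx_group $;
datalines;
0 17 T
0 16 C
1 11 T
1 30 C
2 12 T
2 35 C
;
run;

/* Attribute map: one row per formatted value of the group variable */
data grp_attrs;
   length id $8 value $8 linecolor markercolor $8 linepattern $8 markersymbol $16;
   id = 'txmap';
   value = 'T'; linecolor = 'black'; markercolor = 'black';
      linepattern = 'solid'; markersymbol = 'circlefilled'; output;
   value = 'C'; linecolor = 'gray'; markercolor = 'gray';
      linepattern = 'dash'; markersymbol = 'trianglefilled'; output;
run;

/* DATTRMAP= names the map; ATTRID= ties this plot to the rows with id='txmap' */
proc sgplot data=demo dattrmap=grp_attrs;
   vline ttpbf / response=na group=tx_group stat=mean markers attrid=txmap;
run;
```

Because the map is keyed on formatted values, attaching a format that displays Treatment and Control would mean the VALUE column needs those strings instead of T and C.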
4. change the labels in the legend where it says Control/gold and Treatment/maroon to other labels.
Legend values display the default formatted value of the group variable. If you need to change the text that appears and you are setting the values with a format then create a second format to display the text you want. If the values shown are the actual values then create a format to display the desired text and use the created format in a Format Statement with the variable and the custom format name assigned.
Thank you for your reply.
In terms of #1, I got the solution to work. That is, this syntax removed the text I didn't want in the legend:
keylegend/title=" ";
I added this values statement for the x axis to print the labels that I wanted, and while it worked (the labels look good now), the graphed lines no longer appear:
xaxis values=("Baseline" "After Group Phase" "After Independent Phase" "No Study Activity" "1 Month Follow Up") display=(nolabel) offsetmin=0.05 offsetmax=0.05;
I tried adding a values statement with numbers in it but that didn't work. Here is what the output looks like now:
I realize you are suggesting I add another values statement with numbers in it, but after a search, I couldn't find an example of what that looked like, so I couldn't figure it out.
I'm sorry but I'm new to SAS and I didn't know what you meant by, "From the description of your data I cannot tell what the actual values you have, or may have in the data, actually are. If the values are of Ttbf are actually 1, 2, 3 then add a values statement with those values to your xaxis statement." What do you mean by actual values? What would the code look like? After a search and trying some things I found, I couldn't get it to work.
#4: I believe these are the actual values, so what I'd have to do is what you say here: " If the values shown are the actual values then create a format to display the desired text and use the created format in a Format Statement with the variable and the custom format name assigned." What would that code look like? After a search I couldn't find an example.
In case it is helpful, here is what the revised syntax looks like now:
proc sgplot data= dataset;
where base_days_sober<90;
title "Negative Affect Among Participants with <90 Days of Recovery by Treatment Group";
      vline ttpbf /  response=na   group=tx_group  stat=mean  limits=both  limitstat=stderr  markers;
xaxis values=("Baseline" "After Group Phase" "After Independent Phase" "No Study Activity" "1 Month Follow Up") display=(nolabel) offsetmin=0.05 offsetmax=0.05;
yaxis label="Negative Affect" values=(10 to 50) ;
keylegend/title=" ";
run;
Thank you!
Data.
We can't test code without data.
If you list values on an XAXIS statement that do not exist in the data then there is nothing to show. So the values you used are not the values in the data. Which again is the reason I said I cannot tell from your description what actual values are in your data. So it appears that maybe your data already has a custom format assigned to the X variable.
I don't use VLINE much so misspoke about the overlay. You need to change the values of the Groupvar and split the data at a boundary point. This is because VLINE basically only allows a single group variable. SERIES plots allow multiple group variables for different plots, which is what I was thinking of.
This is a crude example.
data have;
   input x y groupvar $;
datalines;
1 1 A
2 3 A
3 3 A
4 6 A
5 5 A
;
/* example graphing that*/
proc sgplot data=have;
   vline x /response=y group=groupvar;
run;
/* data needed to change appearance of line segments starting at x=3*/
data need;
   set have;
   if x = 3 then do;
      output;
      groupvar='B';
   end;
   if x gt 3 then do;
      groupvar='B';
   end;
   output;
run;
proc sgplot data=need;
   vline x /response=y group=groupvar;
run;
Note that an additional record has to be created at the X value where the line appearance changes so that both line segments with the changed group variable value have end points.
Proc format to change displayed values is something like, using the example data above
proc format;
   value myx
      1='Start'
      2='2nd Value'
      3='Change point'
      4='After Change'
      5='Later'
   ;
run;
proc sgplot data=need;
   vline x /response=y group=groupvar;
   format x myx.;
run;
There are literally hundreds, if not thousands, of format examples on this forum alone, so I am not sure what you searched for that couldn't find any. If the values of the variable are character, the format name starts with a $ and the values on the left side of the = in the PROC FORMAT value list should be in quotes.
Thank you for your patience. I'm learning a lot and things are starting to make sense.
So, here is an example data set with example graphing syntax:
data yoga3;
   input ttpbf na tx_group $;
datalines;
0 17 T
0 17.5 T
0 16.5 C
0 17 C
1 11 T
1 12 T
1 30 C
1 31 C
2 11 T
2 13 T
2 35 C
2 36 C
4 13 T
4 14 T
4 37 C
4 32 C
;
proc sgplot data= yoga3;
title "Negative Affect Among Participants with <90 Days of Recovery by Treatment Group";
      vline ttpbf /  response=na   group=tx_group  stat=mean  limits=both  limitstat=stderr  markers;
xaxis display=(nolabel) offsetmin=0.05 offsetmax=0.05 ;
yaxis label="Negative Affect" values=(10 to 50) ;
keylegend/title=" ";
run;
Then I run the proc format command based on your example, which puts a file in my "Formats" Folder (this is the first time I'm ever using the formats folder):
proc format;
value myxaxis
0='Baseline'
1='After Group Phase'
2='After Independent Practice Phase'
3='No Data Collected for 1 Month'
4='1 Month Follow Up'
;
run;
It appears correctly in my formats folder.
Then I can't seem to figure out how to put the command to use myxaxis into the graphing syntax. Here is one of my attempts:
proc sgplot data= yoga3;
title "Negative Affect Among Participants with <90 Days of Recovery by Treatment Group";
      vline ttpbf /  response=na   group=tx_group  stat=mean  limits=both  limitstat=stderr  markers;
xaxis display=(nolabel) offsetmin=0.05 offsetmax=0.05 ;
format ttpbf myxaxis;
yaxis label="Negative Affect" values=(10 to 50) ;
keylegend/title=" ";
run;
That is my major question now, and I'm also unclear about:
1. how to get T to appear as a black line, how to get C to appear as a dashed line (which will satisfy my need for colorblind readers, rather than my grayscale request from yesterday)
2. how to change the name of the legend from T and C to Treatment and Control
Thanks again for your patience, the light is slowly dawning and I'm also learning how to post to these message boards.
You were almost there with the custom format.
If you read your log, using your data and code, you should see something like this:
32   proc sgplot data= yoga3;
33   title "Negative Affect Among Participants with <90 Days of Recovery by Treatment Group";
34         vline ttpbf / response=na group=tx_group stat=mean limits=both limitstat=stderr
34 ! markers;
35   xaxis display=(nolabel) offsetmin=0.05 offsetmax=0.05 ;
36   format ttpbf myxaxis;
WARNING: Variable MYXAXIS not found in data set WORK.YOGA3.
37   yaxis label="Negative Affect" values=(10 to 50) ;
38   keylegend/title=" ";
39
40   run;
That warning appears because you did not use the dot with the format name. Without the trailing dot, SAS reads MYXAXIS as a variable name rather than a format name. If you put the dot at the end of the format name like this, the values should appear on the axis as desired.
format ttpbf myxaxis. ;
To display a text value other than T or C for your TX_group variable, again use a format.
If your needs are not too complicated, you can provide lists of line patterns and colors with the STYLEATTRS statement.
proc format;
value myxaxis
0='Baseline'
1='After Group Phase'
2='After Independent Practice Phase'
3='No Data Collected for 1 Month'
4='1 Month Follow Up'
;
value $tx_group
'T' = 'Treatment'
'C' = 'Control'
;
run;
proc sgplot data= yoga3 ;
styleattrs datalinepatterns=(solid dash)
           datacontrastcolors=(black gray);
title "Negative Affect Among Participants with <90 Days of Recovery by Treatment Group";
      vline ttpbf /  response=na   group=tx_group  stat=mean  limits=both  limitstat=stderr  markers;
xaxis display=(nolabel) offsetmin=0.05 offsetmax=0.05 ;
format ttpbf myxaxis. tx_group $tx_group.;
yaxis label="Negative Affect" values=(10 to 50) ;
keylegend/title=" ";
run;
The STYLEATTRS statement overrides the defaults set by your current ODS style. DATALINEPATTERNS= accepts named patterns, built from words like Dash or Dot alone or in combination, or a number from 1 (solid line) to 46. It is best to look up the line patterns in your online help to find the ones you want. Colors are set in different options depending on what is to be colored, so you may need to play with DATACOLORS= or DATACONTRASTCOLORS= to see which works for specific graphs (or spend a lot of time reading documentation).
When STYLEATTRS is used, the "lowest" group value encountered in your data will generally get the first line type or color in each list. Since C comes before T, it gets the first line type, Solid, and the first color, Black. So you may need to change the order of the option value lists to match the desired appearance.
I included a second color just to demonstrate here. If you want all of the lines to be the same color, then use the LINEATTRS= option of the VLINE statement, such as Lineattrs=(color=black). You can only set one color per VLINE (or other line-generating statement) this way.
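For example, something along these lines should draw every line in black and leave the solid and dash patterns to tell the groups apart. It is only a sketch on a made-up data set called demo. The ODS GRAPHICS line is an assumption about your setup: some ODS styles (HTMLBlue, for instance) rotate colors before line patterns, so if the second group does not pick up the dash pattern, ATTRPRIORITY=NONE is one thing to try.

```sas
/* Hypothetical practice data: time point, outcome, group */
data demo;
   input ttpbf na tx_group $;
datalines;
0 17 T
0 16 C
1 11 T
1 30 C
2 12 T
2 35 C
;
run;

/* Let line patterns vary by group even when the style prioritizes color */
ods graphics / attrpriority=none;

proc sgplot data=demo;
   styleattrs datalinepatterns=(solid dash);
   /* One fixed color for all lines and markers; patterns still differ by group */
   vline ttpbf / response=na group=tx_group stat=mean markers
                 lineattrs=(color=black) markerattrs=(color=black);
run;

/* Restore the default ODS Graphics settings afterwards */
ods graphics / reset;
```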
You may find this site helpful for code examples of some types of graphs: https://support.sas.com/en/knowledge-base/graph-samples-gallery.html The different topic areas show pictures of graphs and link to code with data sets that you can run and modify as desired. There are links by graphic procedure.
Thank you for your patience. Indeed we are almost there!
I ran into a problem and I was able to resolve it myself, if you can believe that.
One last wrinkle. Here is what the practice graph looks like now:
Just so it's handy to look at, the practice data set looks like this:
data yoga3;
   input ttpbf na tx_group $;
datalines;
0 17 T
0 17.5 T
0 16.5 C
0 17 C
1 11 T
1 12 T
1 30 C
1 31 C
2 11 T
2 13 T
2 35 C
2 36 C
4 13 T
4 14 T
4 37 C
4 32 C
;
Notice how there are no values for the variable ttpbf=3. That's the same in the real data set. There are no values for ttpbf=3, but I would still like the x axis to look like this:
Where "no study activity" would be where ttpbf=3 would go, if there were any data there. Let me draw it another way: this is how I'd like to see it:
See how the line goes from "after independent phase" straight to "1 month follow up"? I'd like to create that if I can because it shows the gap in study activity and also makes a tick mark on the x axis every two weeks so the spacing between "after independent phase" and "1 month follow up" is four weeks.
That's my last query!
Now is the time to add VALUES= to the XAXIS statement, since we know that the variable is numeric and what its actual values are.
Since VLINE is basically a categorical plot, the spacing between the X values is controlled by the number of categories. If you want more control over spacing, I think you would need to change the plot type.
proc sgplot data= yoga3 ;
styleattrs datalinepatterns=(1 dash)
           datacontrastcolors=(black gray);
title "Negative Affect Among Participants with <90 Days of Recovery by Treatment Group";
      vline ttpbf /  response=na   group=tx_group  stat=mean  limits=both  limitstat=stderr  markers;
xaxis display=(nolabel) offsetmin=0.05 offsetmax=0.05 values=(0 to 4 by 1) ;
format ttpbf myxaxis. tx_group $tx_group.;
yaxis label="Negative Affect" values=(10 to 50) ;
keylegend/title=" ";
run;
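If even spacing by week matters more than staying with VLINE, one possible way to change the plot type is to summarize the data first and draw a SERIES plot on a numeric week axis. This is only a sketch: the variable week, the data sets yoga_weeks and na_means, and the mapping of ttpbf codes to weeks (0, 2, 4, 8) are assumptions based on the description in this thread, and the tick labels are supplied with VALUESDISPLAY= rather than with the format.

```sas
/* Practice data from earlier in the thread */
data yoga3;
   input ttpbf na tx_group $;
datalines;
0 17 T
0 17.5 T
0 16.5 C
0 17 C
1 11 T
1 12 T
1 30 C
1 31 C
2 11 T
2 13 T
2 35 C
2 36 C
4 13 T
4 14 T
4 37 C
4 32 C
;
run;

/* Convert the visit code to weeks since baseline (assumed mapping) */
data yoga_weeks;
   set yoga3;
   select (ttpbf);
      when (0) week = 0;
      when (1) week = 2;
      when (2) week = 4;
      when (4) week = 8;
      otherwise week = .;
   end;
run;

/* Mean and standard error of the outcome per group and week */
proc means data=yoga_weeks nway noprint;
   class tx_group week;
   var na;
   output out=na_means mean=mean_na stderr=se_na;
run;

/* Error bar limits: mean plus or minus one standard error */
data na_means;
   set na_means;
   lower = mean_na - se_na;
   upper = mean_na + se_na;
run;

proc sgplot data=na_means;
   series  x=week y=mean_na / group=tx_group markers name='lines';
   scatter x=week y=mean_na / group=tx_group
           yerrorlower=lower yerrorupper=upper;
   /* Numeric axis with a tick every 2 weeks; week 6 is labeled but has no data */
   xaxis type=linear values=(0 to 8 by 2) display=(nolabel)
         valuesdisplay=("Baseline" "After Group Phase" "After Independent Phase"
                        "No Study Activity" "1 Month Follow Up");
   yaxis label="Negative Affect" values=(10 to 50 by 10);
   keylegend 'lines' / title=" ";
run;
```

Week 6 gets a tick and a label but no point, so the line runs straight from week 4 to week 8 across a gap twice as wide as the other intervals.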
Another option that might be better is no line between "after independent phase" and "1 month follow up" but instead have much bigger bolder markers throughout so that the "1 month follow up" marker stands along with no line connecting it.
This works for the example data:
but not for my real data, the graph of which ends up looking like this:
Using nearly the identical syntax; here it is for the real data:
proc sgplot data= dataset;
styleattrs datalinepatterns=(solid dash)
           datacontrastcolors=(black gray);
where base_days_sober<90;
title "Negative Affect Among Participants with <90 Days of Recovery by Treatment Group";
      vline ttpbf /  response=na   group=tx_group  stat=mean  limits=both  limitstat=stderr  markers;
xaxis display=(nolabel) offsetmin=0.05 offsetmax=0.05 values=(0 to 4 by 1);
format ttpbf myxaxis. tx_group the_new. ;
yaxis label="Negative Affect" values=(10 to 50) ;
keylegend/title=" ";
run;
and here it is for the practice data:
proc sgplot data= yoga3;
styleattrs datalinepatterns=(solid dash)
           datacontrastcolors=(black gray);
title "Negative Affect Among Participants with <90 Days of Recovery by Treatment Group";
      vline ttpbf /  response=na   group=tx_group  stat=mean  limits=both  limitstat=stderr  markers;
xaxis display=(nolabel) offsetmin=0.05 offsetmax=0.05 values=(0 to 4 by 1);
format ttpbf myxaxis. tx_group $tx_group.;
yaxis label="Negative Affect" values=(10 to 50) ;
keylegend/title=" ";
run;
I repeat: DATA.
Your "real" data has not been shown. So it may be that the format definition and values list don't work because the "real" data has different values than the practice data. From the picture you show I strongly suspect that you actually have values of the X axis variable of 0, 1, 2 and 3 with no values of 4. So the format you wrote, myxaxis, does not align with the actual values in your data.
Quick check. Show the result of this code.
Proc freq data=dataset; tables ttpbf; format ttpbf ; run;
Note that the FORMAT statement here, with no format name, clears any format that was assigned to the variable ttpbf, so the values print with the default BEST. format for numeric variables. That should show the underlying values in the data set. If values of 3 are shown, then that will explain the difference in the graphs.
This got to the heart of it!
It wasn't the ttpbf variable I wanted, it was a new one I created, called ttpbf_graph, so
Proc freq data=dataset;
   tables ttpbf_graph;
   format ttpbf_graph ;
run;
When I used that variable it showed results for 0, 1, 2, and 4, and then when I ran the code with that variable replaced: voila! It worked! Here is the final code that worked for the "real" data:
proc sgplot data= dataset;
styleattrs datalinepatterns=(solid dash)
           datacontrastcolors=(black gray);
where base_days_sober<90;
title "Negative Affect Among Participants with <90 Days of Recovery by Treatment Group";
      vline ttpbf_graph /  response=na   group=tx_group  stat=mean  limits=both  limitstat=stderr  markers;
xaxis display=(nolabel) offsetmin=0.05 offsetmax=0.05 values=(0 to 4 by 1);
format ttpbf_graph myxaxis. tx_group the_new. ;
yaxis label="Negative Affect" values=(10 to 50) ;
keylegend/title=" ";
run;
It seemed that I had to re-run the proc format syntax from yesterday again in order for the new syntax to work, but after I did so it ran beautifully.
This is the proc format syntax that I seemed to need to run again today:
proc format;
value the_new
0 = 'Control'
1 = 'Journaling'
;
run;
proc format;
value myxaxis
0='Baseline'
1='After Group Phase'
2='After Independent Practice Phase'
3='No Data Collected for 1 Month'
4='1 Month Follow Up'
;
run;
Thank you profoundly for your patience and expertise!!
