PrinceAde
Obsidian | Level 7

Hi everyone.

I'm trying to construct a box plot, but I have not been able to get started because I don't understand what is going on in the figure. I have attached a picture of what I'm trying to describe.

Description of what I see:

The x-axis seems to consist of two variables: Visit (Baseline, Day 64, Day 120, Day 360) and Strain (Wuhan, XD, XE, XF).

The y-axis has two labels, one on either side of the panel: the left side is labelled Median (IQR) and the right side shows the age group, 8-39 at the top and 40-60 at the bottom. Please help me understand what is going on and how I can approach it.

12 REPLIES
Quentin
Super User

Sorry, your question is not clear. As you say, you've shown a picture of a paneled box plot.  Actually, it looks like maybe a picture of a screen-shot of a paneled box plot.  Do you have the data and code that generate this box plot?  Or is this a case where someone sent you a picture and said "please make a plot like this"  ?  We could guess at some of the meaning of this chart, but it would probably be better for you to go back to the source for this photo and ask them to explain it to you.

 

That said, it looks like the kind of plot you could make with SGPANEL.  The panel axes are Strain and Age.  Each box plot is of CD8+ Tcells, with category=visit and group=serostatus at baseline.  If you look at the bottom left plot, it shows how CD8+ Tcells varies by visit and baseline serostatus, for patients with the Wuhan strain that are 40-60 years old.

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Rick_SAS
SAS Super FREQ

It looks like this panel has the following characteristics:

1. There are two panel variables, for a total of 8 cells. The vertical panel variable has 2 classes, which I will call ROW1 and ROW2. The horizontal panel variable has 4 classes, which I will call COL1-COL4.

2. Within each cell, there is a categorical variable with levels CAT1-CAT4 and a group variable with levels GROUP1-GROUP4. There is also a Y variable (the response).

 

Study the output from the following example. Then map my variable names to the variables in your data set:

data Have;
call streaminit(1);
do ROW = 'Row1', 'Row2';
   do COL = 'Col1', 'Col2', 'Col3', 'Col4';
      do CAT = 'Cat1', 'Cat2', 'Cat3', 'Cat4';
         do GROUP = 'Group1', 'Group2', 'Group3', 'Group4';
            do Rep = 1 to 10;
               Y = rand("Normal");
               output;
            end;
         end;
      end;
   end;
end;
run;

proc sgpanel data=Have;
   panelby COL ROW / onepanel layout=lattice columns=4 novarname;
   vbox Y / category=CAT group=GROUP;
run;

 SGPanel7.png

PrinceAde
Obsidian | Level 7

Hi @Rick_SAS  @Quentin  thanks for the reply.

 

In addition, I realized that I also need to overlay the box plot with a scatter plot grouped by Serostatus (seropositive and seronegative). I added it, but it did not appear in the graph.

 

Also, the row axis shows 0.1, 0.3 and 1.0, yet in the earlier picture these ticks are evenly spaced despite the unequal gaps between the values, which does not seem logical. I tried a list of string values, value=("0.1" "0.3" "1.0"), but the axis was simply ordered based on the data (I'm assuming this is the default).

I'm also unable to increase the width/size of the graph, which I think is the reason the column (x) axis labels are slanted.

 

How can I control the color of each box?

Lastly, the PANELBY headers are boxed. How can I remove the box?

 

Below is the code I currently have.

 

I ask too many questions, please bear with me.

 

PROC SORT DATA=simulated OUT=simu1;
BY Visit Day ID Strain Agegrp serostatus;
RUN;

/*Changing the structure of the dataset from wide to long such that the cytokine's are represented as a single variable as
well their corresponding values are in another column*/
PROC TRANSPOSE DATA=simu1 OUT=simu2(rename=(col1=med_iqr _name_=cytokine));
BY Visit Day ID Strain Agegrp serostatus;
VAR IFNg TNFa IL_2 CD107a;
RUN;

 

title;
footnote;
proc sgpanel data=simu2;
/* Use the SORT= option to order the panels */
panelby strain agegrp / rows=2 columns=4 layout=lattice COLHEADERPOS=bottom sort=ascending uniscale=column novarname START=BOTTOMLEFT NOBORDER;

scatter x=serostatus y=med_iqr / jitter filledoutlinedmarkers
markerattrs=(symbol=circlenotfilled size=11)
markerfillattrs=(color=cxfdae6b)
markeroutlineattrs=(color=black)
transparency=0.5;
vbox med_iqr / category=visit group=cytokine fill BOXWIDTH=0.8
lineattrs=(color=black thickness=2)
whiskerattrs=(color=black thickness=2)
medianattrs=(color=black thickness=2);
rowaxis values=(0.1,0.3,1.0)grid display=(nolabel);
colaxis values=("Baseline" "Day 64" "Day 120" "Day 360")display=(nolabel);
run;

ballardw
Super User

You need to look at the "plot type compatibility" in the documentation of Sgplot and Sgpanel.

VBOX is a distribution type plot. As such, neither procedure will combine it with a Basic type plot such as Scatter.

 

Not to mention that your VBOX uses the variable VISIT as the "horizontal" axis. So a scatter plot with X=serostatus would have nothing in common with the VBOX axis to plot with.

Rick_SAS
SAS Super FREQ

Ballardw's first point is incorrect: You can overlay a scatter plot and some box plots. However, perhaps he meant that use of VBOX with both CATEGORY= and GROUP= options means that the scatterplot will not perfectly align with the box plots.

 

It is his second point that is more relevant. In PROC SGPLOT, you can overlay a second Y variable if you use the Y2AXIS option in the SCATTER statement. But PROC SGPANEL does not support this option.

 

Personally, I think this graph is already overcrowded. You aren't going to end up with a readable graph if you overlay a scatter plot.  I would put the scatter plot in a second panel. Or, forget about using SGPANEL, switch to SGPLOT and convert the PANELBY statement to a BY statement to get 8 separate graphs.
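As a rough sketch of the SGPLOT alternative, the following reuses the placeholder names from my earlier example (Have, ROW, COL, CAT, GROUP, Y); the sorted data set name HaveSorted is made up for illustration. The data must be sorted by the BY variables, and each ROW*COL combination then gets its own graph.

```sas
/* Toy data with the same layout as the earlier example (placeholder names) */
data Have;
   call streaminit(1);
   do ROW = 'Row1', 'Row2';
      do COL = 'Col1', 'Col2', 'Col3', 'Col4';
         do CAT = 'Cat1', 'Cat2', 'Cat3', 'Cat4';
            do GROUP = 'Group1', 'Group2', 'Group3', 'Group4';
               do Rep = 1 to 10;
                  Y = rand("Normal");
                  output;
               end;
            end;
         end;
      end;
   end;
run;

/* BY-group processing requires the data to be sorted by the BY variables */
proc sort data=Have out=HaveSorted;
   by ROW COL;
run;

/* One graph per ROW*COL combination: 2 x 4 = 8 separate graphs */
proc sgplot data=HaveSorted;
   by ROW COL;
   vbox Y / category=CAT group=GROUP;
run;
```

Each graph gets the full plotting area, at the cost of losing the side-by-side comparison that the lattice gives you.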

PrinceAde
Obsidian | Level 7
 
I was able to overlay the two plots, but because I specified two different group variables (group=serostatus in the SCATTER statement and group=cytokine in the VBOX statement), the legend for cytokine was overridden. I want both legends, for cytokine and serostatus, to appear at the bottom of the graph.
 
The serostatus variable contains seropositive and seronegative; I want them to have two distinct symbols, square and circle respectively. I specified the markerattrs= option twice, but only the second (square) is displayed. How can I achieve this?
 
Thanks.
 
proc sgpanel data=simu2;
  /* Use the SORT= option to order the panels */
  panelby strain agegrp  /  rows=2 columns=4 layout=lattice  COLHEADERPOS=bottom sort=ascending uniscale=column novarname START=BOTTOMLEFT NOBORDER;
  styleattrs datacolors=(rose green aqua purple);
  scatter x=visit y=med_iqr / name="scatter"   group=serostatus name="a" jitter markerattrs=(size=6px symbol=circle)
   markerattrs=(color=CX4F4F4F symbol=square size=5px)  name="a" jitter;
  
 
     /**markerattrs=(symbol=circle size=6px) jitter
      markerfillattrs=(color=black)
      markeroutlineattrs=(color=black)
      transparency=0.5;*/
  vbox med_iqr / category=visit name= "b" group=cytokine fill  BOXWIDTH=0.8 
 
 
      lineattrs=(color=black thickness=1)
      whiskerattrs=(color=black thickness=1)
      medianattrs=(color=black thickness=1);
 
  rowaxis values=(0.00001 0.5 1 1.5) valuesdisplay=("0.1" "0.3" "1.0") label="CD8+ T cells(%)*Median(IQR)";
  colaxis values=("Baseline" "Day 64" "Day 120" "Day 360") grid display=(nolabel);
 
   
run;
Quentin
Super User

Edit: Updated the code to have SCATTER after VBOX so that scatterplot will show on top of box plots per @Rick_SAS's comment .  I didn't update the photo.  And whenever I edit code posted here, I end up with problems with line breaks, so if you see VBOX and SCATTER on one line below, that's not what I intended, but the code will work, because semicolons...

 

One way to specify the symbols for serostatus is to use an attribute map.  You can use the keylegend statement to define what should appear in the legend.  

 

It would be helpful if you would post code to create a small amount of example data with your variable names, along with your code to make a graph.

 

I took Rick's sample data and added a serostatus variable,  then created a plot with the overlaid symbols.

 

Code like:

 

data Have;
    call streaminit(1);
    do ROW = 'Row1', 'Row2';
       do COL = 'Col1', 'Col2', 'Col3', 'Col4';
          do CAT = 'Cat1', 'Cat2', 'Cat3', 'Cat4';
             do GROUP = 'Group1', 'Group2', 'Group3', 'Group4';
                do Serostatus="Neg","Pos";
                    do Rep = 1 to 10;
                        Y = rand("Normal");
                        output;
                    end;
                end;
             end;
          end;
       end;
    end;
run;
    
data myattr ;
    ID="SeroStatus"  ;
  
    Value='Neg' ;
    MarkerSymbol='Circle' ;
    output ;
  
    Value='Pos' ;
    MarkerSymbol='Square' ;
    output ;
run ;
  
proc sgpanel data=Have dattrmap=myattr noautolegend;
    panelby COL ROW / onepanel layout=lattice columns=4 novarname;
    vbox Y / category=CAT group=GROUP name="vbox";
    scatter x=cat y=y /group=serostatus ATTRID=SeroStatus jitter name="scatter";
    keylegend "scatter"/ title="Serostatus";
    keylegend "vbox"/ title="Strain";
run;

I agree with Rick that the overlaid scatterplot makes it a bit messy.
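The same attribute map can probably also answer the earlier question about box colors. Below is a sketch, not tested against your data: it adds a second ID (I called it BoxGrp, which is just a made-up name) that assigns a fill color to each GROUP= level of the VBOX statement, and it places both legends at the bottom. The color codes are arbitrary, and I'm assuming that blank cells in the attribute map fall back to the default style.

```sas
/* Toy data: Rick's layout plus a Serostatus variable (placeholder names) */
data Have;
   call streaminit(1);
   do ROW = 'Row1', 'Row2';
      do COL = 'Col1', 'Col2', 'Col3', 'Col4';
         do CAT = 'Cat1', 'Cat2', 'Cat3', 'Cat4';
            do GROUP = 'Group1', 'Group2', 'Group3', 'Group4';
               do Serostatus = "Neg", "Pos";
                  do Rep = 1 to 10;
                     Y = rand("Normal");
                     output;
                  end;
               end;
            end;
         end;
      end;
   end;
run;

/* One attribute map data set holding two maps, distinguished by ID */
data myattr;
   length ID $10 Value $6 FillColor $8 MarkerSymbol $8;

   /* Map 1: fill color for each box group (colors are arbitrary choices) */
   ID = "BoxGrp"; MarkerSymbol = " ";
   Value = "Group1"; FillColor = "CXFDAE6B"; output;
   Value = "Group2"; FillColor = "CX74C476"; output;
   Value = "Group3"; FillColor = "CX6BAED6"; output;
   Value = "Group4"; FillColor = "CX9E9AC8"; output;

   /* Map 2: marker symbol for each serostatus level */
   ID = "SeroStatus"; FillColor = " ";
   Value = "Neg"; MarkerSymbol = "Circle"; output;
   Value = "Pos"; MarkerSymbol = "Square"; output;
run;

proc sgpanel data=Have dattrmap=myattr noautolegend;
   panelby COL ROW / onepanel layout=lattice columns=4 novarname;
   vbox Y / category=CAT group=GROUP attrid=BoxGrp name="vbox";
   /* SCATTER comes after VBOX so the markers are drawn on top of the boxes */
   scatter x=CAT y=Y / group=Serostatus attrid=SeroStatus jitter name="scatter";
   keylegend "vbox" / title="Group" position=bottom;
   keylegend "scatter" / title="Serostatus" position=bottom;
run;
```

The Value strings in the map have to match the data values of the group variables, so check spelling and case against your own cytokine and serostatus values.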

 

panelplot.PNG

 

 

Rick_SAS
SAS Super FREQ

@Quentin You need to put the SCATTER statement after the VBOX statement if you want the markers to be visible (on top of the box) .

PrinceAde
Obsidian | Level 7

@Rick_SAS  It simply replaces the scatter plot. I actually want both to appear.

Thanks.

FreelanceReinh
Jade | Level 19

@Quentin wrote:

... whenever I edit code posted here, I end up with problems with line breaks, ...


Hi @Quentin, the solution Re: Bug when editing existing post helps to avoid these problems.

Quentin
Super User


@FreelanceReinh wrote:

@Quentin wrote:

... whenever I edit code posted here, I end up with problems with line breaks, ...


Hi @Quentin, the solution Re: Bug when editing existing post helps to avoid these problems

Ooh thanks @FreelanceReinh .  I missed that solution.  And I've been living with this annoyance for years. : )  Often I will just replace the whole block of code.  Good to know of this workaround!

Quentin
Super User

re: 


@PrinceAde wrote:

Also  the rowaxis is 0.1, 0.3 and 1.0, despite the apparent gap between 0.1, 0.3 and 1.0, in the plot the space in the rows remains the same in the earlier picture which does not seem logical 

It looks like your original plot has a logarithmic y-axis.  
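If that is right, you could try a log axis instead of relabeling a linear one. Here is a minimal sketch with made-up data (LogDemo, Agegrp, Visit and Pct are hypothetical names, not your variables). A log axis needs strictly positive values. I have left VALUES= out because I'm not sure how custom tick values are handled on log axes in every release.

```sas
/* Made-up, strictly positive data between roughly 0.1 and 1 */
data LogDemo;
   length Agegrp $4 Visit $8;
   call streaminit(1);
   do Agegrp = 'Grp1', 'Grp2';
      do Visit = 'Baseline', 'Day 64', 'Day 120', 'Day 360';
         do Rep = 1 to 20;
            Pct = 10**(-1 + rand("Uniform"));
            output;
         end;
      end;
   end;
run;

proc sgpanel data=LogDemo;
   panelby Agegrp / layout=rowlattice novarname;
   vbox Pct / category=Visit;
   /* Base-10 log scale on the row (Y) axis; LOGEXPAND adds minor ticks */
   rowaxis type=log logbase=10 logstyle=logexpand grid;
run;
```

On a log scale, equal distances correspond to equal ratios, which would explain why 0.1, 0.3 and 1.0 look almost evenly spaced in the original picture (0.3 is close to the square root of 0.1).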

