Data visualization with SAS programming

Overlay histograms - all data vs subset

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

Overlay histograms - all data vs subset

Hi everyone,

Does anyone know of a straightforward way to overlay histograms where the two histograms are not grouped but one represents the histogram of the entire dataset and the second histogram represents a subset of the dataset? 

My data contains columns that include Location, ParameterType and ParameterValue.  So I am trying to compare the distribution of the values overall vs a particular location.  I can do this as separate graphs with the following code:

proc univariate data=myData (where=(location = &currentloc and ParameterType = &ParamSelect)) noprint;

     histogram ParameterValue / barlabel = count endpoints=&startpt to &endpt;

run;

proc univariate data=myData (where=(ParameterType = &ParamSelect)) noprint;

     histogram ParameterValue / barlabel = count endpoints=&startpt to &endpt;

run;

This gives me the correct plots but plotted separately.  I can use greplay to get the two plots grouped together, but I would like to overlay the two on the same plot. I have been searching and there are plenty of examples using overlay when you use different variables but not from different groupings of the same variable. Any suggestions?

Thanks!!

Fred


Accepted Solutions
Solution
‎10-21-2014 10:49 AM
SAS Super FREQ
Posts: 1,044

Re: Overlay histograms - all data vs subset

Yes, you can - using GTL.  In GTL you can use the EVAL statement to extract a subset on the fly from the same column.  You could use something like the following.  But, technically this is the same as creating your own 2nd column as that is what GTL is doing for you behind the scene.

proc template;

  define statgraph histogram;

    begingraph;

      layout overlay / cycleattrs=true;

       histogram systolic / name='a' datatransparency=0.5 legendlabel='All' binstart=80 binwidth=10 binaxis=false;

       histogram eval(ifn(weight_status eq 'Overweight', systolic, .)) / name='b' binaxis=false

                  datatransparency=0.5 legendlabel='Overweight' binstart=80 binwidth=10;

        discretelegend 'a' 'b' / location=inside autoalign=(topright) across=1;

       endlayout;

    endgraph;

  end;

run;

ods graphics / reset width=5in height=3in imagename='histogramOverlay';

proc sgrender data=sashelp.heart template=histogram;

run;

histogramOverlay.png

View solution in original post


All Replies
SAS Super FREQ
Posts: 1,044

Re: Overlay histograms - all data vs subset

If you have two separate columns in a data set, you can do the following using SGPLOT procedure.

OverlayHistogramHeart.png

http://blogs.sas.com/content/graphicallyspeaking/2013/11/21/comparative-histograms/

proc sgplot data=sashelp.heart;

  title 'Distribution of Blood Pressure';

  histogram diastolic / binstart=50 binwidth=5 transparency=0.5;

  histogram systolic / binstart=50 binwidth=5  transparency=0.5;

  xaxis display=(nolabel);

  yaxis grid;

  keylegend / location=inside position=topright across=1;

  run;

Contributor
Posts: 50

Re: Overlay histograms - all data vs subset

Hi Sanjay,

I am aware of this approach, but if you look carefully at my code above, the data that I am creating histograms from are from one column. The difference is that one histogram represents all the data in the column, and the second histogram represents a subset based on location. I could create another datastep that will create two columns and that may be the easiest.  I was just hoping there would be a better approach that would allow me to overlay based on all vs subset of the same column.

Thanks,

Fred

Solution
‎10-21-2014 10:49 AM
SAS Super FREQ
Posts: 1,044

Re: Overlay histograms - all data vs subset

Yes, you can - using GTL.  In GTL you can use the EVAL statement to extract a subset on the fly from the same column.  You could use something like the following.  But, technically this is the same as creating your own 2nd column as that is what GTL is doing for you behind the scene.

proc template;

  define statgraph histogram;

    begingraph;

      layout overlay / cycleattrs=true;

       histogram systolic / name='a' datatransparency=0.5 legendlabel='All' binstart=80 binwidth=10 binaxis=false;

       histogram eval(ifn(weight_status eq 'Overweight', systolic, .)) / name='b' binaxis=false

                  datatransparency=0.5 legendlabel='Overweight' binstart=80 binwidth=10;

        discretelegend 'a' 'b' / location=inside autoalign=(topright) across=1;

       endlayout;

    endgraph;

  end;

run;

ods graphics / reset width=5in height=3in imagename='histogramOverlay';

proc sgrender data=sashelp.heart template=histogram;

run;

histogramOverlay.png

Contributor
Posts: 50

Re: Overlay histograms - all data vs subset

Thanks Sanjay,

That's exactly what I was looking for. While it is essentially the same thing as the first approach, the GTL approach lets me set up a template for all the plots and then loop through locations. I think it is a more elegant approach.

I appreciate the help!

New Contributor ch
New Contributor
Posts: 2

Re: Overlay histograms - all data vs subset

Hi Fred and Sanjay,

This helps tremendously with a project that I am working on.  I have an 'all' data sets which includes a column/variable with the ages of a patient population. Another column has the provider id (this can be repeated because the table is a patient level table).  In this example, you've allowed me to compare the age distribution of the overall population with the age distribution of a single provider.

However, I have thousands of distinct providers!  I'm wondering how you 'looped' through all your locations.  I tried putting the distinct list of providers into a macro variable but I would still have to call each one.  How did Fred, how did you 'loop' through your locations?  I would like to know how to modify the following statment to loop through all the distinct values in a particular column:

histogram eval(ifn(weight_status eq 'Overweight', systolic,.))

I recognize this will make thousands of histograms, but that is what is desired.

Thanks so much for any help you can provide!

Super Contributor
Posts: 252

Re: Overlay histograms - all data vs subset

Hello,

I've modified Sanjay's original code to show how you can cycle through all your locations using Macro Variables.

proc template;
  define statgraph histogram;
    begingraph;
mvar parameter; /* Creating Macro Variable, so that distinct paramters can be called */
entrytitle parameter;
      layout overlay / cycleattrs=true;

       histogram systolic / name='a' datatransparency=0.5 legendlabel='All' binstart=80 binwidth=10 binaxis=false;
       histogram eval(ifn(weight_status eq parameter, systolic, .)) / name='b' binaxis=false
                  datatransparency=0.5 legendlabel= parameter binstart=80 binwidth=10;

        discretelegend 'a' 'b' / location=inside autoalign=(topright) across=1;
       endlayout;
    endgraph;
  end;
run;


proc sort data = sashelp.heart out = heart;
by weight_status;
run;

/* Obtaining distinct parameters */

proc sql;
  select distinct cats("%","param(",weight_status, "," ,weight_status, ");"), weight_status into: parameter separated by " " ,: nicename
  from heart
  where weight_status ne "";
quit;


%macro param(parameter, nicename);

ods graphics / reset width=5in height=3in imagename="histogramOverlay_&nicename";
  proc sgrender data=heart template=histogram;
run;

%mend;

/* Calling the distinct parameters */

&parameter;

New Contributor ch
New Contributor
Posts: 2

Re: Overlay histograms - all data vs subset

Thanks for the reply!

I'm having a hard time getting the code to run.  I ended up switching the proc sql to look like:

proc sql;

     select distinct weight_status into : parameter separated by " "

     from heart

     where weight_status ne "";

quit;

Could you further explain the line you suggested?

select distinct cats("%","param(",weight_status, "," weight_status, ");"), weight_status into: parameter separated by " ",:nicename

I received a character limit exceeded error on this.  Not surprising given the number of distinct providers (each with 9-10 digit ids) was 3,570 but the number of records when they were at the patient level (not distinct providers) was 1.129 million.  It was not immediately apparent what all parts of this were so I tried mine above as a fix to the error and it seemed to make a list just fine.

But then on the next step I have (modified after my modification above):

%macro param(parameter);

ods graphics / reset width=5in height=3in imagename"histogramOverlay";     /* any other name caused an error about imagename so I stopped trying to add more*/

proc sgrender data=heart template=histogram;

run;

%mend;

%param(&parameter);

This gives an exception and recommends I contact technical support.

Task name SGRENDER (2).

ERROR: Read Access Violation SGRENDER (2)

and it gives the location of the exception and task traceback.

Ideas?  Would better understanding the line in proc SQL solve this????

Thanks for the help!

Super Contributor
Posts: 252

Re: Overlay histograms - all data vs subset

That's fine.

Okay, I will explain the Proc SQL line by first explaining why I am using it.

Another way we could have selected and called the distinct weight_status is by using the code below. However if we used the code below, we would need to write the line %param(parameter, nicename) for every parameter. You mentioned you had over 1000 providers so that would mean that you would need to insert over 1000 lines into the code if you wanted to run it the way below. So instead I am using Proc SQL to generate the code that you would have needed to insert manually in order to create the plots for each provider.       

%macro param(parameter, nicename);

ods graphics / reset width=5in height=3in imagename="histogramOverlay_&nicename";
  proc sgrender data=heart template=histogram;
run;

%mend;

%param(Normal,Normal);                                                                                
%param(Overweight,Overweight);                                                                            
%param(Underweight,Underweight);

I believe that the statement below doesn't work for you because either your provider names are too long or you have special characters in the names such as %'s:

The first macro variable in the code below called paramter, write's the code that the macro uses to call all the providers. The second macro variable in the code called nicename, is the name that is used to output the imagename.

distinct cats("%","param(",weight_status, "," weight_status, ");"), weight_status into: parameter separated by " ",:nicename

You could use the substr function on your provider names (in this example weight_staus) to limit the amount of characters before hand, and the transwrd function to remove the special characters before hand.

Also you can change the image name from imagename="histogramOverlay_&nicename" to imagename="HO_&nicename" to reduce the number of characters too.

I hope this helps?

Post a Question
Discussion Stats
  • 8 replies
  • 1740 views
  • 5 likes
  • 4 in conversation