BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
FredGIII
Quartz | Level 8

Hi everyone,

Does anyone know of a straightforward way to overlay histograms where the two histograms are not grouped but one represents the histogram of the entire dataset and the second histogram represents a subset of the dataset? 

My data contains columns that include Location, ParameterType and ParameterValue.  So I am trying to compare the distribution of the values overall vs a particular location.  I can do this as separate graphs with the following code:

proc univariate data=myData (where=(location = &currentloc and ParameterType = &ParamSelect)) noprint;

     histogram ParameterValue / barlabel = count endpoints=&startpt to &endpt;

run;

proc univariate data=myData (where=(ParameterType = &ParamSelect)) noprint;

     histogram ParameterValue / barlabel = count endpoints=&startpt to &endpt;

run;

This gives me the correct plots but plotted separately.  I can use greplay to get the two plots grouped together, but I would like to overlay the two on the same plot. I have been searching and there are plenty of examples using overlay when you use different variables but not from different groupings of the same variable. Any suggestions?

Thanks!!

Fred

1 ACCEPTED SOLUTION

Accepted Solutions
Jay54
Meteorite | Level 14

Yes, you can - using GTL.  In GTL you can use the EVAL statement to extract a subset on the fly from the same column.  You could use something like the following.  But, technically this is the same as creating your own 2nd column as that is what GTL is doing for you behind the scene.

proc template;

  define statgraph histogram;

    begingraph;

      layout overlay / cycleattrs=true;

       histogram systolic / name='a' datatransparency=0.5 legendlabel='All' binstart=80 binwidth=10 binaxis=false;

       histogram eval(ifn(weight_status eq 'Overweight', systolic, .)) / name='b' binaxis=false

                  datatransparency=0.5 legendlabel='Overweight' binstart=80 binwidth=10;

        discretelegend 'a' 'b' / location=inside autoalign=(topright) across=1;

       endlayout;

    endgraph;

  end;

run;

ods graphics / reset width=5in height=3in imagename='histogramOverlay';

proc sgrender data=sashelp.heart template=histogram;

run;

histogramOverlay.png

View solution in original post

8 REPLIES 8
Jay54
Meteorite | Level 14

If you have two separate columns in a data set, you can do the following using SGPLOT procedure.

OverlayHistogramHeart.png

http://blogs.sas.com/content/graphicallyspeaking/2013/11/21/comparative-histograms/

proc sgplot data=sashelp.heart;

  title 'Distribution of Blood Pressure';

  histogram diastolic / binstart=50 binwidth=5 transparency=0.5;

  histogram systolic / binstart=50 binwidth=5  transparency=0.5;

  xaxis display=(nolabel);

  yaxis grid;

  keylegend / location=inside position=topright across=1;

  run;

FredGIII
Quartz | Level 8

Hi Sanjay,

I am aware of this approach, but if you look carefully at my code above, the data that I am creating histograms from are from one column. The difference is that one histogram represents all the data in the column, and the second histogram represents a subset based on location. I could create another datastep that will create two columns and that may be the easiest.  I was just hoping there would be a better approach that would allow me to overlay based on all vs subset of the same column.

Thanks,

Fred

Jay54
Meteorite | Level 14

Yes, you can - using GTL.  In GTL you can use the EVAL statement to extract a subset on the fly from the same column.  You could use something like the following.  But, technically this is the same as creating your own 2nd column as that is what GTL is doing for you behind the scene.

proc template;

  define statgraph histogram;

    begingraph;

      layout overlay / cycleattrs=true;

       histogram systolic / name='a' datatransparency=0.5 legendlabel='All' binstart=80 binwidth=10 binaxis=false;

       histogram eval(ifn(weight_status eq 'Overweight', systolic, .)) / name='b' binaxis=false

                  datatransparency=0.5 legendlabel='Overweight' binstart=80 binwidth=10;

        discretelegend 'a' 'b' / location=inside autoalign=(topright) across=1;

       endlayout;

    endgraph;

  end;

run;

ods graphics / reset width=5in height=3in imagename='histogramOverlay';

proc sgrender data=sashelp.heart template=histogram;

run;

histogramOverlay.png

FredGIII
Quartz | Level 8

Thanks Sanjay,

That's exactly what I was looking for. While it is essentially the same thing as the first approach, the GTL approach lets me set up a template for all the plots and then loop through locations. I think it is a more elegant approach.

I appreciate the help!

ch
Calcite | Level 5 ch
Calcite | Level 5

Hi Fred and Sanjay,

This helps tremendously with a project that I am working on.  I have an 'all' data sets which includes a column/variable with the ages of a patient population. Another column has the provider id (this can be repeated because the table is a patient level table).  In this example, you've allowed me to compare the age distribution of the overall population with the age distribution of a single provider.

However, I have thousands of distinct providers!  I'm wondering how you 'looped' through all your locations.  I tried putting the distinct list of providers into a macro variable but I would still have to call each one.  How did Fred, how did you 'loop' through your locations?  I would like to know how to modify the following statment to loop through all the distinct values in a particular column:

histogram eval(ifn(weight_status eq 'Overweight', systolic,.))

I recognize this will make thousands of histograms, but that is what is desired.

Thanks so much for any help you can provide!

djrisks
Barite | Level 11

Hello,

I've modified Sanjay's original code to show how you can cycle through all your locations using Macro Variables.

proc template;
  define statgraph histogram;
    begingraph;
mvar parameter; /* Creating Macro Variable, so that distinct paramters can be called */
entrytitle parameter;
      layout overlay / cycleattrs=true;

       histogram systolic / name='a' datatransparency=0.5 legendlabel='All' binstart=80 binwidth=10 binaxis=false;
       histogram eval(ifn(weight_status eq parameter, systolic, .)) / name='b' binaxis=false
                  datatransparency=0.5 legendlabel= parameter binstart=80 binwidth=10;

        discretelegend 'a' 'b' / location=inside autoalign=(topright) across=1;
       endlayout;
    endgraph;
  end;
run;


proc sort data = sashelp.heart out = heart;
by weight_status;
run;

/* Obtaining distinct parameters */

proc sql;
  select distinct cats("%","param(",weight_status, "," ,weight_status, ");"), weight_status into: parameter separated by " " ,: nicename
  from heart
  where weight_status ne "";
quit;


%macro param(parameter, nicename);

ods graphics / reset width=5in height=3in imagename="histogramOverlay_&nicename";
  proc sgrender data=heart template=histogram;
run;

%mend;

/* Calling the distinct parameters */

&parameter;

ch
Calcite | Level 5 ch
Calcite | Level 5

Thanks for the reply!

I'm having a hard time getting the code to run.  I ended up switching the proc sql to look like:

proc sql;

     select distinct weight_status into : parameter separated by " "

     from heart

     where weight_status ne "";

quit;

Could you further explain the line you suggested?

select distinct cats("%","param(",weight_status, "," weight_status, ");"), weight_status into: parameter separated by " ",:nicename

I received a character limit exceeded error on this.  Not surprising given the number of distinct providers (each with 9-10 digit ids) was 3,570 but the number of records when they were at the patient level (not distinct providers) was 1.129 million.  It was not immediately apparent what all parts of this were so I tried mine above as a fix to the error and it seemed to make a list just fine.

But then on the next step I have (modified after my modification above):

%macro param(parameter);

ods graphics / reset width=5in height=3in imagename"histogramOverlay";     /* any other name caused an error about imagename so I stopped trying to add more*/

proc sgrender data=heart template=histogram;

run;

%mend;

%param(&parameter);

This gives an exception and recommends I contact technical support.

Task name SGRENDER (2).

ERROR: Read Access Violation SGRENDER (2)

and it gives the location of the exception and task traceback.

Ideas?  Would better understanding the line in proc SQL solve this????

Thanks for the help!

djrisks
Barite | Level 11

That's fine.

Okay, I will explain the Proc SQL line by first explaining why I am using it.

Another way we could have selected and called the distinct weight_status is by using the code below. However if we used the code below, we would need to write the line %param(parameter, nicename) for every parameter. You mentioned you had over 1000 providers so that would mean that you would need to insert over 1000 lines into the code if you wanted to run it the way below. So instead I am using Proc SQL to generate the code that you would have needed to insert manually in order to create the plots for each provider.       

%macro param(parameter, nicename);

ods graphics / reset width=5in height=3in imagename="histogramOverlay_&nicename";
  proc sgrender data=heart template=histogram;
run;

%mend;

%param(Normal,Normal);                                                                                
%param(Overweight,Overweight);                                                                            
%param(Underweight,Underweight);

I believe that the statement below doesn't work for you because either your provider names are too long or you have special characters in the names such as %'s:

The first macro variable in the code below called paramter, write's the code that the macro uses to call all the providers. The second macro variable in the code called nicename, is the name that is used to output the imagename.

distinct cats("%","param(",weight_status, "," weight_status, ");"), weight_status into: parameter separated by " ",:nicename

You could use the substr function on your provider names (in this example weight_staus) to limit the amount of characters before hand, and the transwrd function to remove the special characters before hand.

Also you can change the image name from imagename="histogramOverlay_&nicename" to imagename="HO_&nicename" to reduce the number of characters too.

I hope this helps?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 6651 views
  • 6 likes
  • 4 in conversation