I am trying to draw a number of histograms, using SGPANEL and PANELBY (SAS 9.4M4 within EG). I would also like to add, to each histogram, a vertical REFLINE showing the location of the 80th percentile for each of the BY groups. I have these values stored in my dataset in a variable called P80.
PROC SGPANEL DATA=example;
   PANELBY a_type / COLUMNS=2 ROWS=3 SPACING=10;
   HISTOGRAM a_quantity;
   REFLINE P80 / AXIS=X LEGENDLABEL="P80" NAME="pline";
   KEYLEGEND "pline" / POSITION=TOP;
RUN;
My problem is that my code to draw the graphs and save them to a PDF, which runs in a couple minutes without the REFLINE or with the REFLINE set to a constant value, takes upwards of half an hour with the REFLINE referring to a variable. I think it is related to the size of my dataset: around 25 BY groups, with around 100k observations in each group. My guess is that by calling REFLINE with reference to a variable, SAS is checking the value for each observation, even though I only want it to draw one line. It doesn't seem to make any difference if I only store the P80 value once per BY group.
Is there a way I can tell SAS I only need one line per panel? Or some other way to speed this up? I would really like to be able to keep multiple panels per page if possible.
I don't know how to get the program to go faster, but here is one suggestion. You are probably plotting 100k reference lines for every cell in the panel. You probably merged the percentiles and the data like this:
data Have;
   call streaminit(1);
   do type = 1 to 6;
      do i = 1 to 100000;
         x = rand("Lognormal");
         output;
      end;
   end;
run;

proc means data=Have noprint;
   by type;
   var x;
   output out=Pctl p80=p80;
run;

/* 100k reflines drawn for each cell */
data example;
   merge Have Pctl;
   by Type;
run;

PROC SGPANEL DATA=example;
   PANELBY type / COLUMNS=2 ROWS=3 SPACING=10;
   HISTOGRAM x;
   REFLINE P80 / AXIS=X LEGENDLABEL="P80" NAME="pline";
   KEYLEGEND "pline" / POSITION=TOP;
RUN;
Instead, set the P80 variable to missing except for one observation. That will cause only one reference line to be drawn, like this:
/* 1 refline drawn for each cell */
data example;
   merge Have Pctl;
   by Type;
   if NOT first.Type then P80=.;
run;

PROC SGPANEL DATA=example;
   PANELBY type / COLUMNS=2 ROWS=3 SPACING=10;
   HISTOGRAM x;
   REFLINE P80 / AXIS=X LEGENDLABEL="P80" NAME="pline";
   KEYLEGEND "pline" / POSITION=TOP;
RUN;
The program still has to look at every observation to see whether it holds a valid refline value, but only one value per cell is actually drawn.
Yeah, I had tried that but it didn't seem to make much of a difference. What did eventually seem to work was removing the PANELBY statement and using an explicit loop through each of my groups. Then I could grab the appropriate value for each group and set the REFLINE to that value rather than a variable. This brought my time back down to a couple minutes per run.
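For anyone who wants to try the loop approach described above, here is a minimal sketch of one way it might look. It is not the code from the original program. The macro name hist_by_group, its parameters, and the work dataset _pctl are all hypothetical, and the WHERE clause assumes the grouping variable is numeric (a character variable would need its value quoted). The idea is to compute one P80 per group, store each value in a macro variable, and pass it to REFLINE as a constant.

```sas
/* Hypothetical macro: one SGPLOT per group, REFLINE given a constant value. */
/* Assumes &byvar is numeric; quote &&grp&i in the WHERE clause if it is not. */
%macro hist_by_group(data=example, byvar=a_type, var=a_quantity);
   %local i n;

   /* One row per group holding its 80th percentile */
   proc means data=&data noprint nway;
      class &byvar;
      var &var;
      output out=_pctl p80=p80;
   run;

   /* Copy each group value and its percentile into local macro variables */
   data _null_;
      set _pctl end=last;
      call symputx(cats('grp', _n_), &byvar, 'L');
      call symputx(cats('p80_', _n_), p80, 'L');
      if last then call symputx('n', _n_, 'L');
   run;

   /* Draw one histogram per group with a single constant reference line */
   %do i = 1 %to &n;
      proc sgplot data=&data(where=(&byvar = &&grp&i));
         histogram &var;
         refline &&p80_&i / axis=x legendlabel="P80" name="pline";
         keylegend "pline" / position=top;
      run;
   %end;
%mend hist_by_group;

%hist_by_group(data=example, byvar=a_type, var=a_quantity);
```

Note that this sketch produces a separate graph for each group, so it does not by itself keep several panels on one page.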