Data visualization with SAS programming

Graphing Strategy using PROC SGPLOT XXXX

New Contributor
Posts: 2

Hi all,

I am struggling with the most efficient way to graph my raw data along with a model curve. Here is the scenario:

I have raw data that look somewhat like this: multiple entities (ID), each with Y values recorded over a 5-hour interval in each of ~30 24-hour periods, with the interval's timing unique to each ID. Note: Some of the 5-hr intervals cross the day boundary. For example, some run from 10 PM to 2 AM.

ID TIMESTAMP                           Date          Time       Y

1 03APR2015:00:00:00.000000 04/03/2015 00:00:00 3
1 03APR2015:01:00:00.000000 04/03/2015 01:00:00 2
1 03APR2015:02:00:00.000000 04/03/2015 02:00:00 1
1 03APR2015:03:00:00.000000 04/03/2015 03:00:00 1
1 03APR2015:04:00:00.000000 04/03/2015 04:00:00 0
1 04APR2015:00:00:00.000000 04/04/2015 00:00:00 0
1 04APR2015:01:00:00.000000 04/04/2015 01:00:00 0
1 04APR2015:02:00:00.000000 04/04/2015 02:00:00 3
1 04APR2015:03:00:00.000000 04/04/2015 03:00:00 0
1 04APR2015:04:00:00.000000 04/04/2015 04:00:00 2
. . .
. . .
. . .
2 03APR2015:05:00:00.000000 04/03/2015 05:00:00 0
2 03APR2015:06:00:00.000000 04/03/2015 06:00:00 2
2 03APR2015:07:00:00.000000 04/03/2015 07:00:00 1
2 03APR2015:08:00:00.000000 04/03/2015 08:00:00 1
2 03APR2015:09:00:00.000000 04/03/2015 09:00:00 1
. . .
. . .
. . .

 
Where TIMESTAMP is a SAS datetime variable, DATE is a SAS date variable, and TIME is a proper SAS time variable.

The goal is to create a number of plots for each ID that provide an appropriate visualization of these raw data along with a fitted model curve.

My initial plot looked like this (FYI: These plots are not from the sample data given above, but are also representative of the data in question):
 
Example1

And was created using this call to PROC SGPLOT:
 
PROC SGPLOT DATA = PLOT3 NOAUTOLEGEND ;
TITLE "ID = 1" ;
SCATTER X = TIMESTAMP Y = Y / MARKERATTRS = (SIZE = 10) ;
SERIES X = TIMESTAMP Y = Y / BREAK ;
RUN ;
 
===
 
HOWEVER, as you can see, this method includes a lot of unused white space as it retains space on the x-axis for all 24 hrs of the day while I am only interested in showing a particular 5 hr. interval for each 24 hr. period.
 
I am running 9.3, and as far as I can tell creating axis breaks is a rather involved process dealing with the GTL. Furthermore, the location of the breaks would be different for each ID, so I was not inclined to go that route (however, would be open to this if that is recommended).
 
Instead what I did was to create a new character variable that had a tag which identified the interval number (roughly 1-30) and the lag number (4 - 0 in descending order). I then created this set of plots which worked really well for the raw data:
 
 
Here is my code for this one:
 
PROC SGPLOT DATA = PLOT3 NOAUTOLEGEND ;
SCATTER X = LAG_VAR Y = Y / MARKERATTRS = (SIZE = 10) ;
SERIES X = LAG_VAR Y = Y / BREAK ;
YAXIS VALUES=(&YMIN TO &YMAX BY &YBY) VALUESHINT MIN=&YMIN2 MAX=&YMAX2 ;
LABEL LAG_VAR ="Interval/Lag" ;
REFLINE &VLINES / DISCRETEOFFSET=.5 AXIS = X LINEATTRS=(COLOR=RED PATTERN=MEDIUMDASH) ;
RUN ;
 
===
 
As you can see, I wrote a macro around this call to SGPLOT to customize features of each plot. Each plot "holds" four 5-hr intervals, so there are multiple plots per ID, but that is not a problem.
 
Here is the problem: I now have a model fit for each interval for each ID, and I need to add the fitted curves to the example2 plots. However, the x-axis in the example2 plots is currently character. The model is curvilinear, so I usually create a series of x values and predicted y values using a DO loop and the model equation (so that the curve is smooth and does not jump around because only a limited number of discrete x values were fed to the model equation). However, I'm not sure how to line up a series of x values with the character x-axis that I already have. I was thinking about using the X2AXIS statement, but I think lining this up with what I already have is going to be very tricky.
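
For readers unfamiliar with the DO-loop approach described above, here is a minimal sketch of it. The quadratic form, the coefficients B0, B1, B2, and the data set name CURVE are all hypothetical placeholders, since the actual model is not given in this post. X here stands for the lag within one interval (0 to 4).

```sas
/* Sketch only: hypothetical quadratic model, placeholder coefficients. */
data curve;
  b0 = 3; b1 = -1.2; b2 = 0.15;      /* not from the post */
  do i = 0 to 80;                    /* integer counter avoids round-off drift */
    x = i / 20;                      /* grid from 0 to 4 in steps of 0.05 */
    pred = b0 + b1*x + b2*x**2;      /* evaluate the model on the fine grid */
    output;
  end;
  keep x pred;
run;
```

The fine grid is what makes the plotted curve smooth, and it is also why a numeric x-axis is needed to overlay it.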
 
Any ideas? Any assistance is greatly appreciated.
 
Best,
 
Dan
SAS Super FREQ
Posts: 1,044

Re: Graphing Strategy using PROC SGPLOT XXXX

I suggest using a BY statement ("BY date;"). The graphs should be separated by date, and only the actual data range will be shown. Or use SGPANEL with PANELBY date.
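
A rough sketch of both suggestions, using the data set and variable names from the original post (PLOT3, TIMESTAMP, DATE, Y). It assumes PLOT3 holds a single ID; PLOT3_SORTED and the COLUMNS=4 setting are illustrative choices, not something stated in the thread.

```sas
/* Sketch only: BY processing needs the data sorted by the BY variable. */
proc sort data=plot3 out=plot3_sorted;
  by date timestamp;
run;

/* One graph per date; each x-axis spans only that date's data. */
proc sgplot data=plot3_sorted noautolegend;
  by date;
  scatter x=timestamp y=y / markerattrs=(size=10);
  series  x=timestamp y=y;
run;

/* Alternative: one cell per date. UNISCALE=ROW shares only the row (Y)
   axis, so each column keeps its own X range. */
proc sgpanel data=plot3_sorted noautolegend;
  panelby date / layout=columnlattice uniscale=row columns=4;
  scatter x=timestamp y=y / markerattrs=(size=10);
  series  x=timestamp y=y;
run;
```

Note that grouping on DATE splits any interval that crosses midnight into two groups.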

 

General comment - Including a full running program (with sample data) makes it easier to help.  Also include the SAS version you are using.

New Contributor
Posts: 2

Re: Graphing Strategy using PROC SGPLOT XXXX

Hi Sanjay,

 

How would the BY=date approach work for 5 hour intervals that span the day boundary? The example from my original post was :

 

"Some of the 5 hr intervals cross the day boundary. For example, some start at 10PM to 2AM."

 

Also, ideally, I would like 4 days per plot.

 

Is there an easy way to specify 3 separate x-axis breaks for each plot in 9.3? There would be ~2,3183 plots where the 3 x-axis breaks would be different for each plot.

 

Thank you,

 

Dan

 

SAS Super FREQ
Posts: 864

Re: Graphing Strategy using PROC SGPLOT XXXX

Sanjay's BY-grouping trick could still work for you. Create two indexing columns in your data. In the first, each unique index (1, 2, 3, etc.) is assigned to the observations you want to appear in each plot. The other index column should be the string column you mentioned that contains the intervals. Set OPTIONS NOBYLINE to prevent the BY information from showing up in a title. Then just use the first index column as your BY variable, and the interval index on the PANELBY statement of SGPANEL. It will look something like this:

 

proc sgpanel data=mydata;
  by index;  /* one graph per index value */
  panelby interval / layout=columnlattice uniscale=row onepanel;
  series x=xvar y=yvar;
run;
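
One way the two index columns might be built is sketched below. It assumes hourly readings, so that a gap of more than one hour between consecutive TIMESTAMP values marks the start of a new interval; this rule also keeps intervals that cross midnight together. SORTED, INDEXED, INTERVAL_NUM, and PLOT_INDEX are made-up names for illustration.

```sas
/* Sketch only: assumes hourly data, names are hypothetical. */
proc sort data=plot3 out=sorted;
  by id timestamp;
run;

data indexed;
  set sorted;
  by id;
  prev_ts = lag(timestamp);               /* call LAG on every row */
  if first.id then interval_num = 0;      /* restart numbering per ID */
  if first.id or timestamp - prev_ts > 3600 then interval_num + 1;
  plot_index = ceil(interval_num / 4);    /* four intervals per plot */
  interval   = put(interval_num, z2.);    /* string label for PANELBY */
  drop prev_ts;
run;

options nobyline;                         /* suppress the BY line in titles */
proc sgpanel data=indexed noautolegend;
  by id plot_index;
  panelby interval / layout=columnlattice uniscale=row onepanel novarname;
  scatter x=timestamp y=y / markerattrs=(size=10);
  series  x=timestamp y=y;
run;
options byline;                           /* restore the default */
```

Because the x variable stays numeric within each cell, predicted values computed on a finer grid could in principle be overlaid with a second SERIES statement, provided those rows carry the same ID, PLOT_INDEX, and INTERVAL values.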

 

Hope this helps!

Dan
