Data visualization with SAS programming

sgplot with multiple regression lines

Contributor
Posts: 24

Hi,

I am trying to create a plot for my data, but I am struggling to get the expected output. PROC SGPLOT produces a graph with 50 regression lines, one for each of the 50 groups in my data set. My problem is that the lines do not always extend to the end of the X axis. I would like all the lines to stop at the same x. I think the issue is that max(x) in my data set is different for each group: sometimes max(x)=50, sometimes max(x)=80, and so on.

Is there any way to have SAS draw the regression line over a specific interval? I thought about drawing a line using the intercept and slope, but I am not sure how to do it.

proc sgplot data=listall;
  reg x=var1 y=var2 / nomarkers group=t;
run;

Thanks a lot for any help

Super Contributor
Posts: 1,636

Re: sgplot with multiple regression lines

Could you first find the smallest of the per-group maximum x values, and then do something like: if x>50 then x=50?

Contributor
Posts: 24

Re: sgplot with multiple regression lines

Thanks, I thought about it, but I am also plotting the data as a scatter plot. If I add a value to my data set, the plot will show a point that does not exist.

SAS Super FREQ
Posts: 1,044

Re: sgplot with multiple regression lines

SGPLOT will always use all the data provided to the regression statement to compute the fit line.  There is no way to draw only part of the fit line.  One possibility would be to compute a new Y2 column that has missing values for x>50.  Then, provide Y2 to the reg statement, while you still provide the original Y column to the scatter plot.  However, this may not be correct since the new regression line will not be the same as the one with full data.

Now, SGPLOT computes a new data set with the values needed to draw the regression line as a series plot.  So, you could use ODS Output to get this computed data set, remove the points of the fit line for x > 50, and then use a SERIES statement to draw the truncated fit line.

Note, this idea works with degree=2, as many points are computed.  With default degree=1, only the two end points are computed, so that would get tricky.  Here is a program using class data to illustrate the idea:

ods output sgplot=Reg;
proc sgplot data=sashelp.class;
  reg x=height y=weight / group=sex degree=2;
  run;

proc print data=reg;run;

data reg2;
  set reg;
  y2=REGRESSION_HEIGHT_WEIGHT_GROU__Y;
  if REGRESSION_HEIGHT_WEIGHT_GROU__X >65 then y2=.;
  run;
proc print data=reg2;run;

proc sgplot data=reg2;
  scatter x=height y=weight / group=sex;
  series x=REGRESSION_HEIGHT_WEIGHT_GROU__X y=y2 /
         group=REGRESSION_HEIGHT_WEIGHT_GROU_GP;
  run;

SAS Super FREQ
Posts: 3,234

Re: sgplot with multiple regression lines

I think Sanjay has the right idea, but you should do the analysis in PROC GLM and then plot the predicted curves overlaid on the data.

Here is a rough outline:

1) Find the global max and min of the x variable. Save those values in macro variables. For example:

proc sql noprint;
  select min(x) into :min from MyData;
  select max(x) into :max from MyData;
quit;

2) Construct a new data set, A,  that consists of 2 obs for each level of the GROUP variable. For example:

data A;
  Group=1;
  x=&min; y=.; output;
  x=&max; y=.; output;
  Group=2;
  x=&min; y=.; output;
  x=&max; y=.; output;
  ...
run;

You probably want to program this step.

3) Concatenate the original data and A.  Call the new data set B.  Run PROC GLM on B and use the OUTPUT statement to get linear predictions.  BECAUSE THE Y VARIABLE IS MISSING, the observations from A are not used in the model estimation, but they DO receive predicted values.

4) Use PROC SGPLOT to plot a scatter plot of the observations and a series plot of the predicted values.  The scatter plot does not contain any points from A because the Y value is missing. The series plot contains all points from A because the predicted values are nonmissing.
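
The four steps above could be programmed roughly as follows. This is only a minimal sketch: it uses sashelp.class as a stand-in for MyData (height plays the role of x, weight of y, and sex of the GROUP variable), and the data set names levels, A, B, and Pred and the variable name pred are placeholders chosen for illustration.

```sas
/* Step 1: global min and max of the x variable (height). */
proc sql noprint;
  select min(height), max(height) into :xmin, :xmax
  from sashelp.class;
quit;

/* Step 2: one row per group level, then two scoring rows per level. */
/* The response (weight) is missing, so these rows are not used in   */
/* the fit but still receive predicted values.                       */
proc sort data=sashelp.class(keep=sex) out=levels nodupkey;
  by sex;
run;

data A;
  set levels;
  weight = .;
  height = &xmin; output;
  height = &xmax; output;
run;

/* Step 3: concatenate, then fit one line per group with BY processing. */
data B;
  set sashelp.class(keep=sex height weight) A;
run;

proc sort data=B;
  by sex;
run;

proc glm data=B noprint;
  by sex;
  model weight = height;
  output out=Pred predicted=pred;
run;
quit;

/* Step 4: sort by x within group so the series is drawn left to right. */
proc sort data=Pred;
  by sex height;
run;

proc sgplot data=Pred;
  scatter x=height y=weight / group=sex;  /* rows from A have missing weight, so no marker */
  series  x=height y=pred   / group=sex;  /* every fitted line spans xmin to xmax */
run;
```

To stop every line at a fixed value such as x=50 instead of the global maximum, the second scoring row in A could use that constant in place of &xmax.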

Contributor
Posts: 24

Re: sgplot with multiple regression lines

Thanks everyone for your great suggestions.

I ended up using GTL and drawing the lines from the intercept and slope. I followed this logic:

- Calculated the slope and intercept by group. This gave me 50 records for my 50 experiments.

- Combined (with a SET statement) the data set containing all the data and the data set containing the slope and intercept.

- Created a graph template to plot the data and draw the regression lines using the intercept and slope.
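
The logic above might look something like the following in GTL. This is a sketch under assumptions, not the actual program: it uses sashelp.class in place of the real data (height as x, weight as y, sex as the group), gets the estimates from PROC REG with OUTEST=, and the names est, plot, slope, and scatter_lineparm are made up for illustration.

```sas
/* Slope and intercept by group: one OUTEST= record per BY group. */
proc sort data=sashelp.class out=class;
  by sex;
run;

proc reg data=class outest=est noprint;
  by sex;
  model weight = height;
run;
quit;

/* In the OUTEST= data set the slope is stored in a variable named  */
/* after the regressor (height), so rename it to avoid a clash with */
/* the raw data when the two data sets are concatenated.            */
data plot;
  set class(keep=sex height weight)
      est(keep=sex Intercept height rename=(height=slope));
run;

proc template;
  define statgraph scatter_lineparm;
    begingraph;
      layout overlay;
        scatterplot x=height y=weight / group=sex name="obs";
        /* Each line passes through (0, Intercept) with the fitted slope. */
        /* CLIP=TRUE keeps that point from stretching the axis ranges.    */
        lineparm x=0 y=Intercept slope=slope / group=sex clip=true;
        discretelegend "obs";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=plot template=scatter_lineparm;
run;
```

Because a LINEPARM line is drawn across the whole plot area, all groups end at the same x without any extra observations being added to the scatter plot.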

Thanks again

SAS Super FREQ
Posts: 1,044

Re: sgplot with multiple regression lines

If you compute the fit parameters yourself (a point (x, y) and the slope), you could just as easily compute a second point for each group and use the SGPLOT SERIES statement to get the same result. You may have more options with a SERIES plot than with LINEPARM.
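
For example, the two end points per group could be computed and drawn like this. It is a minimal sketch that assumes the estimates come from PROC REG with OUTEST=, and it uses sashelp.class as stand-in data; the names est, ends, plot, xfit, yfit, and slope are illustrative only.

```sas
proc sort data=sashelp.class out=class;
  by sex;
run;

/* One record of estimates (Intercept and slope) per group. */
proc reg data=class outest=est noprint;
  by sex;
  model weight = height;
run;
quit;

/* Common x range shared by all fitted lines. */
proc sql noprint;
  select min(height), max(height) into :xmin, :xmax
  from class;
quit;

/* Two end points per group, evaluated from the intercept and slope. */
data ends;
  set est(keep=sex Intercept height rename=(height=slope));
  xfit = &xmin; yfit = Intercept + slope*xfit; output;
  xfit = &xmax; yfit = Intercept + slope*xfit; output;
  keep sex xfit yfit;
run;

/* Stack raw data and end points. The end-point rows have missing  */
/* height and weight, so they add no markers to the scatter plot.  */
data plot;
  set class(keep=sex height weight) ends;
run;

proc sgplot data=plot;
  scatter x=height y=weight / group=sex;
  series  x=xfit   y=yfit   / group=sex;
run;
```

Replacing &xmax with a constant would end every line at that x value instead.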
