BookmarkSubscribeRSS Feed
NKormanik
Barite | Level 11

Row 1 is 1, 2, 3, 4, 5, 8 (representing days).

Row 2 are various values going out several decimal points.

 

I want to run a regression on Row 1 vs. Row 2.

 

I need the intercept and slope.  Ideally in columns following the values in Row 2.

 

There are many more rows besides Row 2.  Hundreds.  Each row needs a regression run, to find intercept and slope.

 

But Row 1 remains the same throughout all the regressions.

 

Since the SAS worksheet is already set up as described above it would be preferable to be able to use it as is.

 

Any possibilities going about this??

 

Help and suggestions greatly appreciated.

 

Nicholas

 

 

 

 

6 REPLIES 6
jklaverstijn
Rhodochrosite | Level 12

Hi Nicholas,

 

You will have to normalize your data before trying to do anything with it. No proc will ever be able to make sense of the current shape it's in. Regression is performed on values in rows, not columns. It is one of the common mistakes described in the excellent article "3 common messy data problems and how to tidy them in SAS".

 

Use a datastep to combine two consecutive rows and create an identifier for that pair of rows. Take care to rename the variables on the two reads so not to overwrite values. Then output every variable pair (from row1-var1, row2-var1 etc) plus the identifier to a new dataset. Then you can run your regression on that dataset using a BY on the identifier. If row 1 is not repeated (please provide sample data) then retain that first row and loop over the rest. It could in fact make the datastep easier. But we'd have to know. Please provide sample data before I can have a shot at coding.

 

And I do assume that what you call a SAS data sheet is in fact a SAS dataset.

 

Regards,

- Jan.

jklaverstijn
Rhodochrosite | Level 12

Oke so I couldn't resist myself. Here is one way to go about it:

 

data have;
input var1-var7;
cards;
1 2 3 4 5 6 8
0.0 0.1 0.2 0.3 0.4 0.5 0.6
1.1 1.2 1.2 1.3 1.3 1.4 1.4
;
run;

data want;
   if _n_=1 then set have(rename=(var1=day1 var2=day2 var3=day3 var4=day4 var5=day5 var6=day6 var7=day7));
   array days{*} day1-day7;
   array var{*} var1-var7;
   do until(eod);
        set have end=eod;
        id+1;
        do i=1 to 7;
           day=days{i};
           value=var{i};
           output;
        end;
   end;
   keep id day value;
run;
proc print data=want;
run;

You can run proc reg by id on dataset "want" and have reg create an output dataset.

 

And again, there is no way this can be done on your dataset as it is. You need to rearrange it.

 

Hope this helps,

- Jan.

data_null__
Jade | Level 19

You can also get the desired ouput by flipping the data and using GLM to fit the model.  This will handle missing values properly.

 

data wide;
   input y1-y5;
   cards;
60   61   62   63  65
3.1  3.6  3.8  4   4.1
4.5  3.4  2.3  5   6
3.5  .    2.3  5   6
;;;;
   run;
proc print;
   run;
proc transpose data=wide out=flip(drop=_name_ rename=(col1=X col2-col5=y1-y4));
   var y:;
   run;
proc print;
   run;
ods select none;
ods output ParameterEstimates=ParameterEstimates;
proc glm;
   model y: = x;
   run;
ods select all;
proc print;
   run;

You don't have to rename the new columns and just know that COL1 is X and the others are Y.

 

Capture.PNG

data_null__
Jade | Level 19

If you really only need slope and intercept you "can" do it in a data step.  This assumes NO missing values.  Proper handling of missing values if just one of may advantages to using PROC REG and with proper data structure.

 

data wide;
   input y1-y5;
   cards;
60   61   62   63  65
3.1  3.6  3.8  4   4.1
4.5  3.4  2.3  5   6
;;;;
   run;

data reg(keep=y: slope intercept);
   if _n_ eq 1 then set wide(rename=(y1-y5=x1-x5));
   array x[*] x:;
   set wide;
   array y[*] y:;
   array x_y[5];
   do i = 1 to dim(y);
      x_y[i] = x[i]*y[i];
      end;
   n     = n(of y[*]);
   sx    = sum(of x[*]);
   sy    = sum(of y[*]);
   sxy   = sum(of x_y[*]);
   sx2   = uss(of x[*]);
   slope = ((n*sxy)-(sx*sy)) / ((n*sx2)-sx**2);
   intercept = (sy - slope*sx) / n;
   run;
proc print;
   run;

Capture.PNG

NKormanik
Barite | Level 11

Because of the apparent complexity (for me, that is) I'm wondering if we could just take the one-row case.

 

Here is the first row, y:

 

    -1.6264     -0.2521      1.2354      1.2354       2.148       2.148

 

The x, then, is 1, 2, 3, 4, 5, 8

 

I'm hoping the code to do the transformation and regression will be more understandable.

 

Another related question:  Is there another SAS procedure that will provide a 'forecast' to a succeeding value.  Say, for instance, if x=10?  

 

Thanks to all here for the assistance.

 

Ksharp
Super User

It sounds that you are doing Time Series Analysis. 

Better post it at SAS Forecasting and Econometrics

There are tons of PROC can do that, such as PROC USM , PROC ARIMA ........

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 2370 views
  • 0 likes
  • 4 in conversation