Re: Regression data are across rows

NKormanik · Posted 06-11-2016 06:14 AM

Row 1 is 1, 2, 3, 4, 5, 8 (representing days).

Row 2 are various values going out several decimal points.

I want to run a regression on Row 1 vs. Row 2.

I need the intercept and slope. Ideally in columns following the values in Row 2.

There are many more rows besides Row 2. Hundreds. Each row needs a regression run, to find intercept and slope.

But Row 1 remains the same throughout all the regressions.

Since the SAS worksheet is already set up as described above it would be preferable to be able to use it as is.

Any possibilities going about this??

Help and suggestions greatly appreciated.

Nicholas

jklaverstijn · Posted 06-11-2016 07:28 AM

Hi Nicholas,

You will have to normalize your data before trying to do anything with it. No proc will ever be able to make sense of the current shape it's in. Regression is performed on values in rows, not columns. It is one of the common mistakes described in the excellent article "3 common messy data problems and how to tidy them in SAS".

Use a datastep to combine two consecutive rows and create an identifier for that pair of rows. Take care to rename the variables on the two reads so not to overwrite values. Then output every variable pair (from row1-var1, row2-var1 etc) plus the identifier to a new dataset. Then you can run your regression on that dataset using a BY on the identifier. If row 1 is not repeated (please provide sample data) then retain that first row and loop over the rest. It could in fact make the datastep easier. But we'd have to know. Please provide sample data before I can have a shot at coding.

And I do assume that what you call a SAS data sheet is in fact a SAS dataset.

Regards,

- Jan.

jklaverstijn · Posted 06-11-2016 07:58 AM

Oke so I couldn't resist myself. Here is one way to go about it:

data have;
input var1-var7;
cards;
1 2 3 4 5 6 8
0.0 0.1 0.2 0.3 0.4 0.5 0.6
1.1 1.2 1.2 1.3 1.3 1.4 1.4
;
run;

data want;
   if _n_=1 then set have(rename=(var1=day1 var2=day2 var3=day3 var4=day4 var5=day5 var6=day6 var7=day7));
   array days{*} day1-day7;
   array var{*} var1-var7;
   do until(eod);
        set have end=eod;
        id+1;
        do i=1 to 7;
           day=days{i};
           value=var{i};
           output;
        end;
   end;
   keep id day value;
run;
proc print data=want;
run;

You can run proc reg by id on dataset "want" and have reg create an output dataset.

And again, there is no way this can be done on your dataset as it is. You need to rearrange it.

Hope this helps,

- Jan.

data_null__ · Posted 06-11-2016 08:51 AM

You can also get the desired ouput by flipping the data and using GLM to fit the model. This will handle missing values properly.

data wide;
   input y1-y5;
   cards;
60   61   62   63  65
3.1  3.6  3.8  4   4.1
4.5  3.4  2.3  5   6
3.5  .    2.3  5   6
;;;;
   run;
proc print;
   run;
proc transpose data=wide out=flip(drop=_name_ rename=(col1=X col2-col5=y1-y4));
   var y:;
   run;
proc print;
   run;
ods select none;
ods output ParameterEstimates=ParameterEstimates;
proc glm;
   model y: = x;
   run;
ods select all;
proc print;
   run;

You don't have to rename the new columns and just know that COL1 is X and the others are Y.

data_null__ · Posted 06-11-2016 08:21 AM

If you really only need slope and intercept you "can" do it in a data step. This assumes NO missing values. Proper handling of missing values if just one of may advantages to using PROC REG and with proper data structure.

data wide;
   input y1-y5;
   cards;
60   61   62   63  65
3.1  3.6  3.8  4   4.1
4.5  3.4  2.3  5   6
;;;;
   run;

data reg(keep=y: slope intercept);
   if _n_ eq 1 then set wide(rename=(y1-y5=x1-x5));
   array x[*] x:;
   set wide;
   array y[*] y:;
   array x_y[5];
   do i = 1 to dim(y);
      x_y[i] = x[i]*y[i];
      end;
   n     = n(of y[*]);
   sx    = sum(of x[*]);
   sy    = sum(of y[*]);
   sxy   = sum(of x_y[*]);
   sx2   = uss(of x[*]);
   slope = ((n*sxy)-(sx*sy)) / ((n*sx2)-sx**2);
   intercept = (sy - slope*sx) / n;
   run;
proc print;
   run;

NKormanik · Posted 06-11-2016 07:00 PM

Because of the apparent complexity (for me, that is) I'm wondering if we could just take the one-row case.

Here is the first row, y:

-1.6264

-0.2521

1.2354

2.148

The x, then, is 1, 2, 3, 4, 5, 8

I'm hoping the code to do the transformation and regression will be more understandable.

Another related question: Is there another SAS procedure that will provide a 'forecast' to a succeeding value. Say, for instance, if x=10?

Thanks to all here for the assistance.

Ksharp · Posted 06-11-2016 09:45 PM

It sounds that you are doing Time Series Analysis.

Better post it at SAS Forecasting and Econometrics

There are tons of PROC can do that, such as PROC USM , PROC ARIMA ........