AmirSari
Quartz | Level 8

Hi all,

 

I am having trouble running a regression over a fixed number of observations, selected by a condition, in a large dataset. I want to run a simple regression with one independent variable on the observations around the point where a condition is true (200 observations in total) and extract the coefficient of the independent variable.

Here is a sample of my data:

Id  date      Y          X          Condition
1   20120103   0.001421   0.017012
1   20120104   0.004966  -0.00138
1   20120105   0.011295   0.004055
1   20120106   0.003839  -0.000598
1   20120109  -0.01217    0.004418
1   20120110   0.001056   0.01111
1   20120111   0.005274   0.004571
1   20120112  -0.013291   0.005077   1
1   20120113  -0.024814  -0.005005
1   20120117   0.02908    0.003767
1   20120118   0.041328   0.013701
1   20120119  -0.002374   0.005448
1   20120120   0.015981   0.003698
1   20120123   0.007363   0.001857

 

For id=1, when condition=1, I need to extract b from the regression y=a+bX over the 100 observations before the condition, and b from the regression y=a+bX over the 100 observations starting at the condition (including the observation that contains the condition). I also want to exclude id numbers with fewer than 100 observations before or after condition=1.

The desired output would be something like this:

Id  date      Y          X          Condition  slope
1   20120103   0.001421   0.017012             -0.1777
1   20120104   0.004966  -0.00138              -0.1777
1   20120105   0.011295   0.004055             -0.1777
1   20120106   0.003839  -0.000598             -0.1777
1   20120109  -0.01217    0.004418             -0.1777
1   20120110   0.001056   0.01111              -0.1777
1   20120111   0.005274   0.004571             -0.1777
1   20120112  -0.013291   0.005077   1          3.1400
1   20120113  -0.024814  -0.005005              3.1400
1   20120117   0.02908    0.003767              3.1400
1   20120118   0.041328   0.013701              3.1400
1   20120119  -0.002374   0.005448              3.1400
1   20120120   0.015981   0.003698              3.1400
1   20120123   0.007363   0.001857              3.1400

 

In the above example, the slopes are calculated over the 7 observations before condition=1 and the 7 observations from condition=1 onward (including the observation that contains the condition).

 

Any help would be greatly appreciated.

4 REPLIES
PaigeMiller
Diamond | Level 26

Modify your data set so that it has a newly constructed grouping variable that is present on every observation and increases sequentially. For ID 1, while condition is missing, the new variable has the value 1. From the observation where condition is 1, it has the value 2, and so on. Then you can run the regression by ID and by the new variable.

--
Paige Miller
AmirSari
Quartz | Level 8
Any idea how to construct that variable?
Thanks!
PaigeMiller
Diamond | Level 26

UNTESTED CODE

 

Assumes the data is sorted by id and date.

 

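data want;
    set have;
    by id date;
    /* Start a new group at the first row of each id and at each row where condition is set. */
    /* The sum statement retains group across rows, so its values keep increasing across ids. */
    if first.id or not missing(condition) then group+1;
run;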
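
Once the group variable exists, the regression by ID and group could be sketched as follows. This is also untested and rests on some assumptions: the data set WANT comes from the step above, the variables are named id, date, y and x as in the sample, and the output names EST, WANT_SLOPE and slope are made up for illustration.

```sas
/* Sketch only, untested. Fits y = a + b*x separately for each id/group
   and stores the estimates in EST (a hypothetical data set name). */
proc reg data=want noprint
         outest=est(keep=id group x rename=(x=slope));
    by id group;
    model y = x;
run;
quit;

/* In the OUTEST= data set the coefficient of x is stored in a variable
   named x, which the RENAME= option above turns into slope.
   Attach each group's slope to every observation in that group. */
data want_slope;
    merge want est;
    by id group;
run;
```

Note that each regression here uses every observation in its group, not a fixed window of 100. Restricting each side to 100 observations, and dropping ids with fewer than 100 on either side, would need an extra filtering step before PROC REG.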
--
Paige Miller
AmirSari
Quartz | Level 8
Thanks!
