AmirSari

Hi all,

I am having trouble running regressions on a fixed number of observations around a condition in a large dataset. For each ID, I want to run a simple regression with one independent variable on the 100 observations before a condition becomes true and, separately, on the 100 observations starting at that point, and then extract the coefficient of the independent variable from each regression.

Here is a sample of my data:

Id  date      Y          X          Condition
1   20120103   0.001421   0.017012
1   20120104   0.004966  -0.00138
1   20120105   0.011295   0.004055
1   20120106   0.003839  -0.000598
1   20120109  -0.01217    0.004418
1   20120110   0.001056   0.01111
1   20120111   0.005274   0.004571
1   20120112  -0.013291   0.005077  1
1   20120113  -0.024814  -0.005005
1   20120117   0.02908    0.003767
1   20120118   0.041328   0.013701
1   20120119  -0.002374   0.005448
1   20120120   0.015981   0.003698
1   20120123   0.007363   0.001857

For each ID (for example, id=1), I need the slope b of the regression Y = a + bX estimated on the 100 observations before the row where condition=1, and the slope b of the same regression estimated on the 100 observations starting at (and including) the condition=1 row. I also want to exclude IDs that have fewer than 100 observations before the condition=1 row, or fewer than 100 observations from that row onward.

The desired output would be something like this:

Id  date      Y          X          Condition  slope
1   20120103   0.001421   0.017012             -0.1777
1   20120104   0.004966  -0.00138              -0.1777
1   20120105   0.011295   0.004055             -0.1777
1   20120106   0.003839  -0.000598             -0.1777
1   20120109  -0.01217    0.004418             -0.1777
1   20120110   0.001056   0.01111              -0.1777
1   20120111   0.005274   0.004571             -0.1777
1   20120112  -0.013291   0.005077  1           3.1400
1   20120113  -0.024814  -0.005005              3.1400
1   20120117   0.02908    0.003767              3.1400
1   20120118   0.041328   0.013701              3.1400
1   20120119  -0.002374   0.005448              3.1400
1   20120120   0.015981   0.003698              3.1400
1   20120123   0.007363   0.001857              3.1400

In the above example, the slopes are calculated from the 7 observations before condition=1 and the 7 observations starting at (and including) the condition=1 row.
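One way to build the two 100-observation windows and drop IDs that lack them is sketched below in SAS. It is a minimal sketch, assuming at most one condition=1 row per ID and that "before" means the 100 rows immediately preceding that row in date order. The simulated HAVE data set and the names NUMBERED, EVENT_ROW, WINDOW, WINDOWED, ROW, REL and PERIOD are hypothetical, chosen only for illustration.

```sas
/* Hypothetical simulated data standing in for the real HAVE:          */
/* ids 1 and 2 hit the condition on row 120 of 250; id 3 on row 50,    */
/* so id 3 has only 49 rows before the condition and is excluded.      */
data have;
  do id = 1 to 3;
    event_at = ifn(id = 3, 50, 120);
    do n = 1 to 250;
      date = '03JAN2012'd + n - 1;
      x = sin(n) / 100;
      condition = ifn(n = event_at, 1, .);
      y = 0.001 + ifn(n < event_at, 2, 3) * x;   /* slope 2 before, 3 after */
      output;
    end;
  end;
  format date yymmddn8.;
  drop n event_at;
run;

proc sort data=have;
  by id date;
run;

/* Number the rows within each id (ROW is retained by the sum statement) */
data numbered;
  set have;
  by id;
  if first.id then row = 0;
  row + 1;
run;

/* Row number of the condition=1 row for each id */
proc sql;
  create table event_row as
  select id, min(row) as event_row
  from numbered
  where condition = 1
  group by id
  order by id;
quit;

/* REL = 0 on the condition row, -1 on the row before it, and so on. */
/* Ids without any condition=1 row are dropped here.                 */
data window;
  merge numbered(in=in_data) event_row(in=has_event);
  by id;
  if in_data and has_event;
  rel = row - event_row;
  if -100 <= rel <= 99;
  period = (rel >= 0);   /* 0 = 100 rows before, 1 = 100 rows from the condition on */
run;

/* Keep only ids with a full 100 rows in both periods */
proc sql;
  create table windowed as
  select *
  from window
  group by id
  having sum(period = 0) = 100 and sum(period = 1) = 100
  order by id, period, date;
quit;
```

Because each period is capped at 100 rows before the HAVING test, "= 100" is equivalent to "at least 100 available". The slopes can then be estimated with PROC REG using BY id period, in the same way as the BY-group regression suggested in the replies.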

 

Any help would be greatly appreciated.

PaigeMiller

Modify your data set so that it has a new grouping variable that is present on every observation and increases sequentially. For ID 1, the rows before condition=1 get the value 1; starting at the row where condition=1, the rows get the value 2; and so on. Then you can run the regression BY ID and the new variable.

--
Paige Miller
AmirSari
Any idea how to construct that variable?
Thanks!
PaigeMiller

UNTESTED CODE

Assumes the data are sorted by id and date.

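data want;
    set have;
    by id date;
    /* GROUP is created by a sum statement, so it is retained across rows and */
    /* starts at 0. It goes up by 1 on the first row of each id and again on  */
    /* every row where condition is not missing, so the rows before the       */
    /* condition and the rows from the condition onward get different values. */
    if first.id or not missing(condition) then group+1;
run;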
--
Paige Miller
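Once the grouping variable exists, the regression step can be run BY id and group, with the slope merged back onto every row to match the desired output. A minimal sketch, assuming PROC REG from SAS/STAT and using the sample rows from the question as HAVE; GROUPED, EST, SLOPES, WANT and SLOPE are hypothetical names. On these 14 rows it should reproduce the slopes shown in the desired output (about -0.1777 for group 1 and 3.1400 for group 2).

```sas
/* The sample rows from the question (id 1 only) */
data have;
  input id date :yymmdd8. y x condition;
  format date yymmddn8.;
  datalines;
1 20120103  0.001421  0.017012 .
1 20120104  0.004966 -0.00138  .
1 20120105  0.011295  0.004055 .
1 20120106  0.003839 -0.000598 .
1 20120109 -0.01217   0.004418 .
1 20120110  0.001056  0.01111  .
1 20120111  0.005274  0.004571 .
1 20120112 -0.013291  0.005077 1
1 20120113 -0.024814 -0.005005 .
1 20120117  0.02908   0.003767 .
1 20120118  0.041328  0.013701 .
1 20120119 -0.002374  0.005448 .
1 20120120  0.015981  0.003698 .
1 20120123  0.007363  0.001857 .
;
run;

/* Grouping step from the reply above */
data grouped;
  set have;
  by id date;
  if first.id or not missing(condition) then group + 1;
run;

/* One regression per id/group; OUTEST= holds one row of coefficients per BY group */
proc reg data=grouped outest=est noprint;
  by id group;
  model y = x;
run;
quit;

/* In OUTEST=, the coefficient of X is stored in a variable named X */
data slopes;
  set est(keep=id group x);
  rename x = slope;
run;

/* Attach each group's slope to every row of that group */
data want;
  merge grouped slopes;
  by id group;
run;
```

This regresses on every row in each group; the 100-observation limits and the exclusion of IDs with too few rows from the question still have to be applied to the data before PROC REG.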
AmirSari
Thanks!
