AmirSari
Quartz | Level 8

Hi all,

 

I am having trouble running a regression over a fixed number of observations around a condition in a large dataset. I want to run a simple regression with one independent variable over 200 observations around the point where a condition becomes true (100 before it and 100 from that point on), and extract the coefficient of the independent variable.

Here is a sample of my data

Id   date        Y            X            Condition
1    20120103     0.001421     0.017012
1    20120104     0.004966    -0.00138
1    20120105     0.011295     0.004055
1    20120106     0.003839    -0.000598
1    20120109    -0.01217      0.004418
1    20120110     0.001056     0.01111
1    20120111     0.005274     0.004571
1    20120112    -0.013291     0.005077    1
1    20120113    -0.024814    -0.005005
1    20120117     0.02908      0.003767
1    20120118     0.041328     0.013701
1    20120119    -0.002374     0.005448
1    20120120     0.015981     0.003698
1    20120123     0.007363     0.001857

 

For id=1, when condition=1, I need to extract b from the regression y=a+bX over the 100 observations before the condition, and b from the same regression over the 100 observations starting with (and including) the observation where condition=1. I also want to exclude ids with fewer than 100 observations before or after condition=1.

The desired output would be something like this.

Id   date        Y            X            Condition   slope
1    20120103     0.001421     0.017012                -0.1777
1    20120104     0.004966    -0.00138                 -0.1777
1    20120105     0.011295     0.004055                -0.1777
1    20120106     0.003839    -0.000598                -0.1777
1    20120109    -0.01217      0.004418                -0.1777
1    20120110     0.001056     0.01111                 -0.1777
1    20120111     0.005274     0.004571                -0.1777
1    20120112    -0.013291     0.005077    1            3.1400
1    20120113    -0.024814    -0.005005                 3.1400
1    20120117     0.02908      0.003767                 3.1400
1    20120118     0.041328     0.013701                 3.1400
1    20120119    -0.002374     0.005448                 3.1400
1    20120120     0.015981     0.003698                 3.1400
1    20120123     0.007363     0.001857                 3.1400

 

In the above example the slopes are calculated over the 7 observations before condition=1 and the 7 observations from condition=1 onward (in the real data each window would be 100 observations).

 

Any help would be greatly appreciated.

4 REPLIES
PaigeMiller
Diamond | Level 26

Modify your data set so that it has a newly constructed variable which is present on every observation and increases sequentially. For ID 1, while condition is missing, the new variable has value 1. From the observation where condition is 1 onward, it has value 2. And so on for the following IDs. Then you can do the regression by ID and by the newly constructed variable.
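
As a rough sketch of that last step (untested, and not part of the original reply): assuming the modified data set is called want, the new variable is called group, and the data are sorted by id and group, PROC REG can fit one regression per segment and write the coefficients to a data set. The names want, group, est, result and slope are assumptions made for illustration. In the OUTEST= data set the coefficient of X is stored in a variable named X, which is renamed to slope here.

```sas
/* One regression of Y on X per (id, group) segment.              */
/* Assumed names: data set WANT, segment variable GROUP.          */
proc reg data=want outest=est(keep=id group x rename=(x=slope)) noprint;
    by id group;
    model y = x;
run;
quit;

/* Attach each segment's slope to all of its observations. */
data result;
    merge want est;
    by id group;
run;
```

This gives every observation in a segment the same slope value, which matches the layout of the desired output in the question.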

--
Paige Miller
AmirSari
Quartz | Level 8
Any idea how to construct that variable?
Thanks!
PaigeMiller
Diamond | Level 26

UNTESTED CODE

 

Assumes data is properly sorted

 

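data want;
    set have;
    by id date;
    /* Start a new group at each new id and at each row where condition is set. */
    /* GROUP+1 is a sum statement, so GROUP is retained from row to row.        */
    if first.id or not missing(condition) then group+1;
run;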
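
The step above splits each id at the condition row, but it does not limit each side to 100 observations or drop ids that are too short, both of which the question asks for. One possible way to handle that is sketched below (untested). It assumes condition is numeric, that there is at most one condition=1 row per id, and that have is sorted by id and date. The names n, numbered, windowed, seq, event_seq, last_seq and period are hypothetical.

```sas
%let n = 100;   /* assumed window length on each side of the condition row */

/* Number the rows within each id. */
data numbered;
    set have;
    by id date;
    if first.id then seq = 0;
    seq + 1;
run;

proc sql;
    create table windowed as
    select a.*,
           /* period 1 = before the condition row, 2 = from it onward */
           case when a.seq < e.event_seq then 1 else 2 end as period
    from numbered as a,
         (select n.id,
                 min(case when n.condition = 1 then n.seq end) as event_seq,
                 max(n.seq) as last_seq
          from numbered as n
          group by n.id) as e
    where a.id = e.id
      /* keep only ids with at least &n rows on each side */
      and e.event_seq - 1 >= &n
      and e.last_seq - e.event_seq + 1 >= &n
      /* keep &n rows before and &n rows starting at the condition row */
      and a.seq between e.event_seq - &n and e.event_seq + &n - 1
    order by a.id, a.date;
quit;
```

The resulting data set could then be passed to PROC REG with BY id period. With n set to 100, the 14-row sample for id 1 in the question would be excluded entirely, since it has only 7 observations on each side.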
--
Paige Miller
AmirSari
Quartz | Level 8
Thanks!
