Hi all,
I am having trouble running a regression on a fixed number of observations around a condition in a large dataset. I want to run a simple regression with one independent variable on 200 observations around the point where a condition becomes true (100 before it and 100 from that point on), and extract the coefficient of the independent variable.
Here is a sample of my data:
Id date Y X Condition
1 20120103 0.001421 0.017012
1 20120104 0.004966 -0.00138
1 20120105 0.011295 0.004055
1 20120106 0.003839 -0.000598
1 20120109 -0.01217 0.004418
1 20120110 0.001056 0.01111
1 20120111 0.005274 0.004571
1 20120112 -0.013291 0.005077 1
1 20120113 -0.024814 -0.005005
1 20120117 0.02908 0.003767
1 20120118 0.041328 0.013701
1 20120119 -0.002374 0.005448
1 20120120 0.015981 0.003698
1 20120123 0.007363 0.001857
For id=1, if condition=1, I need to extract b from the regression Y=a+bX fitted on the 100 observations before the condition, and b from the same regression fitted on the 100 observations starting at the condition (including the observation where condition=1). I also want to exclude ids with fewer than 100 observations before or after condition=1.
The desired output would be something like this.
Id date Y X Condition slope
1 20120103 0.001421 0.017012 -0.1777
1 20120104 0.004966 -0.00138 -0.1777
1 20120105 0.011295 0.004055 -0.1777
1 20120106 0.003839 -0.000598 -0.1777
1 20120109 -0.01217 0.004418 -0.1777
1 20120110 0.001056 0.01111 -0.1777
1 20120111 0.005274 0.004571 -0.1777
1 20120112 -0.013291 0.005077 1 3.1400
1 20120113 -0.024814 -0.005005 3.1400
1 20120117 0.02908 0.003767 3.1400
1 20120118 0.041328 0.013701 3.1400
1 20120119 -0.002374 0.005448 3.1400
1 20120120 0.015981 0.003698 3.1400
1 20120123 0.007363 0.001857 3.1400
In the above example the slopes are calculated on the 7 observations before condition=1 and on the 7 observations from condition=1 onward (including the observation where condition=1).
Any help would be greatly appreciated.
Add a new variable to your data set that is populated on every observation and numbers the segments sequentially. For ID 1, the rows where condition is missing get the value 1. From the row where condition is 1 onward, the new variable has the value 2. And so on. Then you can run the regression by ID and by the new variable.
UNTESTED CODE
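Assumes the data is already sorted by id and date.
data want;
    set have;
    by id date;
    /* Start a new group at the first row of each id and at each row where condition is set. */
    /* The sum statement (group+1) retains group across rows, so every row gets a value. */
    if first.id or not missing(condition) then group+1;
run;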
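To finish the idea, here is a minimal sketch of the regression step, also untested. It assumes the data set WANT and the variable GROUP from the step above, and that the variables are named Y and X as in the sample data. The names EST, WANT_SLOPE and SLOPE are made up for illustration.

```sas
/* Fit y = a + b*x separately for each id/group segment.            */
/* OUTEST= writes one row of parameter estimates per BY group; the  */
/* column named after the regressor (x) holds the slope b.          */
proc reg data=want outest=est(keep=id group x rename=(x=slope)) noprint;
    by id group;
    model y = x;
run;
quit;

/* Attach each segment's slope to every observation in that segment. */
/* Both data sets are already in id/group order.                     */
data want_slope;
    merge want est;
    by id group;
run;
```

Note that this fits each regression on all observations in a segment. Restricting each segment to exactly 100 observations, and dropping ids that have fewer than 100 on either side of condition=1, would need an extra step before PROC REG.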