Hello SAS programmers!
I am working with monthly stock data from Feb 1980 to Dec 2016.
The dataset has roughly 450,000 stock-month
What I want to do is to create a variable which represents consecutive runs of months (within stocks and between stocks)
and remove all stock-month observations less than 6months runs.
If you check following data structure you'll catch this.
stock date run
1 2000JAN 1
1 2000FEB 2
1 2000MAR 3
1 2000APR 4
1 2000AUG 1
1 2000SEP 2
1 2000DEC 1
1 2001JAN 2
1 2001FEB 3
2 2000JAN 1
2 2000FEB 2
2 2000MAR 3
2 2000APR 4
2 2000AUG 1
2 2000SEP 2
2 2000DEC 1
2 2001JAN 2
2 2001FEB 3
I want to fix my dataset as following(Deleting stock-month block with maximum run<3).
stock date run
1 2000JAN 1
1 2000FEB 2
1 2000MAR 3
1 2000APR 4
1 2000DEC 1
1 2001JAN 2
1 2001FEB 3
2 2000JAN 1
2 2000FEB 2
2 2000MAR 3
2 2000APR 4
2 2000DEC 1
2 2001JAN 2
2 2001FEB 3
In this situation, how can I fix my dataset and variable "run"??
Please notice that I have 450,000 obs of full dataset so please give me general approach to do this.
THX.
If i understood your req:
data have;
set have;
by stock;
if first.stock then grp=0;
if run=1 then grp+1;
run;
data want;
do n=1 by 1 until(last.grp);
set have;
by stock grp;
end;
do until(last.grp);
set have;
by stock grp;
if n>=3 then output;
end;
drop n grp;
run;
Regards,
Naveen Srinivasan
Is your date variable formatted in this way?
My dates are the last days of the month in given year,
for example,
30NOV1980
31DEC1980
31JAN1981
28FEB1981
Ok. Not sure I understand, eg these lines
1 2000OCT 1
1 2000SEP 2
Why would your run variable increase in this case?
Oh...sorry... TYPO..
it's August...I corrected it
Makes more sense. I have implemented your run variable, though I am not certain aboyut the restrictions to put on it. Here I simply output only if run < 3.
data have;
input stock$ date:date9.;
format date date9.;
datalines;
1 31JAN2000
1 28FEB2000
1 31MAR2000
1 30APR2000
1 31AUG2000
1 30SEP2000
1 31DEC2000
1 31JAN2001
1 28FEB2001
2 31JAN2000
2 28FEB2000
2 31MAR2000
2 30APR2000
2 31AUG2000
2 30SEP2000
2 31DEC2000
2 31JAN2001
2 28FEB2001
;
data want;
set have;
if intck('month',lag1(date),date) = 1 then run + 1;
else run = 1;
retain run 1;
if run <3; /* Subsetting if */
run;
If i understood your req:
data have;
set have;
by stock;
if first.stock then grp=0;
if run=1 then grp+1;
run;
data want;
do n=1 by 1 until(last.grp);
set have;
by stock grp;
end;
do until(last.grp);
set have;
by stock grp;
if n>=3 then output;
end;
drop n grp;
run;
Regards,
Naveen Srinivasan
This is exactly what I want.
Based on draycut's argument, this code completes the procedure
Thank you very much!!! I really appreciate it!
data want;
set have;
by stock;
retain count;
if first.stock then count=.;
if run=1 then count+1;
run;
data want2(where=(count2>=3));
do until(last.count);
set want;
by stock count;
retain count2;
if first.count then count2=1;
else count2+1;
end;
do until(last.count);
set want;
by stock count;
output;
end;
drop count count2;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.