Hi, SAS community !
I always appreciate your help-out. I am knocking on your doors to get another little help from you.
Thank you in advance, after all.
The dataset I have looks like this:
data have;
input cusip $ year month analysts;
datalines;
000255 2004 1 .
000255 2004 2 .
000255 2004 3 1
000255 2004 4 .
000255 2004 5 .
000255 2004 6 .
000255 2004 7 .
000255 2004 8 .
000255 2004 9 .
000255 2004 10 .
000255 2004 11 .
000255 2004 12 .
000255 2005 1 .
000255 2005 2 .
000255 2005 3 .
000255 2005 4 2
000255 2005 5 2
000255 2005 6 .
000255 2005 7 1
000255 2005 8 .
000255 2005 9 .
000255 2005 10 .
000255 2005 11 .
000255 2005 12 .
000307 2015 1 .
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 .
000307 2015 7 3
000307 2015 8 3
000307 2015 9 .
000307 2015 10 5
000307 2015 11 .
000307 2015 12 .
;
run;
This data contains the info of how many analysts cover a firm (cusip) in a certain year-month.
One sure thing about this data is that each year has twelve months. I designed it so.
What I want to have is, missing obs. are filled up by the average value of previous and next obs.
Here, firm-year is a chunk. Thus, a firm-year that starts or ends missing should be filled by the first or last non-missing values.
Therefore, I expect to achieve the following:
data have;
input cusip $ year month analysts;
datalines;
000255 2004 1 1
000255 2004 2 1
000255 2004 3 1
000255 2004 4 1
000255 2004 5 1
000255 2004 6 1
000255 2004 7 1
000255 2004 8 1
000255 2004 9 1
000255 2004 10 1
000255 2004 11 1
000255 2004 12 1
000255 2005 1 2
000255 2005 2 2
000255 2005 3 2
000255 2005 4 2
000255 2005 5 2
000255 2005 6 1.5
000255 2005 7 1
000255 2005 8 1
000255 2005 9 1
000255 2005 10 1
000255 2005 11 1
000255 2005 12 1
000307 2015 1 1
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 3
000307 2015 7 3
000307 2015 8 3
000307 2015 9 4
000307 2015 10 5
000307 2015 11 5
000307 2015 12 5
;
run;
I hope my question were not so stupid.
Thank you all!
Sincerely,
KS -,
@KS99 try this
data have;
input cusip $ year month analysts;
datalines;
000255 2004 1 .
000255 2004 2 .
000255 2004 3 1
000255 2004 4 .
000255 2004 5 .
000255 2004 6 .
000255 2004 7 .
000255 2004 8 .
000255 2004 9 .
000255 2004 10 .
000255 2004 11 .
000255 2004 12 .
000255 2005 1 .
000255 2005 2 .
000255 2005 3 .
000255 2005 4 2
000255 2005 5 2
000255 2005 6 .
000255 2005 7 1
000255 2005 8 .
000255 2005 9 .
000255 2005 10 .
000255 2005 11 .
000255 2005 12 .
000307 2015 1 .
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 .
000307 2015 7 3
000307 2015 8 3
000307 2015 9 .
000307 2015 10 5
000307 2015 11 .
000307 2015 12 .
;
data want(drop = a _a m);
_a = a;
do _N_ = 1 by 1 until (last.year | a);
set have(rename = analysts = a);
by cusip year;
end;
m = mean(_a, a);
do _N_ = 1 to _N_;
set have(rename = analysts = a);
analysts = coalesce(a, m);
output;
end;
run;
Result:
cusip year month analysts 000255 2004 1 1 000255 2004 2 1 000255 2004 3 1 000255 2004 4 1 000255 2004 5 1 000255 2004 6 1 000255 2004 7 1 000255 2004 8 1 000255 2004 9 1 000255 2004 10 1 000255 2004 11 1 000255 2004 12 1 000255 2005 1 2 000255 2005 2 2 000255 2005 3 2 000255 2005 4 2 000255 2005 5 2 000255 2005 6 1.5 000255 2005 7 1 000255 2005 8 1 000255 2005 9 1 000255 2005 10 1 000255 2005 11 1 000255 2005 12 1 000307 2015 1 1 000307 2015 2 1 000307 2015 3 1 000307 2015 4 2 000307 2015 5 3 000307 2015 6 3 000307 2015 7 3 000307 2015 8 3 000307 2015 9 4 000307 2015 10 5 000307 2015 11 5 000307 2015 12 5
@KS99 try this
data have;
input cusip $ year month analysts;
datalines;
000255 2004 1 .
000255 2004 2 .
000255 2004 3 1
000255 2004 4 .
000255 2004 5 .
000255 2004 6 .
000255 2004 7 .
000255 2004 8 .
000255 2004 9 .
000255 2004 10 .
000255 2004 11 .
000255 2004 12 .
000255 2005 1 .
000255 2005 2 .
000255 2005 3 .
000255 2005 4 2
000255 2005 5 2
000255 2005 6 .
000255 2005 7 1
000255 2005 8 .
000255 2005 9 .
000255 2005 10 .
000255 2005 11 .
000255 2005 12 .
000307 2015 1 .
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 .
000307 2015 7 3
000307 2015 8 3
000307 2015 9 .
000307 2015 10 5
000307 2015 11 .
000307 2015 12 .
;
data want(drop = a _a m);
_a = a;
do _N_ = 1 by 1 until (last.year | a);
set have(rename = analysts = a);
by cusip year;
end;
m = mean(_a, a);
do _N_ = 1 to _N_;
set have(rename = analysts = a);
analysts = coalesce(a, m);
output;
end;
run;
Result:
cusip year month analysts 000255 2004 1 1 000255 2004 2 1 000255 2004 3 1 000255 2004 4 1 000255 2004 5 1 000255 2004 6 1 000255 2004 7 1 000255 2004 8 1 000255 2004 9 1 000255 2004 10 1 000255 2004 11 1 000255 2004 12 1 000255 2005 1 2 000255 2005 2 2 000255 2005 3 2 000255 2005 4 2 000255 2005 5 2 000255 2005 6 1.5 000255 2005 7 1 000255 2005 8 1 000255 2005 9 1 000255 2005 10 1 000255 2005 11 1 000255 2005 12 1 000307 2015 1 1 000307 2015 2 1 000307 2015 3 1 000307 2015 4 2 000307 2015 5 3 000307 2015 6 3 000307 2015 7 3 000307 2015 8 3 000307 2015 9 4 000307 2015 10 5 000307 2015 11 5 000307 2015 12 5
data have;
infile cards expandtabs;
input cusip $ year month x;
datalines;
000255 2004 1 .
000255 2004 2 .
000255 2004 3 1
000255 2004 4 .
000255 2004 5 .
000255 2004 6 .
000255 2004 7 .
000255 2004 8 .
000255 2004 9 .
000255 2004 10 .
000255 2004 11 .
000255 2004 12 .
000255 2005 1 .
000255 2005 2 .
000255 2005 3 .
000255 2005 4 2
000255 2005 5 2
000255 2005 6 .
000255 2005 7 1
000255 2005 8 .
000255 2005 9 .
000255 2005 10 .
000255 2005 11 .
000255 2005 12 .
000307 2015 1 .
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 .
000307 2015 7 3
000307 2015 8 3
000307 2015 9 .
000307 2015 10 5
000307 2015 11 .
000307 2015 12 .
;
run;
data temp1;
set have;
by cusip year;
retain x1;
if first.year then call missing(x1);
if not missing(x) then x1=x;
run;
proc sort data=temp1 out=temp2;
by cusip year descending month;
run;
data temp3;
set temp2;
by cusip year;
retain x2;
if first.year then call missing(x2);
if not missing(x) then x2=x;
run;
data want;
set temp3;
want=mean(x1,x2);
drop x x1 x2;
run;
proc sort data=want;
by cusip year month;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.