Hi, SAS community !
I always appreciate your help-out. I am knocking on your doors to get another little help from you.
Thank you in advance, after all.
The dataset I have looks like this:
data have;
input cusip $ year month analysts;
datalines;
000255 2004 1 .
000255 2004 2 .
000255 2004 3 1
000255 2004 4 .
000255 2004 5 .
000255 2004 6 .
000255 2004 7 .
000255 2004 8 .
000255 2004 9 .
000255 2004 10 .
000255 2004 11 .
000255 2004 12 .
000255 2005 1 .
000255 2005 2 .
000255 2005 3 .
000255 2005 4 2
000255 2005 5 2
000255 2005 6 .
000255 2005 7 1
000255 2005 8 .
000255 2005 9 .
000255 2005 10 .
000255 2005 11 .
000255 2005 12 .
000307 2015 1 .
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 .
000307 2015 7 3
000307 2015 8 3
000307 2015 9 .
000307 2015 10 5
000307 2015 11 .
000307 2015 12 .
;
run;
This data contains the info of how many analysts cover a firm (cusip) in a certain year-month.
One sure thing about this data is that each year has twelve months. I designed it so.
What I want to have is, missing obs. are filled up by the average value of previous and next obs.
Here, firm-year is a chunk. Thus, a firm-year that starts or ends missing should be filled by the first or last non-missing values.
Therefore, I expect to achieve the following:
data have;
input cusip $ year month analysts;
datalines;
000255 2004 1 1
000255 2004 2 1
000255 2004 3 1
000255 2004 4 1
000255 2004 5 1
000255 2004 6 1
000255 2004 7 1
000255 2004 8 1
000255 2004 9 1
000255 2004 10 1
000255 2004 11 1
000255 2004 12 1
000255 2005 1 2
000255 2005 2 2
000255 2005 3 2
000255 2005 4 2
000255 2005 5 2
000255 2005 6 1.5
000255 2005 7 1
000255 2005 8 1
000255 2005 9 1
000255 2005 10 1
000255 2005 11 1
000255 2005 12 1
000307 2015 1 1
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 3
000307 2015 7 3
000307 2015 8 3
000307 2015 9 4
000307 2015 10 5
000307 2015 11 5
000307 2015 12 5
;
run;
I hope my question were not so stupid.
Thank you all!
Sincerely,
KS -,
@KS99 try this
data have;
input cusip $ year month analysts;
datalines;
000255 2004 1 .
000255 2004 2 .
000255 2004 3 1
000255 2004 4 .
000255 2004 5 .
000255 2004 6 .
000255 2004 7 .
000255 2004 8 .
000255 2004 9 .
000255 2004 10 .
000255 2004 11 .
000255 2004 12 .
000255 2005 1 .
000255 2005 2 .
000255 2005 3 .
000255 2005 4 2
000255 2005 5 2
000255 2005 6 .
000255 2005 7 1
000255 2005 8 .
000255 2005 9 .
000255 2005 10 .
000255 2005 11 .
000255 2005 12 .
000307 2015 1 .
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 .
000307 2015 7 3
000307 2015 8 3
000307 2015 9 .
000307 2015 10 5
000307 2015 11 .
000307 2015 12 .
;
data want(drop = a _a m);
_a = a;
do _N_ = 1 by 1 until (last.year | a);
set have(rename = analysts = a);
by cusip year;
end;
m = mean(_a, a);
do _N_ = 1 to _N_;
set have(rename = analysts = a);
analysts = coalesce(a, m);
output;
end;
run;
Result:
cusip year month analysts 000255 2004 1 1 000255 2004 2 1 000255 2004 3 1 000255 2004 4 1 000255 2004 5 1 000255 2004 6 1 000255 2004 7 1 000255 2004 8 1 000255 2004 9 1 000255 2004 10 1 000255 2004 11 1 000255 2004 12 1 000255 2005 1 2 000255 2005 2 2 000255 2005 3 2 000255 2005 4 2 000255 2005 5 2 000255 2005 6 1.5 000255 2005 7 1 000255 2005 8 1 000255 2005 9 1 000255 2005 10 1 000255 2005 11 1 000255 2005 12 1 000307 2015 1 1 000307 2015 2 1 000307 2015 3 1 000307 2015 4 2 000307 2015 5 3 000307 2015 6 3 000307 2015 7 3 000307 2015 8 3 000307 2015 9 4 000307 2015 10 5 000307 2015 11 5 000307 2015 12 5
@KS99 try this
data have;
input cusip $ year month analysts;
datalines;
000255 2004 1 .
000255 2004 2 .
000255 2004 3 1
000255 2004 4 .
000255 2004 5 .
000255 2004 6 .
000255 2004 7 .
000255 2004 8 .
000255 2004 9 .
000255 2004 10 .
000255 2004 11 .
000255 2004 12 .
000255 2005 1 .
000255 2005 2 .
000255 2005 3 .
000255 2005 4 2
000255 2005 5 2
000255 2005 6 .
000255 2005 7 1
000255 2005 8 .
000255 2005 9 .
000255 2005 10 .
000255 2005 11 .
000255 2005 12 .
000307 2015 1 .
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 .
000307 2015 7 3
000307 2015 8 3
000307 2015 9 .
000307 2015 10 5
000307 2015 11 .
000307 2015 12 .
;
data want(drop = a _a m);
_a = a;
do _N_ = 1 by 1 until (last.year | a);
set have(rename = analysts = a);
by cusip year;
end;
m = mean(_a, a);
do _N_ = 1 to _N_;
set have(rename = analysts = a);
analysts = coalesce(a, m);
output;
end;
run;
Result:
cusip year month analysts 000255 2004 1 1 000255 2004 2 1 000255 2004 3 1 000255 2004 4 1 000255 2004 5 1 000255 2004 6 1 000255 2004 7 1 000255 2004 8 1 000255 2004 9 1 000255 2004 10 1 000255 2004 11 1 000255 2004 12 1 000255 2005 1 2 000255 2005 2 2 000255 2005 3 2 000255 2005 4 2 000255 2005 5 2 000255 2005 6 1.5 000255 2005 7 1 000255 2005 8 1 000255 2005 9 1 000255 2005 10 1 000255 2005 11 1 000255 2005 12 1 000307 2015 1 1 000307 2015 2 1 000307 2015 3 1 000307 2015 4 2 000307 2015 5 3 000307 2015 6 3 000307 2015 7 3 000307 2015 8 3 000307 2015 9 4 000307 2015 10 5 000307 2015 11 5 000307 2015 12 5
data have;
infile cards expandtabs;
input cusip $ year month x;
datalines;
000255 2004 1 .
000255 2004 2 .
000255 2004 3 1
000255 2004 4 .
000255 2004 5 .
000255 2004 6 .
000255 2004 7 .
000255 2004 8 .
000255 2004 9 .
000255 2004 10 .
000255 2004 11 .
000255 2004 12 .
000255 2005 1 .
000255 2005 2 .
000255 2005 3 .
000255 2005 4 2
000255 2005 5 2
000255 2005 6 .
000255 2005 7 1
000255 2005 8 .
000255 2005 9 .
000255 2005 10 .
000255 2005 11 .
000255 2005 12 .
000307 2015 1 .
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 .
000307 2015 7 3
000307 2015 8 3
000307 2015 9 .
000307 2015 10 5
000307 2015 11 .
000307 2015 12 .
;
run;
data temp1;
set have;
by cusip year;
retain x1;
if first.year then call missing(x1);
if not missing(x) then x1=x;
run;
proc sort data=temp1 out=temp2;
by cusip year descending month;
run;
data temp3;
set temp2;
by cusip year;
retain x2;
if first.year then call missing(x2);
if not missing(x) then x2=x;
run;
data want;
set temp3;
want=mean(x1,x2);
drop x x1 x2;
run;
proc sort data=want;
by cusip year month;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.