BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
KS99
Obsidian | Level 7

Hi, SAS community ! 

 

I always appreciate your help-out. I am knocking on your doors to get another little help from you. 

Thank you in advance, after all. 

The dataset I have looks like this: 

data have; 
    input cusip  $  year  month  analysts;
    datalines;
000255	2004	1	.
000255	2004	2	.
000255	2004	3	1
000255	2004	4	.
000255	2004	5	.
000255	2004	6	.
000255	2004	7	.
000255	2004	8	.
000255	2004	9	.
000255	2004	10	.
000255	2004	11	.
000255	2004	12	.
000255	2005	1	.
000255	2005	2	.
000255	2005	3	.
000255	2005	4	2
000255	2005	5	2
000255	2005	6	.
000255	2005	7	1
000255	2005	8	.
000255	2005	9	.
000255	2005	10	.
000255	2005	11	.
000255	2005	12	.
000307	2015	1	.
000307	2015	2	1
000307	2015	3	1
000307	2015	4	2
000307	2015	5	3
000307	2015	6	.
000307	2015	7	3
000307	2015	8	3
000307	2015	9	.
000307	2015	10	5
000307	2015	11	.
000307	2015	12	.
;
run; 

This data contains the info of how many analysts cover a firm (cusip) in a certain year-month. 

One sure thing about this data is that each year has twelve months. I designed it so. 

 

What I want to have is, missing obs. are filled up by the average value of previous and next obs. 

Here, firm-year is a chunk. Thus, a firm-year that starts or ends missing should be filled by the first or last non-missing values. 

 

Therefore, I expect to achieve the following: 

data have;
    input cusip  $  year  month  analysts;
    datalines;
000255	2004	1	1
000255	2004	2	1
000255	2004	3	1
000255	2004	4	1
000255	2004	5	1
000255	2004	6	1
000255	2004	7	1
000255	2004	8	1
000255	2004	9	1
000255	2004	10	1
000255	2004	11	1
000255	2004	12	1
000255	2005	1	2
000255	2005	2	2
000255	2005	3	2
000255	2005	4	2
000255	2005	5	2
000255	2005	6	1.5
000255	2005	7	1
000255	2005	8	1
000255	2005	9	1
000255	2005	10	1
000255	2005	11	1
000255	2005	12	1
000307	2015	1	1
000307	2015	2	1
000307	2015	3	1
000307	2015	4	2
000307	2015	5	3
000307	2015	6	3
000307	2015	7	3
000307	2015	8	3
000307	2015	9	4
000307	2015	10	5
000307	2015	11	5
000307	2015	12	5
;
run;

I hope my question were not so stupid. 

Thank you all! 

 

Sincerely, 

 

KS -, 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

@KS99 try this

 

data have; 
input cusip $ year month analysts;
datalines;
000255 2004 1  . 
000255 2004 2  . 
000255 2004 3  1 
000255 2004 4  . 
000255 2004 5  . 
000255 2004 6  . 
000255 2004 7  . 
000255 2004 8  . 
000255 2004 9  . 
000255 2004 10 . 
000255 2004 11 . 
000255 2004 12 . 
000255 2005 1  . 
000255 2005 2  . 
000255 2005 3  . 
000255 2005 4  2 
000255 2005 5  2 
000255 2005 6  . 
000255 2005 7  1 
000255 2005 8  . 
000255 2005 9  . 
000255 2005 10 . 
000255 2005 11 . 
000255 2005 12 . 
000307 2015 1  . 
000307 2015 2  1 
000307 2015 3  1 
000307 2015 4  2 
000307 2015 5  3 
000307 2015 6  . 
000307 2015 7  3 
000307 2015 8  3 
000307 2015 9  . 
000307 2015 10 5 
000307 2015 11 . 
000307 2015 12 . 
;

data want(drop = a _a m);

   _a = a;

   do _N_ = 1 by 1 until (last.year | a);
      set have(rename = analysts = a);
      by cusip year;
   end;

   m = mean(_a, a);

   do _N_ = 1 to _N_;
      set have(rename = analysts = a);
      analysts = coalesce(a, m);
      output;
   end;

run;

 

Result:

 

cusip   year  month  analysts
000255  2004  1      1
000255  2004  2      1
000255  2004  3      1
000255  2004  4      1
000255  2004  5      1
000255  2004  6      1
000255  2004  7      1
000255  2004  8      1
000255  2004  9      1
000255  2004  10     1
000255  2004  11     1
000255  2004  12     1
000255  2005  1      2
000255  2005  2      2
000255  2005  3      2
000255  2005  4      2
000255  2005  5      2
000255  2005  6      1.5
000255  2005  7      1
000255  2005  8      1
000255  2005  9      1
000255  2005  10     1
000255  2005  11     1
000255  2005  12     1
000307  2015  1      1
000307  2015  2      1
000307  2015  3      1
000307  2015  4      2
000307  2015  5      3
000307  2015  6      3
000307  2015  7      3
000307  2015  8      3
000307  2015  9      4
000307  2015  10     5
000307  2015  11     5
000307  2015  12     5

 

View solution in original post

2 REPLIES 2
PeterClemmensen
Tourmaline | Level 20

@KS99 try this

 

data have; 
input cusip $ year month analysts;
datalines;
000255 2004 1  . 
000255 2004 2  . 
000255 2004 3  1 
000255 2004 4  . 
000255 2004 5  . 
000255 2004 6  . 
000255 2004 7  . 
000255 2004 8  . 
000255 2004 9  . 
000255 2004 10 . 
000255 2004 11 . 
000255 2004 12 . 
000255 2005 1  . 
000255 2005 2  . 
000255 2005 3  . 
000255 2005 4  2 
000255 2005 5  2 
000255 2005 6  . 
000255 2005 7  1 
000255 2005 8  . 
000255 2005 9  . 
000255 2005 10 . 
000255 2005 11 . 
000255 2005 12 . 
000307 2015 1  . 
000307 2015 2  1 
000307 2015 3  1 
000307 2015 4  2 
000307 2015 5  3 
000307 2015 6  . 
000307 2015 7  3 
000307 2015 8  3 
000307 2015 9  . 
000307 2015 10 5 
000307 2015 11 . 
000307 2015 12 . 
;

data want(drop = a _a m);

   _a = a;

   do _N_ = 1 by 1 until (last.year | a);
      set have(rename = analysts = a);
      by cusip year;
   end;

   m = mean(_a, a);

   do _N_ = 1 to _N_;
      set have(rename = analysts = a);
      analysts = coalesce(a, m);
      output;
   end;

run;

 

Result:

 

cusip   year  month  analysts
000255  2004  1      1
000255  2004  2      1
000255  2004  3      1
000255  2004  4      1
000255  2004  5      1
000255  2004  6      1
000255  2004  7      1
000255  2004  8      1
000255  2004  9      1
000255  2004  10     1
000255  2004  11     1
000255  2004  12     1
000255  2005  1      2
000255  2005  2      2
000255  2005  3      2
000255  2005  4      2
000255  2005  5      2
000255  2005  6      1.5
000255  2005  7      1
000255  2005  8      1
000255  2005  9      1
000255  2005  10     1
000255  2005  11     1
000255  2005  12     1
000307  2015  1      1
000307  2015  2      1
000307  2015  3      1
000307  2015  4      2
000307  2015  5      3
000307  2015  6      3
000307  2015  7      3
000307  2015  8      3
000307  2015  9      4
000307  2015  10     5
000307  2015  11     5
000307  2015  12     5

 

Ksharp
Super User
data have; 
infile cards expandtabs;
    input cusip  $  year  month  x;
    datalines;
000255 2004 1 .
000255 2004 2 .
000255 2004 3 1
000255 2004 4 .
000255 2004 5 .
000255 2004 6 .
000255 2004 7 .
000255 2004 8 .
000255 2004 9 .
000255 2004 10 .
000255 2004 11 .
000255 2004 12 .
000255 2005 1 .
000255 2005 2 .
000255 2005 3 .
000255 2005 4 2
000255 2005 5 2
000255 2005 6 .
000255 2005 7 1
000255 2005 8 .
000255 2005 9 .
000255 2005 10 .
000255 2005 11 .
000255 2005 12 .
000307 2015 1 .
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 .
000307 2015 7 3
000307 2015 8 3
000307 2015 9 .
000307 2015 10 5
000307 2015 11 .
000307 2015 12 .
;
run; 

data temp1;
 set have;
 by cusip year;
 retain x1;
 if first.year then call missing(x1);
 if not missing(x) then x1=x;
run;
proc sort data=temp1 out=temp2;
by cusip year descending month;
run;
data temp3;
 set temp2;
 by cusip year;
 retain x2;
 if first.year then call missing(x2);
 if not missing(x) then x2=x;
run;
data want;
 set temp3;
 want=mean(x1,x2);
 drop x x1 x2;
run;
proc sort data=want;
by cusip year  month;
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 829 views
  • 1 like
  • 3 in conversation