BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
KS99
Obsidian | Level 7

Hi, SAS community ! 

 

I always appreciate your help-out. I am knocking on your doors to get another little help from you. 

Thank you in advance, after all. 

The dataset I have looks like this: 

data have; 
    input cusip  $  year  month  analysts;
    datalines;
000255	2004	1	.
000255	2004	2	.
000255	2004	3	1
000255	2004	4	.
000255	2004	5	.
000255	2004	6	.
000255	2004	7	.
000255	2004	8	.
000255	2004	9	.
000255	2004	10	.
000255	2004	11	.
000255	2004	12	.
000255	2005	1	.
000255	2005	2	.
000255	2005	3	.
000255	2005	4	2
000255	2005	5	2
000255	2005	6	.
000255	2005	7	1
000255	2005	8	.
000255	2005	9	.
000255	2005	10	.
000255	2005	11	.
000255	2005	12	.
000307	2015	1	.
000307	2015	2	1
000307	2015	3	1
000307	2015	4	2
000307	2015	5	3
000307	2015	6	.
000307	2015	7	3
000307	2015	8	3
000307	2015	9	.
000307	2015	10	5
000307	2015	11	.
000307	2015	12	.
;
run; 

This data contains the info of how many analysts cover a firm (cusip) in a certain year-month. 

One sure thing about this data is that each year has twelve months. I designed it so. 

 

What I want to have is, missing obs. are filled up by the average value of previous and next obs. 

Here, firm-year is a chunk. Thus, a firm-year that starts or ends missing should be filled by the first or last non-missing values. 

 

Therefore, I expect to achieve the following: 

data have;
    input cusip  $  year  month  analysts;
    datalines;
000255	2004	1	1
000255	2004	2	1
000255	2004	3	1
000255	2004	4	1
000255	2004	5	1
000255	2004	6	1
000255	2004	7	1
000255	2004	8	1
000255	2004	9	1
000255	2004	10	1
000255	2004	11	1
000255	2004	12	1
000255	2005	1	2
000255	2005	2	2
000255	2005	3	2
000255	2005	4	2
000255	2005	5	2
000255	2005	6	1.5
000255	2005	7	1
000255	2005	8	1
000255	2005	9	1
000255	2005	10	1
000255	2005	11	1
000255	2005	12	1
000307	2015	1	1
000307	2015	2	1
000307	2015	3	1
000307	2015	4	2
000307	2015	5	3
000307	2015	6	3
000307	2015	7	3
000307	2015	8	3
000307	2015	9	4
000307	2015	10	5
000307	2015	11	5
000307	2015	12	5
;
run;

I hope my question were not so stupid. 

Thank you all! 

 

Sincerely, 

 

KS -, 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

@KS99 try this

 

data have; 
input cusip $ year month analysts;
datalines;
000255 2004 1  . 
000255 2004 2  . 
000255 2004 3  1 
000255 2004 4  . 
000255 2004 5  . 
000255 2004 6  . 
000255 2004 7  . 
000255 2004 8  . 
000255 2004 9  . 
000255 2004 10 . 
000255 2004 11 . 
000255 2004 12 . 
000255 2005 1  . 
000255 2005 2  . 
000255 2005 3  . 
000255 2005 4  2 
000255 2005 5  2 
000255 2005 6  . 
000255 2005 7  1 
000255 2005 8  . 
000255 2005 9  . 
000255 2005 10 . 
000255 2005 11 . 
000255 2005 12 . 
000307 2015 1  . 
000307 2015 2  1 
000307 2015 3  1 
000307 2015 4  2 
000307 2015 5  3 
000307 2015 6  . 
000307 2015 7  3 
000307 2015 8  3 
000307 2015 9  . 
000307 2015 10 5 
000307 2015 11 . 
000307 2015 12 . 
;

data want(drop = a _a m);

   _a = a;

   do _N_ = 1 by 1 until (last.year | a);
      set have(rename = analysts = a);
      by cusip year;
   end;

   m = mean(_a, a);

   do _N_ = 1 to _N_;
      set have(rename = analysts = a);
      analysts = coalesce(a, m);
      output;
   end;

run;

 

Result:

 

cusip   year  month  analysts
000255  2004  1      1
000255  2004  2      1
000255  2004  3      1
000255  2004  4      1
000255  2004  5      1
000255  2004  6      1
000255  2004  7      1
000255  2004  8      1
000255  2004  9      1
000255  2004  10     1
000255  2004  11     1
000255  2004  12     1
000255  2005  1      2
000255  2005  2      2
000255  2005  3      2
000255  2005  4      2
000255  2005  5      2
000255  2005  6      1.5
000255  2005  7      1
000255  2005  8      1
000255  2005  9      1
000255  2005  10     1
000255  2005  11     1
000255  2005  12     1
000307  2015  1      1
000307  2015  2      1
000307  2015  3      1
000307  2015  4      2
000307  2015  5      3
000307  2015  6      3
000307  2015  7      3
000307  2015  8      3
000307  2015  9      4
000307  2015  10     5
000307  2015  11     5
000307  2015  12     5

 

View solution in original post

2 REPLIES 2
PeterClemmensen
Tourmaline | Level 20

@KS99 try this

 

data have; 
input cusip $ year month analysts;
datalines;
000255 2004 1  . 
000255 2004 2  . 
000255 2004 3  1 
000255 2004 4  . 
000255 2004 5  . 
000255 2004 6  . 
000255 2004 7  . 
000255 2004 8  . 
000255 2004 9  . 
000255 2004 10 . 
000255 2004 11 . 
000255 2004 12 . 
000255 2005 1  . 
000255 2005 2  . 
000255 2005 3  . 
000255 2005 4  2 
000255 2005 5  2 
000255 2005 6  . 
000255 2005 7  1 
000255 2005 8  . 
000255 2005 9  . 
000255 2005 10 . 
000255 2005 11 . 
000255 2005 12 . 
000307 2015 1  . 
000307 2015 2  1 
000307 2015 3  1 
000307 2015 4  2 
000307 2015 5  3 
000307 2015 6  . 
000307 2015 7  3 
000307 2015 8  3 
000307 2015 9  . 
000307 2015 10 5 
000307 2015 11 . 
000307 2015 12 . 
;

data want(drop = a _a m);

   _a = a;

   do _N_ = 1 by 1 until (last.year | a);
      set have(rename = analysts = a);
      by cusip year;
   end;

   m = mean(_a, a);

   do _N_ = 1 to _N_;
      set have(rename = analysts = a);
      analysts = coalesce(a, m);
      output;
   end;

run;

 

Result:

 

cusip   year  month  analysts
000255  2004  1      1
000255  2004  2      1
000255  2004  3      1
000255  2004  4      1
000255  2004  5      1
000255  2004  6      1
000255  2004  7      1
000255  2004  8      1
000255  2004  9      1
000255  2004  10     1
000255  2004  11     1
000255  2004  12     1
000255  2005  1      2
000255  2005  2      2
000255  2005  3      2
000255  2005  4      2
000255  2005  5      2
000255  2005  6      1.5
000255  2005  7      1
000255  2005  8      1
000255  2005  9      1
000255  2005  10     1
000255  2005  11     1
000255  2005  12     1
000307  2015  1      1
000307  2015  2      1
000307  2015  3      1
000307  2015  4      2
000307  2015  5      3
000307  2015  6      3
000307  2015  7      3
000307  2015  8      3
000307  2015  9      4
000307  2015  10     5
000307  2015  11     5
000307  2015  12     5

 

Ksharp
Super User
data have; 
infile cards expandtabs;
    input cusip  $  year  month  x;
    datalines;
000255 2004 1 .
000255 2004 2 .
000255 2004 3 1
000255 2004 4 .
000255 2004 5 .
000255 2004 6 .
000255 2004 7 .
000255 2004 8 .
000255 2004 9 .
000255 2004 10 .
000255 2004 11 .
000255 2004 12 .
000255 2005 1 .
000255 2005 2 .
000255 2005 3 .
000255 2005 4 2
000255 2005 5 2
000255 2005 6 .
000255 2005 7 1
000255 2005 8 .
000255 2005 9 .
000255 2005 10 .
000255 2005 11 .
000255 2005 12 .
000307 2015 1 .
000307 2015 2 1
000307 2015 3 1
000307 2015 4 2
000307 2015 5 3
000307 2015 6 .
000307 2015 7 3
000307 2015 8 3
000307 2015 9 .
000307 2015 10 5
000307 2015 11 .
000307 2015 12 .
;
run; 

data temp1;
 set have;
 by cusip year;
 retain x1;
 if first.year then call missing(x1);
 if not missing(x) then x1=x;
run;
proc sort data=temp1 out=temp2;
by cusip year descending month;
run;
data temp3;
 set temp2;
 by cusip year;
 retain x2;
 if first.year then call missing(x2);
 if not missing(x) then x2=x;
run;
data want;
 set temp3;
 want=mean(x1,x2);
 drop x x1 x2;
run;
proc sort data=want;
by cusip year  month;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 356 views
  • 1 like
  • 3 in conversation