I have a dataset like below. And I want to get help from you guys to change some values as per some rules.
permno date gvkey
10000 03/31/2000 001
10000 06/30/2000 001
10000 09/30/2000 .
10000 03/31/2001 003
...
10001 03/31/2000 .
10001 06/30/2000 .
10001 09/30/2000 005
10001 12/31/2000 005
...
10005 03/31/2000 009
10005 06/30/2000 009
10005 09/30/2000 .
10005 12/31/2000 .
Ok. This is the original dataset looks like. I want to apply the following rule
First sort by permno, date and gvkey.
For the same permno, if there are missing gvkeys, always replace missing values with previous available values until no missing gvkeys.
If missing gvkeys are at the beginning, replace them with the non-missing values immediately after the last missing ones.
So the dataset I would like to see after adjustment should be
permno date gvkey
10000 03/31/2000 001
10000 06/30/2000 001
10000 09/30/2000 001
10000 03/31/2001 003
...
10001 03/31/2000 005
10001 06/30/2000 005
10001 09/30/2000 005
10001 12/31/2000 005
...
10005 03/31/2000 009
10005 06/30/2000 009
10005 09/30/2000 009
10005 12/31/2000 009
Thank you for the help.
Here is another way using DOW, given your data has been sorted the way you want:
data have;
infile cards truncover;
input permno $ date :mmddyy10. gvkey $;
format date mmddyy10.;
cards;
10000 03/31/2000 001
10000 06/30/2000 001
10000 09/30/2000
10000 03/31/2001 003
10001 03/31/2000
10001 06/30/2000
10001 09/30/2000 005
10001 12/31/2000 005
10005 03/31/2000 009
10005 06/30/2000 009
10005 09/30/2000
10005 12/31/2000
;
data want;
length _next _last $ 8;
do until (last.permno);
set have;
by permno;
_next=coalescec(_next,gvkey);
END;
do until (last.permno);
set have;
by permno;
gvkey=coalescec(gvkey,_last,_next);
_last=coalescec(gvkey,_last);
output;
END;
drop _:;
run;
Haikuo
Hope this solves your issue.
proc sort data=have;
by permno date gvkey;
run;
data new1;
retain gvkey1;
set have;
by permno date gvkey;
if gvkey ne . then gvkey1=gvkey;
else if first.permno then gvkey1=gvkey;
run;
proc sort data=new1 out=new2;
by permno descending date gvkey;
run;
data new3;
retain gvkey2;
set new2;
by permno descending date gvkey;
if gvkey1 ne . then gvkey2=gvkey1;
else if first.permno then gvkey2=gvkey1;
drop gvkey1 gvkey;
rename gvkey2=gvkey;
run;
proc sort data=new3;
by permno date gvkey;
run;
Sudeer.
Here is another way using DOW, given your data has been sorted the way you want:
data have;
infile cards truncover;
input permno $ date :mmddyy10. gvkey $;
format date mmddyy10.;
cards;
10000 03/31/2000 001
10000 06/30/2000 001
10000 09/30/2000
10000 03/31/2001 003
10001 03/31/2000
10001 06/30/2000
10001 09/30/2000 005
10001 12/31/2000 005
10005 03/31/2000 009
10005 06/30/2000 009
10005 09/30/2000
10005 12/31/2000
;
data want;
length _next _last $ 8;
do until (last.permno);
set have;
by permno;
_next=coalescec(_next,gvkey);
END;
do until (last.permno);
set have;
by permno;
gvkey=coalescec(gvkey,_last,_next);
_last=coalescec(gvkey,_last);
output;
END;
drop _:;
run;
Haikuo
data have; input permno date : mmddyy10. gvkey ; format date date9.; cards; 10000 03/31/2000 . 10000 06/30/2000 001 10000 09/30/2000 . 10000 03/31/2001 003 10001 03/31/2000 . 10001 06/30/2000 . 10001 09/30/2000 005 10001 12/31/2000 . 10005 03/31/2000 . 10005 06/30/2000 009 10005 09/30/2000 . 10005 12/31/2000 . ; run; data want(drop=gvkey); do until(not missing(gvkey) or last.permno) ; set have; by permno; if first.permno then call missing(_gvkey); end; retain _gvkey; if not missing(gvkey) then _gvkey=gvkey; do until(not missing(gvkey) or last.permno) ; set have; by permno; output; end; run;
Xia Keshan
proc stdize data=have out=want reponly method=median;
by permno;
var gvkey;
run;
What is this table ?
data have;
infile cards truncover;
input permno $ date :mmddyy10. gvkey ;
format date mmddyy10.;
cards;
10000 03/31/2000 .
10000 06/30/2000 23
10000 09/30/2000 45
10000 03/31/2001 53
10000 03/31/2001 33
10000 03/31/2001 543
10001 03/31/2000 .
10001 06/30/2000 .
10001 09/30/2000 .
10001 12/31/2000 .
10005 03/31/2000 009
10005 06/30/2000 009
10005 09/30/2000
10005 12/31/2000
;
Hi Ksharp,
I could not understand your question. Please give more detail.
Regards,
Naeem
I think what Ksharp means is that your solution is not robust enough to work on more general scenarios, one of such is what he put as the example. Your solution only works when 1. gvkey is numeric AND 2. gvkeys are the same across the permno.
Haikuo
Thanks Haikuo - Yes, this solution is for the mentioned problem and can not be generlized.
Sorry. Yesterday is too rush, I have no time to explain my purpose . I mean your code can't work for that table if there are a couple of different value in the same group. If I understand right, your code is only set missing be its median ?
Xia Keshan
Not a problem, yes code is imputing missing values with median within each group.
Regards,
Naeem
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.