BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SeanZ
Obsidian | Level 7

I have a dataset like below. And I want to get help from you guys to change some values as per some rules.

permno     date                    gvkey

10000       03/31/2000          001

10000       06/30/2000          001

10000       09/30/2000          .

10000       03/31/2001          003

...

10001       03/31/2000          .

10001       06/30/2000          .

10001       09/30/2000          005

10001       12/31/2000          005

...

10005       03/31/2000          009

10005       06/30/2000          009

10005       09/30/2000          .

10005       12/31/2000          .

Ok. This is the original dataset looks like. I want to apply the following rule

First sort by permno, date and gvkey.

For the same permno, if there are missing gvkeys, always replace missing values with previous available values until no missing gvkeys.

If missing gvkeys are at the beginning, replace them with the non-missing values immediately after the last missing ones.

So the dataset I would like to see after adjustment should be

permno     date                    gvkey

10000       03/31/2000          001

10000       06/30/2000          001

10000       09/30/2000          001

10000       03/31/2001          003

...

10001       03/31/2000          005

10001       06/30/2000          005

10001       09/30/2000          005

10001       12/31/2000          005

...

10005       03/31/2000          009

10005       06/30/2000          009

10005       09/30/2000          009

10005       12/31/2000          009

Thank you for the help.

1 ACCEPTED SOLUTION

Accepted Solutions
haikuobian
Fluorite | Level 6

Here is another way using DOW, given your data has been sorted the way you want:

data have;

infile cards truncover;

input permno $    date :mmddyy10.                    gvkey $;

format date mmddyy10.;

cards;

10000       03/31/2000          001

10000       06/30/2000          001

10000       09/30/2000         

10000       03/31/2001          003

10001       03/31/2000         

10001       06/30/2000         

10001       09/30/2000          005

10001       12/31/2000          005

10005       03/31/2000          009

10005       06/30/2000          009

10005       09/30/2000         

10005       12/31/2000         

;

data want;

length _next _last $ 8;

do until (last.permno);

set have;

by permno;

        _next=coalescec(_next,gvkey);

END;

do until (last.permno);

set have;

by permno;

        gvkey=coalescec(gvkey,_last,_next);

        _last=coalescec(gvkey,_last);

output;

END;

drop _:;

run;

      

Haikuo

View solution in original post

10 REPLIES 10
Chrishi
Calcite | Level 5

Hope this solves your issue.

proc sort data=have;

by permno date gvkey;

run;

data new1;

retain gvkey1;

  set have;

by permno date gvkey;

  if gvkey ne . then gvkey1=gvkey;

   else if first.permno then gvkey1=gvkey;

run;

proc sort data=new1 out=new2;

by permno descending date gvkey;

run;

data new3;

retain gvkey2;

set new2;

by permno descending date gvkey;

if gvkey1 ne . then gvkey2=gvkey1;

else if first.permno then gvkey2=gvkey1;

drop gvkey1 gvkey;

rename gvkey2=gvkey;

run;

proc sort data=new3;

by permno date gvkey;

run;

Sudeer.

haikuobian
Fluorite | Level 6

Here is another way using DOW, given your data has been sorted the way you want:

data have;

infile cards truncover;

input permno $    date :mmddyy10.                    gvkey $;

format date mmddyy10.;

cards;

10000       03/31/2000          001

10000       06/30/2000          001

10000       09/30/2000         

10000       03/31/2001          003

10001       03/31/2000         

10001       06/30/2000         

10001       09/30/2000          005

10001       12/31/2000          005

10005       03/31/2000          009

10005       06/30/2000          009

10005       09/30/2000         

10005       12/31/2000         

;

data want;

length _next _last $ 8;

do until (last.permno);

set have;

by permno;

        _next=coalescec(_next,gvkey);

END;

do until (last.permno);

set have;

by permno;

        gvkey=coalescec(gvkey,_last,_next);

        _last=coalescec(gvkey,_last);

output;

END;

drop _:;

run;

      

Haikuo

Ksharp
Super User
data have;
input permno     date   : mmddyy10.                 gvkey ;
format date date9.;
cards;
10000       03/31/2000          .
10000       06/30/2000          001
10000       09/30/2000          .
10000       03/31/2001          003
10001       03/31/2000          .
10001       06/30/2000          .
10001       09/30/2000          005
10001       12/31/2000          .
10005       03/31/2000          .
10005       06/30/2000          009
10005       09/30/2000          .
10005       12/31/2000          .
;
run;
data want(drop=gvkey);
do until(not missing(gvkey) or last.permno)     ;
 set have;
 by permno;
 if first.permno then call missing(_gvkey);
end;
retain      _gvkey;
if not missing(gvkey) then _gvkey=gvkey;
do until(not missing(gvkey) or last.permno)     ;
 set have;
 by permno;
 output;
end;
run;

Xia Keshan

stat_sas
Ammonite | Level 13

proc stdize data=have out=want reponly method=median;

by permno;

var gvkey;

run;

Ksharp
Super User

What is this table ?

data have;

infile cards truncover;

input permno $    date :mmddyy10.                    gvkey ;

format date mmddyy10.;

cards;

10000       03/31/2000          .

10000       06/30/2000          23

10000       09/30/2000         45

10000       03/31/2001          53

10000       03/31/2001          33

10000       03/31/2001          543

10001       03/31/2000         .

10001       06/30/2000         .

10001       09/30/2000          .

10001       12/31/2000          .

10005       03/31/2000          009

10005       06/30/2000          009

10005       09/30/2000        

10005       12/31/2000        

;

stat_sas
Ammonite | Level 13

Hi Ksharp,

I could not understand your question. Please give more detail.

Regards,

Naeem

haikuobian
Fluorite | Level 6

I think what Ksharp means is that your solution is not robust enough to work on more general scenarios, one of such is what he put as the example. Your solution only works when 1. gvkey is numeric  AND 2. gvkeys are the same across the permno.

Haikuo

stat_sas
Ammonite | Level 13

Thanks Haikuo - Yes, this solution is for the mentioned problem and can not be generlized.

Ksharp
Super User

Sorry. Yesterday is too rush, I have no time to explain my purpose .  I mean your code can't work for that table if there are a couple of different value in the same group. If I understand right, your code is only set missing be its median ?

Xia Keshan

stat_sas
Ammonite | Level 13

Not a problem, yes code is imputing missing values with median within each group.

Regards,

Naeem

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1090 views
  • 6 likes
  • 5 in conversation