Solved: Replacing missing credit ratings with previous observation within each...

schneiderd · Posted 06-18-2019 08:33 PM

Below is a sample data and SAS program (I use SAS 9.4) that illustrates a challenge I have with processing the replacement of missing credit ratings within each gvkey (identifier) for a range of ym (months within a year). In other words, the procedure to replace missing ratings needs to reset with each individual gvkey.

gvkey = a unique number for each corporation

ym = a unique month and year

rating = credit rating issued for a gvkey at a specific ym

“.” = missing rating for specific ym within a gvkey

Two gvkeys, 001078 and 002504, are used in the data below, though there are over one thousand in the actual study. The ym’s repeat for each gvkey as they represent the months for one year. The coding follows the data and the SAS results follow the coding.

I am trying to replace missing ratings with the last or most recent rating. The program works for the most part as shown in column “rating1.” It worked perfectly for gvkey 001078. However, I need the program to process the replacement within each gvkey and not have the next gvkey read from the previous gvkey.

The first gvkey, 1078, is shown in observations 1 -12. Observation 13 is the start of a new gvkey, 2504.

The problem is that gvkey 2504 is reading the last rating from observation 11 that is from the previous and different gvkey, 1078. The replacement process should start over again for the next gvkey, 2504, and should start by reading observation 13 as the first rating or data line, which is a missing rating, and should report a missing rating in the “rating1” column until observation 15 when a new rating for gvkey 2504 appears.

If the first two or more observations within a gvkey is a missing rating then the replacement rating should also be a missing rating until a new rating appears. That is the correct treatment of the data.

How do I have the replacement process begin and end within each gvkey? Feel free to edit the code below if you have any suggestions.

Thank you,

Doug

data a;

input gvkey ym rating$;

cards;

001078 24216 .

001078 24217 .

001078 24218 .

001078 24219 BBB

001078 24220 BBB

001078 24221 BBB+

001078 24222 .

001078 24223 .

001078 24224 .

001078 24225 .

001078 24226 B

001078 24227 .

002504 24216 .

002504 24217 .

002504 24218 A

002504 24219 A-

002504 24220 .

002504 24221 .

002504 24222 CC

002504 24223 CC

002504 24224 CC

002504 24225 CC+

002504 24226 .

002504 24227 .

proc sort;

by gvkey ym;

run;

data b;

set a;

retain rating1;

if rating > "." then rating1=rating;

by gvkey; /* this step did not seem to help */

run;

proc print;

var gvkey ym rating rating1;

run;

RESULTS:

The SAS System

Obs	gvkey	ym	rating	rating1
1	1078	24216			← The first three observations in the
2	1078	24217			← rating1 column worked as intended as
3	1078	24218			← they should be missing ratings.
4	1078	24219	BBB	BBB
5	1078	24220	BBB	BBB
6	1078	24221	BBB+	BBB+
7	1078	24222		BBB+
8	1078	24223		BBB+
9	1078	24224		BBB+
10	1078	24225		BBB+
11	1078	24226	B	B
12	1078	24227		B
13	2504	24216		B	← Observations 13 and 14 in the rating1
14	2504	24217		B	← column should still be missing ratings.
15	2504	24218	A	A
16	2504	24219	A-	A-
17	2504	24220		A-
18	2504	24221		A-
19	2504	24222	CC	CC
20	2504	24223	CC	CC
21	2504	24224	CC	CC
22	2504	24225	CC+	CC+
23	2504	24226		CC+
24	2504	24227		CC+

Reeza · Posted 06-18-2019 09:19 PM

You're fairly close, but not quite there.

You can use NOT MISSING() to check for missing values, not the >"."

You set up a BY group but never used it, you need to use it to reset it at the first record of each GVKEY so the values are not retained across the GVKEY groups.

data b;
set a;
by gvkey;

retain rating1;

if not missing(rating) then rating1=rating;

*reset for the first gvkey that may be missing;
if first.gvkey and missing(rating) then call missing(rating1);



run;

@schneiderd wrote:

Below is a sample data and SAS program (I use SAS 9.4) that illustrates a challenge I have with processing the replacement of missing credit ratings within each gvkey (identifier) for a range of ym (months within a year). In other words, the procedure to replace missing ratings needs to reset with each individual gvkey.

gvkey = a unique number for each corporation

ym = a unique month and year

rating = credit rating issued for a gvkey at a specific ym

“.” = missing rating for specific ym within a gvkey

Two gvkeys, 001078 and 002504, are used in the data below, though there are over one thousand in the actual study. The ym’s repeat for each gvkey as they represent the months for one year. The coding follows the data and the SAS results follow the coding.

I am trying to replace missing ratings with the last or most recent rating. The program works for the most part as shown in column “rating1.” It worked perfectly for gvkey 001078. However, I need the program to process the replacement within each gvkey and not have the next gvkey read from the previous gvkey.

The first gvkey, 1078, is shown in observations 1 -12. Observation 13 is the start of a new gvkey, 2504.

The problem is that gvkey 2504 is reading the last rating from observation 11 that is from the previous and different gvkey, 1078. The replacement process should start over again for the next gvkey, 2504, and should start by reading observation 13 as the first rating or data line, which is a missing rating, and should report a missing rating in the “rating1” column until observation 15 when a new rating for gvkey 2504 appears.

If the first two or more observations within a gvkey is a missing rating then the replacement rating should also be a missing rating until a new rating appears. That is the correct treatment of the data.

How do I have the replacement process begin and end within each gvkey? Feel free to edit the code below if you have any suggestions.

Thank you,

Doug

data a;

input gvkey ym rating$;

cards;

001078    24216     .

001078    24217     .

001078    24218     .

001078    24219     BBB

001078    24220     BBB

001078    24221     BBB+

001078    24222     .

001078    24223     .

001078    24224     .

001078    24225     .

001078    24226     B

001078    24227     .

002504    24216     .

002504    24217     .

002504    24218     A

002504    24219     A-

002504    24220     .

002504    24221     .

002504    24222     CC

002504    24223     CC

002504    24224     CC

002504    24225     CC+

002504    24226     .

002504    24227     .

proc sort;

by gvkey ym;

run;

data b;

set a;

retain rating1;

if rating > "." then rating1=rating;

by gvkey; /* this step did not seem to help */

run;

proc print;

var gvkey ym rating rating1;

run;

RESULTS:

The SAS System

Obs

gvkey

ym

rating

rating1

1

1078

24216

← The first three observations in the

2

1078

24217

← rating1 column worked as intended as

3

1078

24218

← they should be missing ratings.

4

1078

24219

BBB

BBB

5

1078

24220

BBB

BBB

6

1078

24221

BBB+

BBB+

7

1078

24222

BBB+

8

1078

24223

BBB+

9

1078

24224

BBB+

10

1078

24225

BBB+

11

1078

24226

B

B

12

1078

24227

B

13

2504

24216

B

← Observations 13 and 14 in the rating1

14

2504

24217

B

← column should still be missing ratings.

15

2504

24218

A

A

16

2504

24219

A-

A-

17

2504

24220

A-

18

2504

24221

A-

19

2504

24222

CC

CC

20

2504

24223

CC

CC

21

2504

24224

CC

CC

22

2504

24225

CC+

CC+

23

2504

24226

CC+

24

2504

24227

CC+

View solution in original post

Reeza · Posted 06-18-2019 09:19 PM

You're fairly close, but not quite there.

You can use NOT MISSING() to check for missing values, not the >"."

You set up a BY group but never used it, you need to use it to reset it at the first record of each GVKEY so the values are not retained across the GVKEY groups.

data b;
set a;
by gvkey;

retain rating1;

if not missing(rating) then rating1=rating;

*reset for the first gvkey that may be missing;
if first.gvkey and missing(rating) then call missing(rating1);



run;

@schneiderd wrote:

Below is a sample data and SAS program (I use SAS 9.4) that illustrates a challenge I have with processing the replacement of missing credit ratings within each gvkey (identifier) for a range of ym (months within a year). In other words, the procedure to replace missing ratings needs to reset with each individual gvkey.

gvkey = a unique number for each corporation

ym = a unique month and year

rating = credit rating issued for a gvkey at a specific ym

“.” = missing rating for specific ym within a gvkey

Two gvkeys, 001078 and 002504, are used in the data below, though there are over one thousand in the actual study. The ym’s repeat for each gvkey as they represent the months for one year. The coding follows the data and the SAS results follow the coding.

I am trying to replace missing ratings with the last or most recent rating. The program works for the most part as shown in column “rating1.” It worked perfectly for gvkey 001078. However, I need the program to process the replacement within each gvkey and not have the next gvkey read from the previous gvkey.

The first gvkey, 1078, is shown in observations 1 -12. Observation 13 is the start of a new gvkey, 2504.

The problem is that gvkey 2504 is reading the last rating from observation 11 that is from the previous and different gvkey, 1078. The replacement process should start over again for the next gvkey, 2504, and should start by reading observation 13 as the first rating or data line, which is a missing rating, and should report a missing rating in the “rating1” column until observation 15 when a new rating for gvkey 2504 appears.

If the first two or more observations within a gvkey is a missing rating then the replacement rating should also be a missing rating until a new rating appears. That is the correct treatment of the data.

How do I have the replacement process begin and end within each gvkey? Feel free to edit the code below if you have any suggestions.

Thank you,

Doug

data a;

input gvkey ym rating$;

cards;

001078    24216     .

001078    24217     .

001078    24218     .

001078    24219     BBB

001078    24220     BBB

001078    24221     BBB+

001078    24222     .

001078    24223     .

001078    24224     .

001078    24225     .

001078    24226     B

001078    24227     .

002504    24216     .

002504    24217     .

002504    24218     A

002504    24219     A-

002504    24220     .

002504    24221     .

002504    24222     CC

002504    24223     CC

002504    24224     CC

002504    24225     CC+

002504    24226     .

002504    24227     .

proc sort;

by gvkey ym;

run;

data b;

set a;

retain rating1;

if rating > "." then rating1=rating;

by gvkey; /* this step did not seem to help */

run;

proc print;

var gvkey ym rating rating1;

run;

RESULTS:

The SAS System

Obs

gvkey

ym

rating

rating1

1

1078

24216

← The first three observations in the

2

1078

24217

← rating1 column worked as intended as

3

1078

24218

← they should be missing ratings.

4

1078

24219

BBB

BBB

5

1078

24220

BBB

BBB

6

1078

24221

BBB+

BBB+

7

1078

24222

BBB+

8

1078

24223

BBB+

9

1078

24224

BBB+

10

1078

24225

BBB+

11

1078

24226

B

B

12

1078

24227

B

13

2504

24216

B

← Observations 13 and 14 in the rating1

14

2504

24217

B

← column should still be missing ratings.

15

2504

24218

A

A

16

2504

24219

A-

A-

17

2504

24220

A-

18

2504

24221

A-

19

2504

24222

CC

CC

20

2504

24223

CC

CC

21

2504

24224

CC

CC

22

2504

24225

CC+

CC+

23

2504

24226

CC+

24

2504

24227

CC+

schneiderd · Posted 06-18-2019 09:38 PM

The solution provided below worked! Thank you!

schneiderd · Posted 06-18-2019 09:43 PM

The solution provided worked! Thank you Reeza. This allows me to move forward with a project I am working on.

Replacing missing credit ratings with previous observation within each gvkey

Re: Replacing missing credit ratings with previous observation within each gvkey

Re: Replacing missing credit ratings with previous observation within each gvkey

Re: Replacing missing credit ratings with previous observation within each gvkey

Re: Replacing missing credit ratings with previous observation within each gvkey

Replacing missing credit ratings with previous observation within each gvkey

Re: Replacing missing credit ratings with previous observation within each gvkey

Re: Replacing missing credit ratings with previous observation within each gvkey

Re: Replacing missing credit ratings with previous observation within each gvkey

Re: Replacing missing credit ratings with previous observation within each gvkey

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!