BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
schneiderd
Calcite | Level 5

 

 

Below is a sample data and SAS program (I use SAS 9.4) that illustrates a challenge I have with processing the replacement of missing credit ratings within each gvkey (identifier) for a range of ym (months within a year). In other words, the procedure to replace missing ratings needs to reset with each individual gvkey.

 

gvkey = a unique number for each corporation

ym = a unique month and year

rating = credit rating issued for a gvkey at a specific ym

“.” = missing rating for specific ym within a gvkey

 

Two gvkeys, 001078 and 002504, are used in the data below, though there are over one thousand in the actual study. The ym’s repeat for each gvkey as they represent the months for one year. The coding follows the data and the SAS results follow the coding.

 

I am trying to replace missing ratings with the last or most recent rating. The program works for the most part as shown in column “rating1.” It worked perfectly for gvkey 001078. However, I need the program to process the replacement within each gvkey and not have the next gvkey read from the previous gvkey.

 

The first gvkey, 1078, is shown in observations 1 -12. Observation 13 is the start of a new gvkey, 2504.

 

The problem is that gvkey 2504 is reading the last rating from observation 11 that is from the previous and different gvkey, 1078. The replacement process should start over again for the next gvkey, 2504, and should start by reading observation 13 as the first rating or data line, which is a missing rating, and should report a missing rating in the “rating1” column until observation 15 when a new rating for gvkey 2504 appears.

 

If the first two or more observations within a gvkey is a missing rating then the replacement rating should also be a missing rating until a new rating appears. That is the correct treatment of the data.

 

How do I have the replacement process begin and end within each gvkey? Feel free to edit the code below if you have any suggestions.

 

Thank you,

 

Doug

 

 

data a;

input gvkey ym rating$;

cards;

 

001078    24216     .

001078    24217     .

001078    24218     .

001078    24219     BBB

001078    24220     BBB

001078    24221     BBB+

001078    24222     .

001078    24223     .

001078    24224     .

001078    24225     .

001078    24226     B

001078    24227     .

002504    24216     .

002504    24217     .

002504    24218     A

002504    24219     A-

002504    24220     .

002504    24221     .

002504    24222     CC

002504    24223     CC

002504    24224     CC

002504    24225     CC+

002504    24226     .

002504    24227     .

 

proc sort;

by gvkey ym;

run;

 

data b;

set a;

retain rating1;

if rating > "." then rating1=rating;

by gvkey; /* this step did not seem to help */

run;

 

proc print;

var gvkey ym rating rating1;

run;

 

 

RESULTS:

The SAS System

 

Obs

gvkey

ym

rating

rating1

 

1

1078

24216

  

← The first three observations in the

2

1078

24217

  

← rating1 column worked as intended as

3

1078

24218

  

← they should be missing ratings.

4

1078

24219

BBB

BBB

 

5

1078

24220

BBB

BBB

 

6

1078

24221

BBB+

BBB+

 

7

1078

24222

 

BBB+

 

8

1078

24223

 

BBB+

 

9

1078

24224

 

BBB+

 

10

1078

24225

 

BBB+

 

11

1078

24226

B

B

 

12

1078

24227

 

B

 

13

2504

24216

 

B

← Observations 13 and 14 in the rating1

14

2504

24217

 

B

← column should still be missing ratings.

15

2504

24218

A

A

 

16

2504

24219

A-

A-

 

17

2504

24220

 

A-

 

18

2504

24221

 

A-

 

19

2504

24222

CC

CC

 

20

2504

24223

CC

CC

 

21

2504

24224

CC

CC

 

22

2504

24225

CC+

CC+

 

23

2504

24226

 

CC+

 

24

2504

24227

 

CC+

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

You're fairly close, but not quite there.

You can use NOT MISSING() to check for missing values, not the >"."

You set up a BY group but never used it, you need to use it to reset it at the first record of each GVKEY so the values are not retained across the GVKEY groups. 

 

data b;
set a;
by gvkey;

retain rating1;

if not missing(rating) then rating1=rating;

*reset for the first gvkey that may be missing;
if first.gvkey and missing(rating) then call missing(rating1);



run;

@schneiderd wrote:

 

 

Below is a sample data and SAS program (I use SAS 9.4) that illustrates a challenge I have with processing the replacement of missing credit ratings within each gvkey (identifier) for a range of ym (months within a year). In other words, the procedure to replace missing ratings needs to reset with each individual gvkey.

 

gvkey = a unique number for each corporation

ym = a unique month and year

rating = credit rating issued for a gvkey at a specific ym

“.” = missing rating for specific ym within a gvkey

 

Two gvkeys, 001078 and 002504, are used in the data below, though there are over one thousand in the actual study. The ym’s repeat for each gvkey as they represent the months for one year. The coding follows the data and the SAS results follow the coding.

 

I am trying to replace missing ratings with the last or most recent rating. The program works for the most part as shown in column “rating1.” It worked perfectly for gvkey 001078. However, I need the program to process the replacement within each gvkey and not have the next gvkey read from the previous gvkey.

 

The first gvkey, 1078, is shown in observations 1 -12. Observation 13 is the start of a new gvkey, 2504.

 

The problem is that gvkey 2504 is reading the last rating from observation 11 that is from the previous and different gvkey, 1078. The replacement process should start over again for the next gvkey, 2504, and should start by reading observation 13 as the first rating or data line, which is a missing rating, and should report a missing rating in the “rating1” column until observation 15 when a new rating for gvkey 2504 appears.

 

If the first two or more observations within a gvkey is a missing rating then the replacement rating should also be a missing rating until a new rating appears. That is the correct treatment of the data.

 

How do I have the replacement process begin and end within each gvkey? Feel free to edit the code below if you have any suggestions.

 

Thank you,

 

Doug

 

 

data a;

input gvkey ym rating$;

cards;

 

001078    24216     .

001078    24217     .

001078    24218     .

001078    24219     BBB

001078    24220     BBB

001078    24221     BBB+

001078    24222     .

001078    24223     .

001078    24224     .

001078    24225     .

001078    24226     B

001078    24227     .

002504    24216     .

002504    24217     .

002504    24218     A

002504    24219     A-

002504    24220     .

002504    24221     .

002504    24222     CC

002504    24223     CC

002504    24224     CC

002504    24225     CC+

002504    24226     .

002504    24227     .

 

proc sort;

by gvkey ym;

run;

 

data b;

set a;

retain rating1;

if rating > "." then rating1=rating;

by gvkey; /* this step did not seem to help */

run;

 

proc print;

var gvkey ym rating rating1;

run;

 

 

RESULTS:

The SAS System

 

Obs

gvkey

ym

rating

rating1

 

1

1078

24216

   

← The first three observations in the

2

1078

24217

   

← rating1 column worked as intended as

3

1078

24218

   

← they should be missing ratings.

4

1078

24219

BBB

BBB

 

5

1078

24220

BBB

BBB

 

6

1078

24221

BBB+

BBB+

 

7

1078

24222

 

BBB+

 

8

1078

24223

 

BBB+

 

9

1078

24224

 

BBB+

 

10

1078

24225

 

BBB+

 

11

1078

24226

B

B

 

12

1078

24227

 

B

 

13

2504

24216

 

B

← Observations 13 and 14 in the rating1

14

2504

24217

 

B

← column should still be missing ratings.

15

2504

24218

A

A

 

16

2504

24219

A-

A-

 

17

2504

24220

 

A-

 

18

2504

24221

 

A-

 

19

2504

24222

CC

CC

 

20

2504

24223

CC

CC

 

21

2504

24224

CC

CC

 

22

2504

24225

CC+

CC+

 

23

2504

24226

 

CC+

 

24

2504

24227

 

CC+

 

 

 


 

View solution in original post

3 REPLIES 3
Reeza
Super User

You're fairly close, but not quite there.

You can use NOT MISSING() to check for missing values, not the >"."

You set up a BY group but never used it, you need to use it to reset it at the first record of each GVKEY so the values are not retained across the GVKEY groups. 

 

data b;
set a;
by gvkey;

retain rating1;

if not missing(rating) then rating1=rating;

*reset for the first gvkey that may be missing;
if first.gvkey and missing(rating) then call missing(rating1);



run;

@schneiderd wrote:

 

 

Below is a sample data and SAS program (I use SAS 9.4) that illustrates a challenge I have with processing the replacement of missing credit ratings within each gvkey (identifier) for a range of ym (months within a year). In other words, the procedure to replace missing ratings needs to reset with each individual gvkey.

 

gvkey = a unique number for each corporation

ym = a unique month and year

rating = credit rating issued for a gvkey at a specific ym

“.” = missing rating for specific ym within a gvkey

 

Two gvkeys, 001078 and 002504, are used in the data below, though there are over one thousand in the actual study. The ym’s repeat for each gvkey as they represent the months for one year. The coding follows the data and the SAS results follow the coding.

 

I am trying to replace missing ratings with the last or most recent rating. The program works for the most part as shown in column “rating1.” It worked perfectly for gvkey 001078. However, I need the program to process the replacement within each gvkey and not have the next gvkey read from the previous gvkey.

 

The first gvkey, 1078, is shown in observations 1 -12. Observation 13 is the start of a new gvkey, 2504.

 

The problem is that gvkey 2504 is reading the last rating from observation 11 that is from the previous and different gvkey, 1078. The replacement process should start over again for the next gvkey, 2504, and should start by reading observation 13 as the first rating or data line, which is a missing rating, and should report a missing rating in the “rating1” column until observation 15 when a new rating for gvkey 2504 appears.

 

If the first two or more observations within a gvkey is a missing rating then the replacement rating should also be a missing rating until a new rating appears. That is the correct treatment of the data.

 

How do I have the replacement process begin and end within each gvkey? Feel free to edit the code below if you have any suggestions.

 

Thank you,

 

Doug

 

 

data a;

input gvkey ym rating$;

cards;

 

001078    24216     .

001078    24217     .

001078    24218     .

001078    24219     BBB

001078    24220     BBB

001078    24221     BBB+

001078    24222     .

001078    24223     .

001078    24224     .

001078    24225     .

001078    24226     B

001078    24227     .

002504    24216     .

002504    24217     .

002504    24218     A

002504    24219     A-

002504    24220     .

002504    24221     .

002504    24222     CC

002504    24223     CC

002504    24224     CC

002504    24225     CC+

002504    24226     .

002504    24227     .

 

proc sort;

by gvkey ym;

run;

 

data b;

set a;

retain rating1;

if rating > "." then rating1=rating;

by gvkey; /* this step did not seem to help */

run;

 

proc print;

var gvkey ym rating rating1;

run;

 

 

RESULTS:

The SAS System

 

Obs

gvkey

ym

rating

rating1

 

1

1078

24216

   

← The first three observations in the

2

1078

24217

   

← rating1 column worked as intended as

3

1078

24218

   

← they should be missing ratings.

4

1078

24219

BBB

BBB

 

5

1078

24220

BBB

BBB

 

6

1078

24221

BBB+

BBB+

 

7

1078

24222

 

BBB+

 

8

1078

24223

 

BBB+

 

9

1078

24224

 

BBB+

 

10

1078

24225

 

BBB+

 

11

1078

24226

B

B

 

12

1078

24227

 

B

 

13

2504

24216

 

B

← Observations 13 and 14 in the rating1

14

2504

24217

 

B

← column should still be missing ratings.

15

2504

24218

A

A

 

16

2504

24219

A-

A-

 

17

2504

24220

 

A-

 

18

2504

24221

 

A-

 

19

2504

24222

CC

CC

 

20

2504

24223

CC

CC

 

21

2504

24224

CC

CC

 

22

2504

24225

CC+

CC+

 

23

2504

24226

 

CC+

 

24

2504

24227

 

CC+

 

 

 


 

schneiderd
Calcite | Level 5

The solution provided below worked! Thank you! 

schneiderd
Calcite | Level 5
The solution provided worked! Thank you Reeza. This allows me to move forward with a project I am working on.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 473 views
  • 0 likes
  • 2 in conversation