Below is a sample data and SAS program (I use SAS 9.4) that illustrates a challenge I have with processing the replacement of missing credit ratings within each gvkey (identifier) for a range of ym (months within a year). In other words, the procedure to replace missing ratings needs to reset with each individual gvkey.
gvkey = a unique number for each corporation
ym = a unique month and year
rating = credit rating issued for a gvkey at a specific ym
“.” = missing rating for specific ym within a gvkey
Two gvkeys, 001078 and 002504, are used in the data below, though there are over one thousand in the actual study. The ym’s repeat for each gvkey as they represent the months for one year. The coding follows the data and the SAS results follow the coding.
I am trying to replace missing ratings with the last or most recent rating. The program works for the most part as shown in column “rating1.” It worked perfectly for gvkey 001078. However, I need the program to process the replacement within each gvkey and not have the next gvkey read from the previous gvkey.
The first gvkey, 1078, is shown in observations 1 -12. Observation 13 is the start of a new gvkey, 2504.
The problem is that gvkey 2504 is reading the last rating from observation 11 that is from the previous and different gvkey, 1078. The replacement process should start over again for the next gvkey, 2504, and should start by reading observation 13 as the first rating or data line, which is a missing rating, and should report a missing rating in the “rating1” column until observation 15 when a new rating for gvkey 2504 appears.
If the first two or more observations within a gvkey is a missing rating then the replacement rating should also be a missing rating until a new rating appears. That is the correct treatment of the data.
How do I have the replacement process begin and end within each gvkey? Feel free to edit the code below if you have any suggestions.
Thank you,
Doug
data a;
input gvkey ym rating$;
cards;
001078 24216 .
001078 24217 .
001078 24218 .
001078 24219 BBB
001078 24220 BBB
001078 24221 BBB+
001078 24222 .
001078 24223 .
001078 24224 .
001078 24225 .
001078 24226 B
001078 24227 .
002504 24216 .
002504 24217 .
002504 24218 A
002504 24219 A-
002504 24220 .
002504 24221 .
002504 24222 CC
002504 24223 CC
002504 24224 CC
002504 24225 CC+
002504 24226 .
002504 24227 .
proc sort;
by gvkey ym;
run;
data b;
set a;
retain rating1;
if rating > "." then rating1=rating;
by gvkey; /* this step did not seem to help */
run;
proc print;
var gvkey ym rating rating1;
run;
RESULTS:
The SAS System
Obs | gvkey | ym | rating | rating1 |
|
1 | 1078 | 24216 | ← The first three observations in the | ||
2 | 1078 | 24217 | ← rating1 column worked as intended as | ||
3 | 1078 | 24218 | ← they should be missing ratings. | ||
4 | 1078 | 24219 | BBB | BBB |
|
5 | 1078 | 24220 | BBB | BBB |
|
6 | 1078 | 24221 | BBB+ | BBB+ |
|
7 | 1078 | 24222 | BBB+ |
| |
8 | 1078 | 24223 | BBB+ |
| |
9 | 1078 | 24224 | BBB+ |
| |
10 | 1078 | 24225 | BBB+ |
| |
11 | 1078 | 24226 | B | B |
|
12 | 1078 | 24227 | B |
| |
13 | 2504 | 24216 | B | ← Observations 13 and 14 in the rating1 | |
14 | 2504 | 24217 | B | ← column should still be missing ratings. | |
15 | 2504 | 24218 | A | A |
|
16 | 2504 | 24219 | A- | A- |
|
17 | 2504 | 24220 | A- |
| |
18 | 2504 | 24221 | A- |
| |
19 | 2504 | 24222 | CC | CC |
|
20 | 2504 | 24223 | CC | CC |
|
21 | 2504 | 24224 | CC | CC |
|
22 | 2504 | 24225 | CC+ | CC+ |
|
23 | 2504 | 24226 | CC+ |
| |
24 | 2504 | 24227 | CC+ |
|
You're fairly close, but not quite there.
You can use NOT MISSING() to check for missing values, not the >"."
You set up a BY group but never used it, you need to use it to reset it at the first record of each GVKEY so the values are not retained across the GVKEY groups.
data b;
set a;
by gvkey;
retain rating1;
if not missing(rating) then rating1=rating;
*reset for the first gvkey that may be missing;
if first.gvkey and missing(rating) then call missing(rating1);
run;
@schneiderd wrote:
Below is a sample data and SAS program (I use SAS 9.4) that illustrates a challenge I have with processing the replacement of missing credit ratings within each gvkey (identifier) for a range of ym (months within a year). In other words, the procedure to replace missing ratings needs to reset with each individual gvkey.
gvkey = a unique number for each corporation
ym = a unique month and year
rating = credit rating issued for a gvkey at a specific ym
“.” = missing rating for specific ym within a gvkey
Two gvkeys, 001078 and 002504, are used in the data below, though there are over one thousand in the actual study. The ym’s repeat for each gvkey as they represent the months for one year. The coding follows the data and the SAS results follow the coding.
I am trying to replace missing ratings with the last or most recent rating. The program works for the most part as shown in column “rating1.” It worked perfectly for gvkey 001078. However, I need the program to process the replacement within each gvkey and not have the next gvkey read from the previous gvkey.
The first gvkey, 1078, is shown in observations 1 -12. Observation 13 is the start of a new gvkey, 2504.
The problem is that gvkey 2504 is reading the last rating from observation 11 that is from the previous and different gvkey, 1078. The replacement process should start over again for the next gvkey, 2504, and should start by reading observation 13 as the first rating or data line, which is a missing rating, and should report a missing rating in the “rating1” column until observation 15 when a new rating for gvkey 2504 appears.
If the first two or more observations within a gvkey is a missing rating then the replacement rating should also be a missing rating until a new rating appears. That is the correct treatment of the data.
How do I have the replacement process begin and end within each gvkey? Feel free to edit the code below if you have any suggestions.
Thank you,
Doug
data a;
input gvkey ym rating$;
cards;
001078 24216 .
001078 24217 .
001078 24218 .
001078 24219 BBB
001078 24220 BBB
001078 24221 BBB+
001078 24222 .
001078 24223 .
001078 24224 .
001078 24225 .
001078 24226 B
001078 24227 .
002504 24216 .
002504 24217 .
002504 24218 A
002504 24219 A-
002504 24220 .
002504 24221 .
002504 24222 CC
002504 24223 CC
002504 24224 CC
002504 24225 CC+
002504 24226 .
002504 24227 .
proc sort;
by gvkey ym;
run;
data b;
set a;
retain rating1;
if rating > "." then rating1=rating;
by gvkey; /* this step did not seem to help */
run;
proc print;
var gvkey ym rating rating1;
run;
RESULTS:
The SAS System
Obs
gvkey
ym
rating
rating1
1
1078
24216
← The first three observations in the
2
1078
24217
← rating1 column worked as intended as
3
1078
24218
← they should be missing ratings.
4
1078
24219
BBB
BBB
5
1078
24220
BBB
BBB
6
1078
24221
BBB+
BBB+
7
1078
24222
BBB+
8
1078
24223
BBB+
9
1078
24224
BBB+
10
1078
24225
BBB+
11
1078
24226
B
B
12
1078
24227
B
13
2504
24216
B
← Observations 13 and 14 in the rating1
14
2504
24217
B
← column should still be missing ratings.
15
2504
24218
A
A
16
2504
24219
A-
A-
17
2504
24220
A-
18
2504
24221
A-
19
2504
24222
CC
CC
20
2504
24223
CC
CC
21
2504
24224
CC
CC
22
2504
24225
CC+
CC+
23
2504
24226
CC+
24
2504
24227
CC+
You're fairly close, but not quite there.
You can use NOT MISSING() to check for missing values, not the >"."
You set up a BY group but never used it, you need to use it to reset it at the first record of each GVKEY so the values are not retained across the GVKEY groups.
data b;
set a;
by gvkey;
retain rating1;
if not missing(rating) then rating1=rating;
*reset for the first gvkey that may be missing;
if first.gvkey and missing(rating) then call missing(rating1);
run;
@schneiderd wrote:
Below is a sample data and SAS program (I use SAS 9.4) that illustrates a challenge I have with processing the replacement of missing credit ratings within each gvkey (identifier) for a range of ym (months within a year). In other words, the procedure to replace missing ratings needs to reset with each individual gvkey.
gvkey = a unique number for each corporation
ym = a unique month and year
rating = credit rating issued for a gvkey at a specific ym
“.” = missing rating for specific ym within a gvkey
Two gvkeys, 001078 and 002504, are used in the data below, though there are over one thousand in the actual study. The ym’s repeat for each gvkey as they represent the months for one year. The coding follows the data and the SAS results follow the coding.
I am trying to replace missing ratings with the last or most recent rating. The program works for the most part as shown in column “rating1.” It worked perfectly for gvkey 001078. However, I need the program to process the replacement within each gvkey and not have the next gvkey read from the previous gvkey.
The first gvkey, 1078, is shown in observations 1 -12. Observation 13 is the start of a new gvkey, 2504.
The problem is that gvkey 2504 is reading the last rating from observation 11 that is from the previous and different gvkey, 1078. The replacement process should start over again for the next gvkey, 2504, and should start by reading observation 13 as the first rating or data line, which is a missing rating, and should report a missing rating in the “rating1” column until observation 15 when a new rating for gvkey 2504 appears.
If the first two or more observations within a gvkey is a missing rating then the replacement rating should also be a missing rating until a new rating appears. That is the correct treatment of the data.
How do I have the replacement process begin and end within each gvkey? Feel free to edit the code below if you have any suggestions.
Thank you,
Doug
data a;
input gvkey ym rating$;
cards;
001078 24216 .
001078 24217 .
001078 24218 .
001078 24219 BBB
001078 24220 BBB
001078 24221 BBB+
001078 24222 .
001078 24223 .
001078 24224 .
001078 24225 .
001078 24226 B
001078 24227 .
002504 24216 .
002504 24217 .
002504 24218 A
002504 24219 A-
002504 24220 .
002504 24221 .
002504 24222 CC
002504 24223 CC
002504 24224 CC
002504 24225 CC+
002504 24226 .
002504 24227 .
proc sort;
by gvkey ym;
run;
data b;
set a;
retain rating1;
if rating > "." then rating1=rating;
by gvkey; /* this step did not seem to help */
run;
proc print;
var gvkey ym rating rating1;
run;
RESULTS:
The SAS System
Obs
gvkey
ym
rating
rating1
1
1078
24216
← The first three observations in the
2
1078
24217
← rating1 column worked as intended as
3
1078
24218
← they should be missing ratings.
4
1078
24219
BBB
BBB
5
1078
24220
BBB
BBB
6
1078
24221
BBB+
BBB+
7
1078
24222
BBB+
8
1078
24223
BBB+
9
1078
24224
BBB+
10
1078
24225
BBB+
11
1078
24226
B
B
12
1078
24227
B
13
2504
24216
B
← Observations 13 and 14 in the rating1
14
2504
24217
B
← column should still be missing ratings.
15
2504
24218
A
A
16
2504
24219
A-
A-
17
2504
24220
A-
18
2504
24221
A-
19
2504
24222
CC
CC
20
2504
24223
CC
CC
21
2504
24224
CC
CC
22
2504
24225
CC+
CC+
23
2504
24226
CC+
24
2504
24227
CC+
The solution provided below worked! Thank you!
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.