Hello, I need to build a predictive model for my data. The variables include: year, gender, race, num_cases, location, ageGroup, disease (1: yes, 0: no), maritual_status, charge. We did not collect race data for the first 4 years (coded 99).
1) How can I deal with the missing race data? (Race appears to be a significant predictor.)
2) How can I build a model to predict the probability that disease = "yes" from the other variables in the dataset?
I would very much appreciate any suggestions and help. Thanks.
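data have;
/* The original INPUT listed both "Gender$" and "gender"; SAS names are case-insensitive, so these were the same variable. The numeric 0/1 column is read as Gender_num here (a name chosen for illustration). */
input year Gender$ Race Num_Cases Location Gender_num agegroup Disease maritual_status charge;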
datalines;
1 M 99 1 8 1 6 1 5 900
1 M 99 5 3 1 6 1 1 152
1 F 99 3 16 0 7 1 6 588
1 M 99 26 3 1 7 1 2 79
1 M 99 1 16 1 7 1 6 179
1 M 99 1 12 1 5 1 2 100
1 M 99 2 4 1 7 1 1 245
2 M 99 1 3 1 5 1 5 625
2 F 99 2 3 0 5 1 1 35
2 F 99 1 16 0 6 1 2 144
2 F 99 1 3 0 5 0 5 625
2 F 99 1 3 0 6 0 4 576
2 F 99 3 3 0 6 1 4 192
3 M 99 6 3 1 5 0 1 500
3 M 99 1 3 1 7 1 2 196
3 M 99 1 1 1 6 0 1 36
3 M 99 1 3 1 5 0 1 25
4 M 99 1 3 1 5 0 2 100
4 M 99 3 16 1 5 1 2 352
4 F 99 1 16 0 6 1 6 1296
4 F 99 6 11 0 7 0 2 254
5 M 1 1 3 1 5 1 1 25
5 F 2 3 16 0 4 1 2 213
6 F 1 1 2 0 7 1 6 184
6 F 1 1 13 0 7 1 2 196
6 F 1 1 4 0 7 0 1 49
6 M 4 2 3 1 5 0 1 125
6 F 5 33 3 0 6 0 5 80
7 F 4 1 16 0 7 0 6 1764
7 F 4 2 3 0 6 0 6 648
7 M 6 1 16 1 6 1 6 1296
7 F 2 1 2 0 5 0 5 625
7 F 1 24 3 0 5 0 2 452
7 F 1 1 3 0 6 1 1 362
8 M 5 2 10 1 7 1 2 980
8 M 1 5 3 1 4 1 1 350
8 F 1 1 3 0 6 0 99 352
8 M 5 1 3 1 5 0 1 25
8 M 1 1 3 1 7 0 1 49
9 M 1 1 13 1 7 1 5 1225
9 M 5 4 16 1 7 0 1 122
9 F 5 2 3 0 7 1 2 98
9 M 1 1 1 1 7 1 5 126
10 F 1 66 3 0 6 0 1 54
10 F 2 1 1 0 5 0 1 25
10 F 1 2 4 0 6 0 5 450
10 M 1 3 16 1 7 0 5 408
11 F 1 1 8 0 7 1 2 196
11 M 3 1 3 1 7 1 2 196
11 M 5 5 3 1 6 0 1 72
11 M 1 2 3 1 7 1 1 245
12 F 1 9 11 0 7 0 6 196
12 M 5 2 3 1 5 0 1 125
12 F 5 13 3 0 7 0 6 150
12 M 0 2 3 1 7 0 2 98
13 F 5 3 3 0 5 0 1 215
13 M 0 4 3 1 5 0 1 625
13 M 2 25 3 1 7 1 2 784
13 M 1 1 1 1 7 0 99 480
13 M 2 27 3 1 7 0 2 725
;
run;
1) PROC MI to impute the missing race values. 2) PROC LOGISTIC, PROC HPSPLIT, or PROC GENMOD to model the probability of disease.
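To make the two suggestions concrete, here is a rough sketch of how they might fit together, not a definitive recipe. It assumes the HAVE data set from the question, recodes Race = 99 to a SAS missing value so PROC MI will treat it as missing, and imputes Race with an FCS generalized-logit model. The data set names have2, mi_out, and pred, the seed, the number of imputations, and the choice of covariates are all illustrative assumptions.

```sas
/* Assumption: 99 in Race means "not collected", as stated in the question. */
data have2;
   set have;
   if Race = 99 then Race = .;
run;

/* 1) Multiple imputation of the categorical variable Race.
      FCS LOGISTIC with LINK=GLOGIT treats Race as nominal.
      The covariates listed in VAR are an illustrative choice. */
proc mi data=have2 out=mi_out nimpute=5 seed=12345;
   class Race;
   fcs logistic(Race / link=glogit);
   var agegroup Num_Cases charge Disease Race;
run;

/* 2) Fit a logistic model for P(Disease = 1) within each imputed data set
      and save the predicted probabilities in p_disease. */
proc logistic data=mi_out;
   by _Imputation_;
   class Race / param=ref;
   model Disease(event='1') = Race agegroup Num_Cases charge;
   output out=pred p=p_disease;
run;
```

A few caveats on this sketch. Race is missing for every record in years 1 to 4 rather than at random, so the imputation leans entirely on the later years. With a sample this small and several sparse race levels, either procedure may warn about separation or convergence. To combine the per-imputation parameter estimates into one set of inferences, PROC MIANALYZE is the usual next step.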
Thanks so much. I will try the procedures that you mentioned here.