Yurie
Fluorite | Level 6

Hello, I have a dataset and need to build a model to predict the number of cases and the charges for year 18 or 20. (My sample covers 13 years of data.)

The variables are: year, GenderID, Race, Num_Cases, Location, gender, agegroup, Disease, maritual_status, and charge.

 

1) How can I build a model to predict the charge for year 18 or 20? (Missing values have been coded as 99.)

2) How can I build a model to predict the number of cases for year 18 or 20?

 

Below is my data sample. I will be very thankful for any suggestions or hints. 

 

data have;
input year GenderID$ Race Num_Cases Location gender agegroup Disease maritual_status charge;
datalines;
1 M 99 1 8 1 6 1 5 900
1 M 99 5 3 1 6 1 1 152
1 F 99 3 16 0 7 1 6 588
1 M 99 26 3 1 7 1 2 79
1 M 99 1 16 1 7 1 6 179
1 M 99 1 12 1 5 1 2 100
1 M 99 2 4 1 7 1 1 245
2 M 99 1 3 1 5 1 5 625
2 F 99 2 3 0 5 1 1 35
2 F 99 1 16 0 6 1 2 144
2 F 99 1 3 0 5 0 5 625
2 F 99 1 3 0 6 0 4 576
2 F 99 3 3 0 6 1 4 192
3 M 99 6 3 1 5 0 1 500
3 M 99 1 3 1 7 1 2 196
3 M 99 1 1 1 6 0 1 36
3 M 99 1 3 1 5 0 1 25
4 M 99 1 3 1 5 0 2 100
4 M 99 3 16 1 5 1 2 352
4 F 99 1 16 0 6 1 6 1296
4 F 99 6 11 0 7 0 2 254
5 M 1 1 3 1 5 1 1 25
5 F 2 3 16 0 4 1 2 213
6 F 1 1 2 0 7 1 6 184
6 F 1 1 13 0 7 1 2 196
6 F 1 1 4 0 7 0 1 49
6 M 4 2 3 1 5 0 1 125
6 F 5 33 3 0 6 0 5 80
7 F 4 1 16 0 7 0 6 1764
7 F 4 2 3 0 6 0 6 648
7 M 6 1 16 1 6 1 6 1296
7 F 2 1 2 0 5 0 5 625
7 F 1 24 3 0 5 0 2 452
7 F 1 1 3 0 6 1 1 362
8 M 5 2 10 1 7 1 2 980
8 M 1 5 3 1 4 1 1 350
8 F 1 1 3 0 6 0 99 352
8 M 5 1 3 1 5 0 1 25
8 M 1 1 3 1 7 0 1 49
9 M 1 1 13 1 7 1 5 1225
9 M 5 4 16 1 7 0 1 122
9 F 5 2 3 0 7 1 2 98
9 M 1 1 1 1 7 1 5 126
10 F 1 66 3 0 6 0 1 54
10 F 2 1 1 0 5 0 1 25
10 F 1 2 4 0 6 0 5 450
10 M 1 3 16 1 7 0 5 408
11 F 1 1 8 0 7 1 2 196
11 M 3 1 3 1 7 1 2 196
11 M 5 5 3 1 6 0 1 72
11 M 1 2 3 1 7 1 1 245
12 F 1 9 11 0 7 0 6 196
12 M 5 2 3 1 5 0 1 125
12 F 5 13 3 0 7 0 6 150
12 M 0 2 3 1 7 0 2 98
13 F 5 3 3 0 5 0 1 215
13 M 0 4 3 1 5 0 1 625
13 M 2 25 3 1 7 1 2 784
13 M 1 1 1 1 7 0 99 480
13 M 2 27 3 1 7 0 2 725
;

 

ballardw
Super User

Never code missing values as 99 in SAS. Every procedure will treat 99 as a valid value and use it in its calculations. SAS has a special missing value (.) for exactly that reason.
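As a minimal sketch of that recoding: the step below assumes 99 marks "missing" only in Race and maritual_status, which is where it appears in the posted sample. The output data set name `have_clean` is made up for illustration. A blanket replacement across all numeric variables would be risky, since 99 could be a legitimate value for Num_Cases or charge.

```sas
/* Sketch: turn the 99 placeholder into a true SAS missing value. */
/* Assumption: 99 means "missing" only in Race and maritual_status. */
data have_clean;               /* hypothetical output data set name */
    set have;
    if Race = 99 then Race = .;
    if maritual_status = 99 then maritual_status = .;
run;

/* Check the result: the NMISS column shows how many values are now missing. */
proc means data=have_clean n nmiss;
    var Race maritual_status;
run;
```

Modeling procedures drop observations with a missing predictor, so the rows with missing Race would no longer contribute to a model that includes Race.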

 

What type of model have you been attempting? Please show the code.

There are a couple of ways to get estimates from a model for given predictor values. One is sometimes called scoring: you create a special output data set from your procedure that holds the parameter estimates, then apply it to a data set containing the values you want to run the model for.
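One way the scoring approach might look is sketched below, using the STORE statement in PROC GENMOD and the SCORE statement in PROC PLM. The gamma model simply mirrors the one discussed in this thread. The item store name `charge_model`, the data set `to_score`, and the predictor values in it are made up for illustration. The sketch also assumes the 99 codes in `have` have already been set to missing.

```sas
/* Hypothetical rows to score: predictor values for the years of interest. */
data to_score;
    input year gender race ageGroup;
    datalines;
18 1 1 7
20 0 1 6
;

/* Fit the model and save it in an item store (name is an assumption). */
proc genmod data=have;
    model charge = year gender race ageGroup / dist=gamma link=log;
    store work.charge_model;
run;

/* Apply the stored model to the new rows. ILINK returns predictions */
/* on the original charge scale rather than the log scale.           */
proc plm restore=work.charge_model;
    score data=to_score out=scored predicted=pred_charge / ilink;
run;

proc print data=scored;
run;
```

Years 18 and 20 lie outside the observed range of 1 to 13, so these predictions are extrapolations and should be read with caution.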

 

Another is to include the combinations of your independent variables of interest in your input data set, with missing values for the dependent variable. Most of the modeling procedures have one or more options for adding the predicted values to your input data set. The details vary somewhat between procedures.
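A rough sketch of this second approach with PROC GENMOD follows. The data set names `extra`, `have_plus`, and `pred`, and the predictor values in `extra`, are made up for illustration. Rows with a missing response are left out of the fit, but they still receive a predicted value in the OUTPUT data set.

```sas
/* Hypothetical rows for the years to predict, with charge left missing. */
data extra;
    input year gender race ageGroup;
    charge = .;
    datalines;
18 1 1 7
20 0 1 6
;

/* Stack the extra rows under the observed data. */
data have_plus;
    set have extra;
run;

/* Rows with missing charge are not used to fit the model,  */
/* but PRED= still writes a predicted mean for each of them. */
proc genmod data=have_plus;
    model charge = year gender race ageGroup / dist=gamma link=log;
    output out=pred pred=pred_charge;
run;

proc print data=pred;
    where charge = .;
    var year gender race ageGroup pred_charge;
run;
```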

Yurie
Fluorite | Level 6

Thank you. I changed the missing codes back to missing. 

 

1) I tested my response (charge), and it follows a gamma distribution.

 

Below is my code for the model.

proc genmod data=have;
    class ;
    model charge=year gender race ageGroup/dist=gamma link=log lrci;
run;

I did not include maritual_status in the model because its p-value was greater than 0.05 (I use alpha=0.05). When I apply the code to my real data, Pr>ChiSq for the intercept is 0.8109. I am not sure what to do with the intercept since its p-value is so large.

 

You mentioned the scoring method. Should I use logistic regression for my model and test all the selection methods: forward, backward, stepwise, and score?

 

2) I also need to predict the total charges by year. I am thinking of using linear regression, building the model in proc reg with model charge=year. Thanks for any suggestions or hints.
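The linear regression idea in point 2 could be sketched roughly as follows. It first sums charge within each year, since the goal is total charges by year, and then regresses that total on year. The names `yearly`, `future`, `yearly_plus`, `total_charge`, and `pred_total` are made up for illustration. With only 13 yearly totals and predictions well beyond year 13, this is a straight-line extrapolation and not a proper forecast.

```sas
/* Sum charge within each year. */
proc summary data=have nway;
    class year;
    var charge;
    output out=yearly(drop=_type_ _freq_) sum=total_charge;
run;

/* Hypothetical future years with the response left missing. */
data future;
    do year = 18, 20;
        total_charge = .;
        output;
    end;
run;

data yearly_plus;
    set yearly future;
run;

/* Rows with a missing response are excluded from the fit */
/* but still get a predicted value in the OUTPUT data set. */
proc reg data=yearly_plus;
    model total_charge = year;
    output out=pred_yearly p=pred_total;
run;
quit;

proc print data=pred_yearly;
run;
```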

 

Ksharp
Super User
This is more of a time series analysis problem. You would do better to post it in the Forecasting forum.
