How to predict the # of cases and charges in the f...

11-08-2016 04:22 PM

Hello, I have a dataset and need to build a model to predict the number of cases and the charges for year 18 or 20. (My sample has 13 years of data.)

The variables are: year, GenderID, Race, Num_Cases, Location, gender, agegroup, Disease, maritual_status, and charge.

1) How can I build a model to predict the charge for year 18 or 20? (Missing values have been coded as 99.)

2) How can I build a model to predict the number of cases for year 18 or 20?

Below is a sample of my data. I would be very thankful for any suggestions or hints.

data have;
input year GenderID$ Race Num_Cases Location gender agegroup Disease maritual_status charge;
datalines;
1 M 99 1 8 1 6 1 5 900
1 M 99 5 3 1 6 1 1 152
1 F 99 3 16 0 7 1 6 588
1 M 99 26 3 1 7 1 2 79
1 M 99 1 16 1 7 1 6 179
1 M 99 1 12 1 5 1 2 100
1 M 99 2 4 1 7 1 1 245
2 M 99 1 3 1 5 1 5 625
2 F 99 2 3 0 5 1 1 35
2 F 99 1 16 0 6 1 2 144
2 F 99 1 3 0 5 0 5 625
2 F 99 1 3 0 6 0 4 576
2 F 99 3 3 0 6 1 4 192
3 M 99 6 3 1 5 0 1 500
3 M 99 1 3 1 7 1 2 196
3 M 99 1 1 1 6 0 1 36
3 M 99 1 3 1 5 0 1 25
4 M 99 1 3 1 5 0 2 100
4 M 99 3 16 1 5 1 2 352
4 F 99 1 16 0 6 1 6 1296
4 F 99 6 11 0 7 0 2 254
5 M 1 1 3 1 5 1 1 25
5 F 2 3 16 0 4 1 2 213
6 F 1 1 2 0 7 1 6 184
6 F 1 1 13 0 7 1 2 196
6 F 1 1 4 0 7 0 1 49
6 M 4 2 3 1 5 0 1 125
6 F 5 33 3 0 6 0 5 80
7 F 4 1 16 0 7 0 6 1764
7 F 4 2 3 0 6 0 6 648
7 M 6 1 16 1 6 1 6 1296
7 F 2 1 2 0 5 0 5 625
7 F 1 24 3 0 5 0 2 452
7 F 1 1 3 0 6 1 1 362
8 M 5 2 10 1 7 1 2 980
8 M 1 5 3 1 4 1 1 350
8 F 1 1 3 0 6 0 99 352
8 M 5 1 3 1 5 0 1 25
8 M 1 1 3 1 7 0 1 49
9 M 1 1 13 1 7 1 5 1225
9 M 5 4 16 1 7 0 1 122
9 F 5 2 3 0 7 1 2 98
9 M 1 1 1 1 7 1 5 126
10 F 1 66 3 0 6 0 1 54
10 F 2 1 1 0 5 0 1 25
10 F 1 2 4 0 6 0 5 450
10 M 1 3 16 1 7 0 5 408
11 F 1 1 8 0 7 1 2 196
11 M 3 1 3 1 7 1 2 196
11 M 5 5 3 1 6 0 1 72
11 M 1 2 3 1 7 1 1 245
12 F 1 9 11 0 7 0 6 196
12 M 5 2 3 1 5 0 1 125
12 F 5 13 3 0 7 0 6 150
12 M 0 2 3 1 7 0 2 98
13 F 5 3 3 0 5 0 1 215
13 M 0 4 3 1 5 0 1 625
13 M 2 25 3 1 7 1 2 784
13 M 1 1 1 1 7 0 99 480
13 M 2 27 3 1 7 0 2 725
;

11-08-2016 05:00 PM

*Never* code "missing" values as 99 in SAS. Every procedure will treat 99 as a real value. SAS has a special missing value (written as a period for numeric variables) for exactly that reason.
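
A minimal sketch of that recoding, assuming 99 is a missing code only in Race and maritual_status (the two columns where it appears in the posted sample). The output data set name have_clean is made up for illustration. Columns such as Num_Cases and charge are left alone, since 99 could be a legitimate value there.

```sas
/* Recode the placeholder 99 to a true SAS missing value (.) */
data have_clean;
   set have;
   /* Assumption: 99 means "missing" only in these two variables */
   if Race = 99 then call missing(Race);
   if maritual_status = 99 then call missing(maritual_status);
run;

/* Check the result: NMISS counts the values that were recoded */
proc means data=have_clean n nmiss;
   var Race maritual_status;
run;
```

Once the values are true missings, modeling procedures drop those observations from the fit instead of treating 99 as a real category or quantity.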

What type of model have you been attempting? Please show the code.

There are a couple of ways to get estimates from a model for given predictor values. One is sometimes called scoring: the procedure writes a special output data set that holds the parameter estimates, and you then apply it to a data set containing the values you want to run the model for.
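
One way to sketch the scoring approach is with an item store and PROC PLM. The gamma model, the choice of predictors, the covariate values in to_score, and the names charge_model, to_score, scored and pred_charge are all assumptions for illustration, not a recommendation for this particular data.

```sas
/* Fit the model and save it in an item store */
proc genmod data=have;
   class gender agegroup;
   model charge = year gender agegroup / dist=gamma link=log;
   store work.charge_model;            /* hypothetical item store name */
run;

/* Predictor combinations to score: years 18 and 20 for one profile */
data to_score;
   input year gender agegroup;
   datalines;
18 1 7
20 1 7
;

/* Apply the stored model; ILINK returns predictions on the charge scale */
proc plm restore=work.charge_model;
   score data=to_score out=scored predicted=pred_charge / ilink;
run;
```

The data set being scored needs the same variable names and types as the model's predictors, and CLASS levels that occurred in the training data.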

Another is to include combinations of your independent variables of interest in your input data set, with missing values for the dependent variable. Most of the modeling procedures have one or more options for adding the predicted values to your input data set. The details vary somewhat between procedures.
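
The missing-response approach might look like the sketch below. The model, the covariate values for the future years, and the names future, have_plus, pred and pred_charge are assumptions chosen for illustration.

```sas
/* Future rows: predictors filled in, response left missing */
data future;
   input year gender agegroup;
   charge = .;
   datalines;
18 1 7
20 1 7
;

/* Stack the future rows under the observed data */
data have_plus;
   set have future;
run;

/* Rows with a missing response are not used in the fit,
   but they still receive a predicted value in the OUT= data set */
proc genmod data=have_plus;
   class gender agegroup;
   model charge = year gender agegroup / dist=gamma link=log;
   output out=pred pred=pred_charge;
run;
```

In PROC GENMOD the PRED= keyword gives the predicted mean, so with a log link the values in pred_charge are already on the original charge scale.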

11-09-2016 12:52 PM

Thank you. I changed the missing codes back to missing.

1) I tested my response (charge), and it follows a gamma distribution.

Below is my code for the model.

proc genmod data=have;
   class gender race ageGroup;   /* coded categorical predictors */
   model charge = year gender race ageGroup / dist=gamma link=log lrci;
run;

I did not include maritual_status in the model because its p-value was greater than 0.05 (I use alpha=0.05). When I apply the code to my real data, Pr>ChiSq for the intercept is 0.8109. I am not sure what to do with the intercept, since its p-value is so large.

You mentioned the scoring method. Should I use logistic regression for my model and test all the selection methods: forward, backward, stepwise, and score?

2) I also need to predict the total charges by year. I am thinking of using linear regression, building the model in PROC REG with model charge=year. Thanks for any suggestions or hints.
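
The plan in 2) could be sketched as follows, assuming "total charges by year" means the sum of charge within each year. The names yearly, total_charge, future, yearly_plus, yearly_pred, pred_total, lower and upper are hypothetical.

```sas
/* Collapse to one row per year */
proc sql;
   create table yearly as
   select year, sum(charge) as total_charge
   from have
   group by year;
quit;

/* Future years with a missing response, so PROC REG predicts them */
data future;
   do year = 18, 20;
      total_charge = .;
      output;
   end;
run;

data yearly_plus;
   set yearly future;
run;

/* Straight-line trend; LCL= and UCL= are 95% prediction limits */
proc reg data=yearly_plus;
   model total_charge = year;
   output out=yearly_pred p=pred_total lcl=lower ucl=upper;
run;
quit;
```

With only 13 yearly points, a forecast for year 18 or 20 is an extrapolation well beyond the observed range, so the prediction limits are worth looking at alongside the point estimate.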

11-09-2016 01:13 AM

This looks more like a time series analysis problem. It would be better to post it in the Forecasting forum.
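
As a rough illustration of the time series view, the sketch below forecasts yearly totals with exponential smoothing. It assumes SAS/ETS is licensed, that the series of interest is the yearly sum of charge, and that a linear (Holt) trend model is a reasonable starting point. The names yearly, total_charge and yearly_fc are made up for the example.

```sas
/* One observation per year, in time order */
proc sql;
   create table yearly as
   select year, sum(charge) as total_charge
   from have
   group by year
   order by year;
quit;

/* No ID statement: observations are treated as consecutive periods.
   LEAD=7 extends the 13 observed years out to year 20. */
proc esm data=yearly outfor=yearly_fc lead=7;
   forecast total_charge / model=linear;   /* Holt linear trend smoothing */
run;
```

The OUTFOR= data set holds the actual values, the forecasts and their confidence limits for each period, including the seven periods after the last observed year.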