Bear85
Calcite | Level 5

Suppose I have an insurance dataset with 2 categorical predictive variables: Gender (F/M) and Credit (A/B/C/D/E).
I also have exposure variable Days that I will use as the weight in PROC HPGENSELECT.

PROC HPGENSELECT data=InputData FCONV=1E-8 MAXITER=100 ITSUMMARY;
     CLASS Gender Credit;  
     MODEL Loss = Gender Credit / dist= Tweedie (p=1.6) link=log;
     WEIGHT Days;
     ODS OUTPUT ParameterEstimates= PEs;
RUN;
PROC PRINT DATA = PEs; RUN;

From "37108 - Setting reference levels for CLASS predictor variables" (sas.com) I know that by default the levels are arranged in ascending alphanumeric order and the last level is used as the reference, so M will become the base level for Gender, and E will become the base level for Credit.

However, the prevalent classes using exposure variable Days are Gender = F and Credit = B.

For example, I can use PROC SUMMARY to determine the prevalent class for each predictive variable:

PROC SUMMARY data=InputData SUM PRINT MISSING;
     CLASS Gender;
     VAR Days;
RUN; 

... and then specify the preferred reference levels in the CLASS statement:

PROC HPGENSELECT data=InputData FCONV=1E-8 MAXITER=100 ITSUMMARY;
     CLASS Gender(ref = "F") Credit(ref = "B");  
     MODEL Loss = Gender Credit / dist= Tweedie (p=1.6) link=log;
     WEIGHT Days;
     ODS OUTPUT ParameterEstimates= PEs;
RUN;
PROC PRINT DATA = PEs; RUN;

If I have 10 more categorical predictive variables, is there an elegant way to avoid the PROC SUMMARY step, pass the exposure variable Days to PROC HPGENSELECT, and have PROC HPGENSELECT use, for each categorical predictive variable, the level with the highest exposure as the base?

Thanks for the insights!

4 REPLIES
StatDave
SAS Super FREQ
See the description of the options in the CLASS statement in the GENMOD documentation. You can specify the ORDER=FREQ and DESCENDING options as global options (following a slash in the CLASS statement) to order the levels by ascending frequency, so that the most frequent level comes last and becomes the reference.
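
A minimal sketch of that suggestion, reusing the dataset and variable names from the original post. It assumes the HPGENSELECT CLASS statement accepts ORDER= and DESCENDING after the slash in the same way as described for GENMOD; check the CLASS statement documentation for your release. Note that this ordering is based on observation counts per level, not on the sum of Days.

```sas
proc hpgenselect data=InputData fconv=1e-8 maxiter=100 itsummary;
    /* Options after the slash apply to every CLASS variable.          */
    /* ORDER=FREQ sorts levels by descending count; DESCENDING flips   */
    /* that order, so the most frequent level is last and is the       */
    /* default reference level.                                        */
    class Gender Credit / order=freq descending;
    model Loss = Gender Credit / dist=tweedie(p=1.6) link=log;
    weight Days;
    ods output ParameterEstimates=PEs;
run;
```
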
Bear85
Calcite | Level 5

Thanks for your response, StatDave! Yes, the ORDER=FREQ and DESCENDING options in the CLASS statement (CLASS Statement :: SAS/STAT(R) 12.3 User's Guide: High-Performance Procedures) would work if I wanted to select the base level using the highest frequency of Gender. However, I need to consider a second variable, Days, to determine the prevalent class. For example, I'd like "F" to be the base class for Gender because it has the higher sum(Days), even though "M" has the higher _FREQ_:

Obs  Gender  _FREQ_     Days
1    F        4,000  810,000
2    M        4,821  790,560


In my case, I decided to continue to use the approach from the original post: PROC SUMMARY to determine the prevalent class for each predictive variable, and then specify the preferred reference levels in the CLASS statement. 

sbxkoenk
SAS Super FREQ

Hello @Bear85 ,

I see ...

Note that you can do all that (PROC SUMMARY + PROC HPGENSELECT with proper base levels for the CLASS variables) in ONE GO, without any manual intervention!

You can do that with some macro coding or with data-driven code generation in a data step.
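
As an illustration of the macro route, here is a rough sketch, not a tested production macro. The macro name build_class_refs, its parameters, and the macro variable class_spec are hypothetical names introduced for this example. It assumes the CLASS variables are character variables without user-defined formats and that their levels contain no quotes or other macro-sensitive characters. For each variable it finds the level with the largest sum of the exposure variable and appends Var(ref="level") to a CLASS specification.

```sas
/* Hypothetical helper: builds Var(ref="level") for each CLASS variable, */
/* choosing the level with the largest total exposure.                   */
%macro build_class_refs(data=, classvars=, exposure=);
    %global class_spec;
    %local i var ref total;
    %let class_spec=;
    %do i = 1 %to %sysfunc(countw(&classvars, %str( )));
        %let var = %scan(&classvars, &i, %str( ));
        %let ref=;
        proc sql noprint;
            /* Without a range or SEPARATED BY, INTO keeps only the first */
            /* row, which is the level with the highest exposure total.   */
            select &var, sum(&exposure) as total_exposure
                into :ref trimmed, :total trimmed
            from &data
            where not missing(&var)
            group by &var
            order by total_exposure desc;
        quit;
        %let class_spec = &class_spec &var(ref="&ref");
    %end;
%mend build_class_refs;

%build_class_refs(data=InputData, classvars=Gender Credit, exposure=Days);
%put NOTE: CLASS specification is &class_spec;

proc hpgenselect data=InputData fconv=1e-8 maxiter=100 itsummary;
    class &class_spec;
    model Loss = Gender Credit / dist=tweedie(p=1.6) link=log;
    weight Days;
    ods output ParameterEstimates=PEs;
run;
```

With 12 categorical predictors, only the classvars= list and the MODEL statement would need to change. Rows that PROC HPGENSELECT would drop for other reasons (for example a missing response) still count toward the exposure totals here, so add a matching WHERE condition if that matters for your data.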

 

Good luck,

Koen

bik01
Calcite | Level 5

The level should match the standard, as bik is matching the market standard as a conversational tool.
