This topic is solved and locked.
Mariloud
Obsidian | Level 7

Hi everyone,

I have the variables charges (numeric), age, region (with 4 levels), and sex (female/male). I want to look at the distribution of charges, which is the response variable, and its relationship with age, region, and sex. Do I need to create formats for age, region, and sex? What about the charges variable?

Appreciate your help.

ballardw
Super User

Do you want tables or graphs? Statistical tests of difference? Something else?

How much information do you want about charges for each level of your categories? Summary statistics such as the mean, maximum, minimum, and standard deviation? Quantiles? Counts?

There are lots of ways to look at distributions, but it helps to give us a starting point for what you want, which likely depends on the type of question you need to answer with that information.

Formats for variables such as sex would likely only affect the appearance of the output. A format for age could be used to create groups based on ranges, such as 5- or 10-year intervals or whatever ranges you need, so that might be appropriate.

For charges, which is presumably a more or less continuous variable, a format might be useful, but without knowing what questions you want to answer, saying yes or no would be a guess.
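A minimal sketch of the summary-statistics option, assuming the data set is named `insurance` (as in the code later in this thread) and that region and sex are stored in variables named `region` and `sex`. PROC MEANS with a CLASS statement reports the statistics of charges for each level of the categories:

```sas
/* Summary statistics of CHARGES by REGION and SEX.                      */
/* Assumes a data set named INSURANCE with variables CHARGES, REGION, SEX. */
proc means data=insurance n mean std min q1 median q3 max maxdec=2;
   class region sex;   /* classification (grouping) variables            */
   ways 1 2;           /* separate tables for REGION, SEX, and REGION*SEX */
   var charges;        /* analysis (response) variable                   */
run;
```

Age could be added to the CLASS statement once a grouping format is applied to it with a FORMAT statement; otherwise every distinct age becomes its own class level.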

Mariloud
Obsidian | Level 7

I actually want tables and graphs. I am trying to explore the relationship between charges and age, sex, and region, and then fit a linear regression for prediction. I have attached the Excel file.
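For the regression step, one possible sketch (not the thread's accepted solution) assumes the same data set and variable names. PROC GLM is used here because its CLASS statement handles the character variables region and sex directly, whereas PROC REG would need numeric dummy variables created first:

```sas
/* Linear regression of CHARGES on AGE, REGION, and SEX.               */
/* Assumes a data set named INSURANCE; REGION and SEX are CLASS        */
/* variables, so PROC GLM builds the indicator coding itself.          */
proc glm data=insurance plots=diagnostics;
   class region sex;
   model charges = age region sex / solution;   /* print parameter estimates */
   /* Save predicted values and residuals for later checks */
   output out=insurance_pred p=pred_charges r=resid_charges;
run;
quit;   /* PROC GLM is interactive and stays active until QUIT */
```

The output data set name `insurance_pred` and the variables `pred_charges` and `resid_charges` are illustrative names only.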

Mariloud
Obsidian | Level 7

Thank you for responding.

I used PROC BOXPLOT and it worked, but the output was not quite what I expected after grouping age into categories. Is there a way to show the counts together in each age category instead of having the categories spread out like that? I have attached the age output below.

proc format;
   value agef
      18-28   = '18-28'
      29-39   = '29-39'
      40-50   = '40-50'
      51-61   = '51-61'
      62-high = '62+';   /* ranges must not overlap; 61 was in two ranges */
run;

* Distribution of charges by age group;
proc sort data=insurance out=insurance_sort;
   by age;   /* PROC BOXPLOT expects data sorted by the group variable */
run;

proc boxplot data=insurance_sort;
   plot charges*age;
   format age agef.;
   title 'Distribution of charges by age';
run;
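For the counts in each age category, one option is a one-way frequency table of age with a grouping format applied, since PROC FREQ groups observations by their formatted values. This is a sketch, not tested against the attached data; the format name `agegrp` and its cut points are hypothetical, chosen to mirror the intervals above without overlapping:

```sas
/* Hypothetical grouping format; each age falls in exactly one range. */
proc format;
   value agegrp
      low-28  = '28 or under'
      29-39   = '29-39'
      40-50   = '40-50'
      51-61   = '51-61'
      62-high = '62 or over';
run;

/* Counts and percentages of observations in each age group.      */
/* Assumes a data set named INSURANCE with a numeric variable AGE. */
proc freq data=insurance;
   tables age / nocum;   /* NOCUM drops the cumulative columns   */
   format age agegrp.;   /* table rows are the formatted groups  */
run;
```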

 

ballardw
Super User

I would try:

proc sgplot data=insurance_sort;
   vbox charges / category=age;   /* one box per formatted age group */
   format age agef.;
run;
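To look at age, sex, and region together, a possible extension (not part of the suggestion above, and assuming the same data set and variable names) is PROC SGPANEL with one panel per region, showing charges against age with a fitted line for each sex:

```sas
/* One panel per REGION; points and least-squares lines grouped by SEX. */
/* Assumes a data set named INSURANCE with CHARGES, AGE, REGION, SEX.   */
proc sgpanel data=insurance;
   panelby region / columns=2;         /* 4 regions in a 2-column grid */
   reg x=age y=charges / group=sex;    /* scatter plus linear fit      */
   title 'Charges versus age by region and sex';
run;
title;   /* clear the title for later output */
```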

