Hi everyone,
I have the variables charges (numeric), age, region (with 4 levels), and sex (female/male). I want to look at the distribution of charges, which is the response variable, and its relationship with age, region, and sex. Do I need to create formats for age, region, and sex? What about the charges variable?
I appreciate your help.
A box plot? Try
proc boxplot
or
proc sgplot;
    vbox ......
https://blogs.sas.com/content/iml/2019/03/06/proc-boxplot-hundreds.html
Do you want tables or graphs? Statistical tests of difference? Something else?
How much information do you want about the charges for each level of your categories? Summary statistics such as mean, max, min, and standard deviation? Quantiles? Counts?
There are lots of ways to look at distributions, but it helps if you give us a starting point for what you want. That will likely depend on the question you need to answer with the information.
Formats for variables such as sex would likely only affect the appearance of the output. A format for age could be used to create groups based on ranges, such as 5- or 10-year intervals or whatever ranges you need, so that might be appropriate.
For charges, which I assume is more or less continuous, a format might be useful, but without knowing what questions you want to answer, saying yea or nay would be a guess.
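As a starting point for the summary statistics mentioned above, here is a minimal sketch. It assumes the data set is named insurance and contains the variables charges, region, and sex as described in the question; the choice of statistics is only an illustration and should be adjusted to the question being asked.

```sas
/* Assumption: data set INSURANCE with numeric CHARGES and */
/* categorical REGION and SEX, as described in the question. */
proc means data=insurance n mean std min q1 median q3 max maxdec=2;
    class region sex;   /* one row of statistics per region-by-sex combination */
    var charges;        /* the response variable being summarized */
run;
```

Adding age to the CLASS statement together with a range format would give the same statistics per age group, because CLASS variables are grouped by their formatted values.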
I actually want both tables and graphs. I am trying to explore the relationship between charges and age, sex, and region, and then fit a linear regression for prediction. I have attached the Excel file.
Thank you for responding.
I used proc boxplot and it worked, but after grouping age into categories the output was not quite what I expected. Is there a way to combine the observations into a single box for each age category, instead of the categories being spread out across the plot? I have attached the age output below.
proc format;
    value agef
        18-28 = '<=28'
        29-39 = '29-39'
        40-50 = '40-50'
        51-61 = '51-61'
        62-71 = '>61';   /* starts at 62 so it no longer shares 61 with the previous range */
run;
*Distribution of charges by age;
proc sort data=insurance out=insurance_sort;
    by age;
run;
proc boxplot data=insurance_sort;
    plot charges*age;
    format age agef.;
    title 'Distribution of charges by age';
run;
I would try:
proc sgplot data=insurance_sort;
    vbox charges / category=age;
    format age agef.;
run;
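The same VBOX approach can be extended to the other two predictors, and a first regression can follow from there. The sketch below is only one possible setup, not a recommended final model: it assumes the data set insurance from this thread, treats age as a continuous predictor, and uses PROC GLM so that region and sex can go on a CLASS statement.

```sas
/* Assumption: data set INSURANCE with CHARGES, AGE, REGION, and SEX. */

/* Box plots of charges for each region, with side-by-side boxes by sex */
proc sgplot data=insurance;
    vbox charges / category=region group=sex;
    title 'Distribution of charges by region and sex';
run;
title;   /* clear the title so it does not carry over to later output */

/* First-pass linear model: age as continuous, region and sex as */
/* classification effects; SOLUTION prints the parameter estimates. */
proc glm data=insurance;
    class region sex;
    model charges = age region sex / solution;
run;
quit;
```

Checking the residual plots from PROC GLM is worthwhile before using the model for prediction, since charges may turn out to be skewed.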
It worked. Thank you very much.