Hi guys. How can I determine the mean and average of the Total Cholesterol and Glucose fields, grouped by the age and gender of each individual, in the diabetes dataset? Should I sort the data first? Any help and advice will be appreciated. Thanks.
Diabetes File:
Wouldn't you want to categorize age first, since it is a numeric variable, for instance by converting it into a character variable? You could get the means from PROC MEANS as below. This code is untested, so please change the variable names to match your dataset. The AUTONAME option creates the mean variable names by concatenating each variable name with the statistic name, for example chol_mean and glucose_mean.
proc means data=have nway;
  class age gender;
  var chol glucose;
  output out=means mean= / autoname;
run;
You could apply a format to the AGE variable to group it, as explained by @Jim_G in this thread:
Then you can use PROC MEANS or PROC SUMMARY on the data. If you use PROC SUMMARY with a CLASS statement, no sorting is necessary.
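Putting those two suggestions together, here is a minimal, untested sketch. The format name agegrp and its cut points are made up for illustration, and the dataset and variable names (have, age, gender, chol, glucose) are carried over from the earlier reply, so change them to match your data.

```sas
/* Hypothetical age bands: adjust the cut points to suit your data. */
proc format;
  value agegrp
    low -< 40 = 'Under 40'
    40  -< 60 = '40 to 59'
    60 - high = '60 and over';
run;

/* CLASS groups on the formatted value of AGE, so no PROC SORT is needed. */
/* NWAY keeps only the rows for the full age-group by gender combination. */
proc summary data=have nway;
  class age gender;
  format age agegrp.;
  var chol glucose;
  output out=means mean= / autoname;
run;

/* Look at the result: one row per age group and gender. */
proc print data=means noobs;
run;
```

Because the format is applied only inside the procedure, the AGE values stored in the input dataset stay numeric and unchanged.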
By the way, the "mean" is the same thing as the "average". 😉
If you're using Enterprise Guide (EG) and its tasks, you can use the Summary Task.
Add age and gender to the GROUP variables and Cholesterol to the analysis variables.
If you're looking for code, here are some examples:
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset