Hi, this is my dataset (example), question is down below:
Proc means data =work.test sum;
Class ID Nova;
Var kcals;
output out = kcals3 sum=kcals;
run;
1 & 2 can be combined into this code
proc means data=test sum;
class id nova;
ways 1;
var kcals;
output out=kcals3 sum=;
run;
#3 presents a problem. You say "Now I would want to have the kcals consumed from each Nova class and for each ID divided by their own total consumed per day (percentage)" but there is no day variable so we can't get a total and we can't get a percentage. Can you explain further?
#4 also is a problem, as I don't even understand what "mean of all the percentages" would do for you, it seems this is a meaningless way to look at things. You should never average percentages — it's meaningless — unless the denominator is the same for each; and I don't think that is the case.
For 1&2, the synthax suggested does not give me the kcals consumed PER ID for each Nova group (see synthax #1 in the question). If I put "ways 2", I get the kcals per ID, but not the sum anymore.
For #3, the whole table represents what the different IDs consumed in 1 day, so the fact that it’s “per day” can be ignored for the statistics (sorry about the confusion).
What I want is this:
For example, ID #002 ate 2 foods from Nova 2 class. So I would like to sum up the kcals from that Nova class so from those 2 foods (5 plus 4 for a total of 11 kcals here) divided by the total that person consumed in kcals (which would be 15 calories for ID #002 in this case). So I want to know how much kcals the IDs consumed from each Nova class and divide it by their total consumption (for this example, I would have a percentage of 73% kcals consumption for ID #002 for Nova class 2). Does that make sense?
Thanks for your help!
Assuming you have written this wrong
(5 plus 4 for a total of 11 kcals here)
this should work:
proc means data=test noprint;
class id nova;
types id id*nova;
var kcals;
output out=kcals3 sum=;
run;
data percentages;
merge kcals3(where=(_type_=2) drop=nova rename=(kcals=kcals_total)) kcals3(where=(_type_=3));
by id;
percentage = kcals/kcals_total;
drop _type_ _freq_;
format percentage percent8.2;
run;
Yes, sorry I mixed up the numbers. That works, thanks so much!
And then you mentioned (for #4) that I shouldn't do an average of percentages. I wanted to do it just to have the average consumption of each Nova group in percentage (for all IDs).
Here is what I had:
proc means data=percentages noprint;
class nova;
var percentage;
output out=kcals4 mean=;
run;
It seems to work, but do you recommend going a different way about it (in terms of thinking and getting that wanted result)? Thanks!
@SJ12 wrote:
Yes, sorry I mixed up the numbers. That works, thanks so much!
And then you mentioned (for #4) that I shouldn't do an average of percentages. I wanted to do it just to have the average consumption of each Nova group in percentage (for all IDs).
Here is what I had:
proc means data=percentages noprint;
class nova;
var percentage;
output out=kcals4 mean=;
run;
I said it was meaningless. This number represents no real world quantity; your professor or your boss would likely not accept doing it this way.
It seems to work, but do you recommend going a different way about it (in terms of thinking and getting that wanted result)?
If you want to come up with a meaningful number you would compute
(sum of all the numerators) / (sum of all the denominators)
Since you have already seen how to do sums, I leave it for you as homework to figure out how to get the sum of all the numerators and the sum of all the denominators, and then divide.
Ok, that makes sense. Thank you so much for your answers.
@PaigeMiller , I really wish you could convince toxicologists and especially clinical pathologists about #4. I've spent the last 15 years trying to convince folks that taking the average of several animals' percent change from baseline doesn't result in a number that applies to the population the way they think it does..
SteveDenham.
I might try something like:
proc tabulate data=test; class id nova foodtype; var kcals; table id all='All IDS', (nova all='All cals per id')* kcals=' '*(sum rowpctsum) / misstext=' ' ; run;
The Rowpctsum is the percentage of the row total of KCALS for each id.
The All IDS would give the overall row percentages. Which would be the percentage of each NOVA class overall.
Means of percentages are pretty much meaningless unless the numerator/denominators of each ID were similar. Since you have ID with no values in some Nova categories that doesn't hold.
Awesome, this can be really helpful. I'll look into it. Thanks for your help!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.