BookmarkSubscribeRSS Feed
SJ12
Calcite | Level 5

Hi, this is my dataset (example), question is down below:

 

data test;
input id $ foodtype $ kcals Nova $;
datalines;
001 butter 10 1
001 fish 15 2
001 banana 10 3
001 cherry 25 4
002 burger 5 2
002 pumpkin 4 2
002 carot 6 3
003 apple 100 1
003 tahini 60 4
;
run;
 
1. I first wanted to know the consumption of kcals from each Nova class (1 to 4) for each ID. I was able to get it with proc means: 
 

Proc means data =work.test sum;

Class ID Nova;

Var kcals;

output out = kcals3 sum=kcals;

run;

 
2. Then I also wanted the total kcals consumed per ID, which I was able to get through the proc means output or proc sql.
 
3. Now I would want to have the kcals consumed from each Nova class and for each ID divided by their own total consumed per day (percentage). So for each person, the kcals totals (from the 4 Nova classes) is divided by the total kcals consumption of that ID. This is where I'm stuck. How do I use results from proc means or proc sql to calculate those percentages? 
 
4. and then at the end, I would like to have mean of all the percentages (of all IDs) for each Nova class.
 
Hope someone can help! Thanks
9 REPLIES 9
PaigeMiller
Diamond | Level 26

1 & 2 can be combined into this code

 

proc means data=test sum;
    class id nova;
    ways 1;
    var kcals;
    output out=kcals3 sum=;
run;

 

#3 presents a problem. You say "Now I would want to have the kcals consumed from each Nova class and for each ID divided by their own total consumed per day (percentage)" but there is no day variable so we can't get a total and we can't get a percentage. Can you explain further?

 

#4 also is a problem, as I don't even understand what "mean of all the percentages" would do for you, it seems this is a meaningless way to look at things. You should never average percentages — it's meaningless — unless the denominator is the same for each; and I don't think that is the case.

--
Paige Miller
SJ12
Calcite | Level 5

For 1&2, the synthax suggested does not give me the kcals consumed PER ID for each Nova group (see synthax #1 in the question). If I put "ways 2", I get the kcals per ID, but not the sum anymore.

 

For #3, the whole table represents what the different IDs consumed in 1 day, so the fact that it’s “per day” can be ignored for the statistics (sorry about the confusion).

 

What I want is this: 

For example, ID #002 ate 2 foods from Nova 2 class. So I would like to sum up the kcals from that Nova class so from those 2 foods (5 plus 4 for a total of 11 kcals here) divided by the total that person consumed in kcals (which would be 15 calories for ID #002 in this case). So I want to know how much kcals the IDs consumed from each Nova class and divide it by their total consumption (for this example, I would have a percentage of 73% kcals consumption for ID #002 for Nova class 2). Does that make sense?

 

Thanks for your help!  

PaigeMiller
Diamond | Level 26

Assuming you have written this wrong

 

(5 plus 4 for a total of 11 kcals here)

 

this should work:

 

proc means data=test noprint;
    class id nova;
    types id id*nova;
    var kcals;
    output out=kcals3 sum=;
run;

data percentages;
    merge kcals3(where=(_type_=2) drop=nova rename=(kcals=kcals_total)) kcals3(where=(_type_=3));
    by id;
    percentage = kcals/kcals_total;
    drop _type_ _freq_;
    format percentage percent8.2;
run;

 

 

--
Paige Miller
SJ12
Calcite | Level 5

Yes, sorry I mixed up the numbers. That works, thanks so much! 

And then you mentioned (for #4) that I shouldn't do an average of percentages. I wanted to do it just to have the average consumption of each Nova group in percentage (for all IDs). 

 

Here is what I had: 

proc means data=percentages noprint;
class nova;
var percentage;
output out=kcals4 mean=;
run;

 

It seems to work, but do you recommend going a different way about it (in terms of thinking and getting that wanted result)? Thanks! 

PaigeMiller
Diamond | Level 26

@SJ12 wrote:

Yes, sorry I mixed up the numbers. That works, thanks so much! 

And then you mentioned (for #4) that I shouldn't do an average of percentages. I wanted to do it just to have the average consumption of each Nova group in percentage (for all IDs). 

 

Here is what I had: 

proc means data=percentages noprint;
class nova;
var percentage;
output out=kcals4 mean=;
run;


I said it was meaningless. This number represents no real world quantity; your professor or your boss would likely not accept doing it this way.

 

It seems to work, but do you recommend going a different way about it (in terms of thinking and getting that wanted result)? 

 

If you want to come up with a meaningful number you would compute

 

(sum of all the numerators) / (sum of all the denominators)

 

Since you have already seen how to do sums, I leave it for you as homework to figure out how to get the sum of all the numerators and the sum of all the denominators, and then divide.

--
Paige Miller
SJ12
Calcite | Level 5

Ok, that makes sense. Thank you so much for your answers.

SteveDenham
Jade | Level 19

@PaigeMiller , I really wish you could convince toxicologists and especially clinical pathologists about #4. I've spent the last 15 years trying to convince folks that taking the average of several animals' percent change from baseline doesn't result in a number that applies to the population the way they think it does..

 

SteveDenham.

ballardw
Super User

I might try something like:

 

proc tabulate data=test;
   class id nova foodtype;
   var kcals;
   table id all='All IDS',
        (nova all='All cals per id')*
             kcals=' '*(sum rowpctsum)
      / misstext=' '
   ;
run;

The Rowpctsum is the percentage of the row total of KCALS for each id.

The All IDS would give the overall row percentages. Which would be the percentage of each NOVA class overall.

 

Means of percentages are pretty much meaningless unless the numerator/denominators of each ID were similar. Since you have ID with no values in some Nova categories that doesn't hold.

SJ12
Calcite | Level 5

Awesome, this can be really helpful. I'll look into it. Thanks for your help! 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1208 views
  • 1 like
  • 4 in conversation