## Descriptive stats help

Hi, this is my dataset (example), question is down below:

data test;
input id \$ foodtype \$ kcals Nova \$;
datalines;
001 butter 10 1
001 fish 15 2
001 banana 10 3
001 cherry 25 4
002 burger 5 2
002 pumpkin 4 2
002 carot 6 3
003 apple 100 1
003 tahini 60 4
;
run;

1. I first wanted to know the consumption of kcals from each Nova class (1 to 4) for each ID. I was able to get it with proc means:

Proc means data =work.test sum;

Class ID Nova;

Var kcals;

output out = kcals3 sum=kcals;

run;

2. Then I also wanted the total kcals consumed per ID, which I was able to get through the proc means output or proc sql.

3. Now I would want to have the kcals consumed from each Nova class and for each ID divided by their own total consumed per day (percentage). So for each person, the kcals totals (from the 4 Nova classes) is divided by the total kcals consumption of that ID. This is where I'm stuck. How do I use results from proc means or proc sql to calculate those percentages?

4. and then at the end, I would like to have mean of all the percentages (of all IDs) for each Nova class.

Hope someone can help! Thanks
9 REPLIES 9

## Re: Descriptive stats help

1 & 2 can be combined into this code

``````proc means data=test sum;
class id nova;
ways 1;
var kcals;
output out=kcals3 sum=;
run;``````

#3 presents a problem. You say "Now I would want to have the kcals consumed from each Nova class and for each ID divided by their own total consumed per day (percentage)" but there is no day variable so we can't get a total and we can't get a percentage. Can you explain further?

#4 also is a problem, as I don't even understand what "mean of all the percentages" would do for you, it seems this is a meaningless way to look at things. You should never average percentages — it's meaningless — unless the denominator is the same for each; and I don't think that is the case.

--
Paige Miller

## Re: Descriptive stats help

For 1&2, the synthax suggested does not give me the kcals consumed PER ID for each Nova group (see synthax #1 in the question). If I put "ways 2", I get the kcals per ID, but not the sum anymore.

For #3, the whole table represents what the different IDs consumed in 1 day, so the fact that it’s “per day” can be ignored for the statistics (sorry about the confusion).

What I want is this:

For example, ID #002 ate 2 foods from Nova 2 class. So I would like to sum up the kcals from that Nova class so from those 2 foods (5 plus 4 for a total of 11 kcals here) divided by the total that person consumed in kcals (which would be 15 calories for ID #002 in this case). So I want to know how much kcals the IDs consumed from each Nova class and divide it by their total consumption (for this example, I would have a percentage of 73% kcals consumption for ID #002 for Nova class 2). Does that make sense?

## Re: Descriptive stats help

Assuming you have written this wrong

(5 plus 4 for a total of 11 kcals here)

this should work:

``````proc means data=test noprint;
class id nova;
types id id*nova;
var kcals;
output out=kcals3 sum=;
run;

data percentages;
merge kcals3(where=(_type_=2) drop=nova rename=(kcals=kcals_total)) kcals3(where=(_type_=3));
by id;
percentage = kcals/kcals_total;
drop _type_ _freq_;
format percentage percent8.2;
run;``````

--
Paige Miller

## Re: Descriptive stats help

Yes, sorry I mixed up the numbers. That works, thanks so much!

And then you mentioned (for #4) that I shouldn't do an average of percentages. I wanted to do it just to have the average consumption of each Nova group in percentage (for all IDs).

proc means data=percentages noprint;
class nova;
var percentage;
output out=kcals4 mean=;
run;

It seems to work, but do you recommend going a different way about it (in terms of thinking and getting that wanted result)? Thanks!

## Re: Descriptive stats help

@SJ12 wrote:

Yes, sorry I mixed up the numbers. That works, thanks so much!

And then you mentioned (for #4) that I shouldn't do an average of percentages. I wanted to do it just to have the average consumption of each Nova group in percentage (for all IDs).

proc means data=percentages noprint;
class nova;
var percentage;
output out=kcals4 mean=;
run;

I said it was meaningless. This number represents no real world quantity; your professor or your boss would likely not accept doing it this way.

It seems to work, but do you recommend going a different way about it (in terms of thinking and getting that wanted result)?

If you want to come up with a meaningful number you would compute

(sum of all the numerators) / (sum of all the denominators)

Since you have already seen how to do sums, I leave it for you as homework to figure out how to get the sum of all the numerators and the sum of all the denominators, and then divide.

--
Paige Miller

## Re: Descriptive stats help  SteveDenham

## Re: Descriptive stats help

@PaigeMiller , I really wish you could convince toxicologists and especially clinical pathologists about #4. I've spent the last 15 years trying to convince folks that taking the average of several animals' percent change from baseline doesn't result in a number that applies to the population the way they think it does..

SteveDenham.

## Re: Descriptive stats help

I might try something like:

```

proc tabulate data=test;
class id nova foodtype;
var kcals;
table id all='All IDS',
(nova all='All cals per id')*
kcals=' '*(sum rowpctsum)
/ misstext=' '
;
run;```

The Rowpctsum is the percentage of the row total of KCALS for each id.

The All IDS would give the overall row percentages. Which would be the percentage of each NOVA class overall.

Means of percentages are pretty much meaningless unless the numerator/denominators of each ID were similar. Since you have ID with no values in some Nova categories that doesn't hold.

## Re: Descriptive stats help

Awesome, this can be really helpful. I'll look into it. Thanks for your help!

Discussion stats
• 9 replies
• 332 views
• 1 like
• 4 in conversation