Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- Descriptive stats help

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 10-18-2022 01:52 PM
(331 views)

Hi, this is my dataset (example), question is down below:

data test;

input id $ foodtype $ kcals Nova $;

datalines;

001 butter 10 1

001 fish 15 2

001 banana 10 3

001 cherry 25 4

002 burger 5 2

002 pumpkin 4 2

002 carot 6 3

003 apple 100 1

003 tahini 60 4

;

run;

1. I first wanted to know the consumption of kcals from each Nova class (1 to 4) for each ID. I was able to get it with proc means:

Proc means data =work.test sum;

Class ID Nova;

Var kcals;

output out = kcals3 sum=kcals;

run;

2. Then I also wanted the total kcals consumed per ID, which I was able to get through the proc means output or proc sql.

3. Now I would want to have the kcals consumed from each Nova class and for each ID divided by their own total consumed per day (percentage). So for each person, the kcals totals (from the 4 Nova classes) is divided by the total kcals consumption of that ID. This is where I'm stuck. How do I use results from proc means or proc sql to calculate those percentages?

4. and then at the end, I would like to have mean of all the percentages (of all IDs) for each Nova class.

Hope someone can help! Thanks

9 REPLIES 9

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

1 & 2 can be combined into this code

```
proc means data=test sum;
class id nova;
ways 1;
var kcals;
output out=kcals3 sum=;
run;
```

#3 presents a problem. You say "Now I would want to have the kcals consumed from each Nova class and for each ID divided by their own total consumed per day (percentage)" but there is no day variable so we can't get a total and we can't get a percentage. Can you explain further?

#4 also is a problem, as I don't even understand what "mean of all the percentages" would do for you, it seems this is a meaningless way to look at things. You should never average percentages — it's meaningless — unless the denominator is the same for each; and I don't think that is the case.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

For 1&2, the synthax suggested does not give me the kcals consumed PER ID for each Nova group (see synthax #1 in the question). If I put "ways 2", I get the kcals per ID, but not the sum anymore.

For #3, the whole table represents what the different IDs consumed in 1 day, so the fact that it’s “per day” can be ignored for the statistics (sorry about the confusion).

What I want is this:

For example, ID #002 ate 2 foods from Nova 2 class. So I would like to sum up the kcals from that Nova class so from those 2 foods (5 plus 4 for a total of 11 kcals here) divided by the total that person consumed in kcals (which would be 15 calories for ID #002 in this case). So I want to know how much kcals the IDs consumed from each Nova class and divide it by their total consumption (for this example, I would have a percentage of 73% kcals consumption for ID #002 for Nova class 2). Does that make sense?

Thanks for your help!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Assuming you have written this wrong

(5 plus 4 for a total of 11 kcals here)

this should work:

```
proc means data=test noprint;
class id nova;
types id id*nova;
var kcals;
output out=kcals3 sum=;
run;
data percentages;
merge kcals3(where=(_type_=2) drop=nova rename=(kcals=kcals_total)) kcals3(where=(_type_=3));
by id;
percentage = kcals/kcals_total;
drop _type_ _freq_;
format percentage percent8.2;
run;
```

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes, sorry I mixed up the numbers. That works, thanks so much!

And then you mentioned (for #4) that I shouldn't do an average of percentages. I wanted to do it just to have the average consumption of each Nova group in percentage (for all IDs).

Here is what I had:

proc means data=percentages noprint;

class nova;

var percentage;

output out=kcals4 mean=;

run;

It seems to work, but do you recommend going a different way about it (in terms of thinking and getting that wanted result)? Thanks!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@SJ12 wrote:

Yes, sorry I mixed up the numbers. That works, thanks so much!

And then you mentioned (for #4) that I shouldn't do an average of percentages. I wanted to do it just to have the average consumption of each Nova group in percentage (for all IDs).

Here is what I had:

proc means data=percentages noprint;

class nova;

var percentage;

output out=kcals4 mean=;

run;

I said it was meaningless. This number represents no real world quantity; your professor or your boss would likely not accept doing it this way.

It seems to work, but do you recommend going a different way about it (in terms of thinking and getting that wanted result)?

If you want to come up with a meaningful number you would compute

(sum of all the numerators) / (sum of all the denominators)

Since you have already seen how to do sums, I leave it for you as homework to figure out how to get the sum of all the numerators and the sum of all the denominators, and then divide.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Ok, that makes sense. Thank you so much for your answers.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@PaigeMiller , I really wish you could convince toxicologists and especially clinical pathologists about #4. I've spent the last 15 years trying to convince folks that taking the average of several animals' percent change from baseline doesn't result in a number that applies to the population the way they think it does..

SteveDenham.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I might try something like:

proc tabulate data=test; class id nova foodtype; var kcals; table id all='All IDS', (nova all='All cals per id')* kcals=' '*(sum rowpctsum) / misstext=' ' ; run;

The Rowpctsum is the percentage of the row total of KCALS for each id.

The All IDS would give the overall row percentages. Which would be the percentage of each NOVA class overall.

Means of percentages are pretty much meaningless unless the numerator/denominators of each ID were similar. Since you have ID with no values in some Nova categories that doesn't hold.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Awesome, this can be really helpful. I'll look into it. Thanks for your help!

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.