Re: Adding total row or function?

SJ12 · Posted 11-01-2022 02:13 PM

Hello,

I want to use the last dataset below called nova to add a total row (so I can obtain total kcals for each nova category (total_nova1 to total_nova4)) or directly use a function to obtain the percentage of kcals per category for all participants (to divide the kcals of each category by the grand total). For example, I want to have the % of kcals for total_nova4 for all participants. I'm having difficulty using proc tabulate as I can't use class nova or var kcals like I did in the first proc tabulate in the synthax since those variables don't exist in my new dataset nova (I don't know how to identify the variables). The numbers in the last dataset (nova) represent kcals (calories). total_nova1 to total_nova4 represent nova 1-4. Please let me know if I am unclear. I am still new to SAS. Thanks

Synthax:

data test;

input id foodtype $ Nova $ Subnova $ kcals;

datalines;

001 butter 1 oils 10

001 fish 2 protein 15

001 banana 3 fruit 10

001 cherry 4 fruit 25

002 burger 2 frozen 5

002 pumpkin 2 other 4

002 carrot 3 veg 6

003 apple 1 fruit 100

003 tahini 4 spread 60

004 . . . 0

005 burger 2 frozen 13

005 ceral 3 grain 56

005 milk 1 prod 35

005 vinegar 4 Other 100

006 . . . 0

006 vinegar 4 Other 100

;

run;

proc tabulate data=test out=test1;

class id nova;

var kcals;

table id all='All participants',

(nova all='Total calories per participant')*

kcals=' '*(sum='Total calories');

run;

proc print data=test1;run;

data nova1;

set test1;

where nova="1";

total_nova1= kcals_sum;

if missing(id) then delete;

keep id total_nova1;

run;

data nova2;

set test1;

where nova="2";

total_nova2= kcals_sum;

if missing(id) then delete;

keep id total_nova2;

run;

data nova3;

set test1;

where nova="3";

total_nova3= kcals_sum;

if missing(id) then delete;

keep id total_nova3;

run;

data nova4;

set test1;

where nova="4";

total_nova4= kcals_sum;

if missing(id) then delete;

keep id total_nova4;

run;

proc sort data=nova1;by id;run;

proc sort data=nova2;by id;run;

proc sort data=nova3;by id;run;

proc sort data=nova4;by id;run;

data nova;

merge nova1 nova2 nova3 nova4;

by id;

total=sum(total_nova1, total_nova2,total_nova3, total_nova4);

if total=. then delete;

run;

PaigeMiller · Posted 11-01-2022 02:35 PM

You don't show us exactly what you want after you create NOVA. It's not clear to me from reading your words what is next. I have simplified your code and calculated percents in data set WANT1, from which you can do PROC TABULATE and make it look any way you want.

proc summary data=test;
    class id nova;
    var kcals;
    output out=test1 sum=kcals_sum;
run; 
proc transpose data=test1(where=(_type_ in (1 3))) prefix=nova out=test1a;
    by id;
    id nova;
    var kcals_sum;
run;
data want1;
    merge test1a test1(where=(_type_ in (0 2)));
    by id;
    array nn nova1-nova4;
    array pct pct1-pct4;
    do i=1 to dim(nn);
        pct(i)=nn(i)/kcals_sum;
    end;
    drop _type_ _freq_ _name_ i;
run;

--
Paige Miller

SJ12 · Posted 11-01-2022 03:45 PM

The reason why I used the longer synthax is because I wanted to delete participants that have consumed 0 kcals in total (total of 0 per id). I don't want their total to be included in the average when calculating percentages.

As to the result I want, you can see my reply to ballardw. Thanks!

ballardw · Posted 11-01-2022 03:09 PM

Your example shows making 4 sets from one where you intentionally exclude variables. Is your real data something where you want to add the "Nova" back from a case where each data set is supposed to be from one "nova"? Or are you actually doing all of that to your existing data?

Is the purpose of this to make the output data set from tabulate? The output structure of the data set is still very likely to require post processing as that is just the nature of the output sets created by tabulate.

Perhaps you shouldn't be using tabulate at all but you have not managed to describe what it is that you actually want.

Suggestion: Show what you expect the final result of this to look like. Or at least a representation with indications of what goes into the report.

SJ12 · Posted 11-01-2022 03:42 PM

Hi, I would want something like this (see below), but as a dataset output and not results (only the "all participants" percentage would be actually needed here):

proc tabulate data=test;

class id nova;

var kcals;

table id all='All participants',

(nova all='Total calories per participant')*

kcals=' '*(sum='Total calories' rowpctsum='Percentage')

/ misstext=' ';

run;

OR something that would just indicate: % of kcals from Nova 4 for all participants as a result, for example.

The reason why I didn’t do the proc tabulate directly is because I needed to delete participants that have consumed 0 kcals in total (total of 0 per id) (I don't want their total to be included in the average), explaining why I used that synthax.

ballardw · Posted 11-01-2022 06:08 PM

@SJ12 wrote:

Hi, I would want something like this (see below), but as a dataset output and not results (only the "all participants" percentage would be actually needed here):

proc tabulate data=test;

   class id nova;

   var kcals;

   table id all='All participants',

(nova all='Total calories per participant')*

   kcals=' '*(sum='Total calories' rowpctsum='Percentage')

/ misstext=' ';

run;

OR something that would just indicate: % of kcals from Nova 4 for all participants as a result, for example.

The reason why I didn’t do the proc tabulate directly is because I needed to delete participants that have consumed 0 kcals in total (total of 0 per id) (I don't want their total to be included in the average), explaining why I used that synthax.

Now, provide the names of every single variable that should be in output. You cannot have repeats of names, so "total calories" , "percentage" ,"1","2","3","4" .

And provide an example data set that we can test code with. I believe this has been asked multiple times.

IF you have ID with 0 kcals, then set them to missing. Then there is nothing to "sum" and would not get included in the percent calculations.

SJ12 · Posted 11-02-2022 11:51 AM

Hi, the data set was provided in the question. Here is it again:

data test;

input id foodtype $ Nova $ Subnova $ kcals;

datalines;

001 butter 1 oils 10

001 fish 2 protein 15

001 banana 3 fruit 10

001 cherry 4 fruit 25

002 burger 2 frozen 5

002 pumpkin 2 other 4

002 carrot 3 veg 6

003 apple 1 fruit 100

003 tahini 4 spread 60

004 . . . 0

005 burger 2 frozen 13

005 ceral 3 grain 56

005 milk 1 prod 35

005 vinegar 4 Other 100

006 . . . 0

006 vinegar 4 Other 100

;

run;

This is an example of what I want (see below). I want to have the % of consumption of "Nova 4'' among all participants. I thought I mentioned this already, but maybe it's not clear since I'm new to SAS. Let me know if that's the case. Thanks

Screen Shot 2022-11-02 at 11.49.36 AM.png

PaigeMiller · Posted 11-02-2022 06:07 AM

Lots of questions. The reason I ask is because it seems like you are working very hard to do something, and I'm sure there are easier methods.

What is wrong with the data set I provided earlier, which I named WANT1? It looks to me like this is EXACTLY what you are asking for, except for item #2 below, and that the columns are in a different order (which is easily fixed).
Surely you know that an Excel file is not a SAS data set, we cannot produce a SAS data set that has variables named 1 or 2 or 3 or 4, SAS variable names cannot begin with a number, and there can't be sub-columns named Total Calories and Percentage; sub-columns don't exist in SAS. So in the SAS data set that you say you want, what should the variables be named?
You say you want this as an output data set, but its rarely a good idea to ask for a wide data set (especially when it looks like you want a report, despite what you said). What would you do with this SAS data set (what analysis, what further computations, what report) will you do with this wide data set?

Please do not ignore any of the above. Explain each item above.

--
Paige Miller

SJ12 · Posted 11-02-2022 11:47 AM

Hello,

I never said there was something wrong with the data set WANT1. I just didn’t mind keeping the longer synthax, since I understood well every step of it and because I wanted to make sure that participants with a total of 0 kcals were deleted and weren’t taken in account in the average. Which line in the data set WANT1 takes that into account?
“there can't be sub-columns named Total Calories and Percentage; sub-columns don't exist in SAS”. Not sure I understand here. I thought they existed since we are able to obtain those columns through proc tabulate?

This is what I want (have the total % for each nova category):

I want what is above. I just wanted to have the totals first and THEN be able to calculate different percentages from that data set, instead of having them directly in the original data set. I just wanted to have 2 steps for it. My boss asked to do it this way.

Hope it’s more clear. Thanks.

SAS Innovate 2025: Save the Date