BookmarkSubscribeRSS Feed
SJ12
Calcite | Level 5

Hello,

I want to use the last dataset below called nova to add a total row (so I can obtain total kcals for each nova category (total_nova1 to total_nova4)) or directly use a function to obtain the percentage of kcals per category for all participants (to divide the kcals of each category by the grand total). For example, I want to have the % of kcals for total_nova4 for all participants. I'm having difficulty using proc tabulate as I can't use class nova or var kcals like I did in the first proc tabulate in the synthax since those variables don't exist in my new dataset nova (I don't know how to identify the variables). The numbers in the last dataset (nova) represent kcals (calories). total_nova1 to total_nova4 represent nova 1-4. Please let me know if I am unclear. I am still new to SAS. Thanks

 

Synthax: 

 

data test;
input id foodtype $ Nova $ Subnova $ kcals;
datalines;
001 butter 1 oils 10
001 fish 2 protein 15
001 banana 3 fruit 10
001 cherry 4 fruit 25
002 burger 2 frozen 5
002 pumpkin 2 other 4
002 carrot 3 veg 6
003 apple 1 fruit 100
003 tahini 4   spread   60
004 .   .    .   0
005 burger  2  frozen  13
005 ceral  3 grain 56
005 milk  1  prod 35
005 vinegar 4  Other  100
006 .    .   .   0
006 vinegar 4  Other  100
;
run;
 
proc tabulate data=test out=test1;
   class id nova;
   var kcals;
   table id all='All participants',
        (nova all='Total calories per participant')*
             kcals=' '*(sum='Total calories');
run;
 
proc print data=test1;run;
 
 
 data nova1;
 set test1;
  where nova="1";
    total_nova1= kcals_sum;
    if missing(id) then delete;
 keep id total_nova1;
 run;
 
 data nova2;
 set test1;
  where nova="2";
    total_nova2= kcals_sum;
    if missing(id) then delete;
 keep id total_nova2;
 run;
 
 data nova3;
 set test1;
  where nova="3";
    total_nova3= kcals_sum;
    if missing(id) then delete;
 keep id total_nova3;
 run;
 
 data nova4;
 set test1;
  where nova="4";
    total_nova4= kcals_sum;
    if missing(id) then delete;
 keep id total_nova4;
 run;
 
proc sort data=nova1;by id;run; 
proc sort data=nova2;by id;run; 
proc sort data=nova3;by id;run; 
proc sort data=nova4;by id;run; 
 
data nova;
merge nova1 nova2 nova3 nova4;
by id;
total=sum(total_nova1, total_nova2,total_nova3, total_nova4);
if total=. then delete;
run;
8 REPLIES 8
PaigeMiller
Diamond | Level 26

You don't show us exactly what you want after you create NOVA. It's not clear to me from reading your words what is next. I have simplified your code and calculated percents in data set WANT1, from which you can do PROC TABULATE and make it look any way you want.

 

proc summary data=test;
    class id nova;
    var kcals;
    output out=test1 sum=kcals_sum;
run; 
proc transpose data=test1(where=(_type_ in (1 3))) prefix=nova out=test1a;
    by id;
    id nova;
    var kcals_sum;
run;
data want1;
    merge test1a test1(where=(_type_ in (0 2)));
    by id;
    array nn nova1-nova4;
    array pct pct1-pct4;
    do i=1 to dim(nn);
        pct(i)=nn(i)/kcals_sum;
    end;
    drop _type_ _freq_ _name_ i;
run;

 

--
Paige Miller
SJ12
Calcite | Level 5

The reason why I used the longer synthax is because I wanted to delete participants that have consumed 0 kcals in total (total of 0 per id). I don't want their total to be included in the average when calculating percentages. 

 

As to the result I want, you can see my reply to ballardw. Thanks! 

 

ballardw
Super User

Your example shows making 4 sets from one where you intentionally exclude variables. Is your real data something where you want to add the "Nova" back from a case where each data set is supposed to be from one "nova"? Or are you actually doing all of that to your existing data?

Is the purpose of this to make the output data set from tabulate? The output structure of the data set is still very likely to require post processing as that is just the nature of the output sets created by tabulate.

Perhaps you shouldn't be using tabulate at all but you have not managed to describe what it is that you actually want.

 

Suggestion: Show what you expect the final result of this to look like. Or at least a representation with indications of what goes into the report.

 

 

SJ12
Calcite | Level 5

Hi, I would want something like this (see below), but as a dataset output and not results (only the "all participants" percentage would be actually needed here):

 

proc tabulate data=test;

   class id nova;

   var kcals;

   table id all='All participants',

        (nova all='Total calories per participant')*

             kcals=' '*(sum='Total calories' rowpctsum='Percentage')

      / misstext=' ';

run;

SJ12_0-1667331525590.png

OR something that would just indicate: % of kcals from Nova 4 for all participants as a result, for example.

The reason why I didn’t do the proc tabulate directly is because I needed to delete participants that have consumed 0 kcals in total (total of 0 per id) (I don't want their total to be included in the average), explaining why I used that synthax. 

ballardw
Super User

@SJ12 wrote:

Hi, I would want something like this (see below), but as a dataset output and not results (only the "all participants" percentage would be actually needed here):

 

proc tabulate data=test;

   class id nova;

   var kcals;

   table id all='All participants',

        (nova all='Total calories per participant')*

             kcals=' '*(sum='Total calories' rowpctsum='Percentage')

      / misstext=' ';

run;

SJ12_0-1667331525590.png

OR something that would just indicate: % of kcals from Nova 4 for all participants as a result, for example.

The reason why I didn’t do the proc tabulate directly is because I needed to delete participants that have consumed 0 kcals in total (total of 0 per id) (I don't want their total to be included in the average), explaining why I used that synthax. 


Now, provide the names of every single variable that should be in output. You cannot have repeats of names, so "total calories" , "percentage" ,"1","2","3","4" .

 

And provide an example data set that we can test code with. I believe this has been asked multiple times.

 

IF you have ID with 0 kcals, then set them to missing. Then there is nothing to "sum" and would not get included in the percent calculations.

SJ12
Calcite | Level 5

Hi, the data set was provided in the question. Here is it again: 

 

data test;
input id foodtype $ Nova $ Subnova $ kcals;
datalines;
001 butter 1 oils 10
001 fish 2 protein 15
001 banana 3 fruit 10
001 cherry 4 fruit 25
002 burger 2 frozen 5
002 pumpkin 2 other 4
002 carrot 3 veg 6
003 apple 1 fruit 100
003 tahini 4   spread   60
004 .   .    .   0
005 burger  2  frozen  13
005 ceral  3 grain 56
005 milk  1  prod 35
005 vinegar 4  Other  100
006 .    .   .   0
006 vinegar 4  Other  100
;
run;
 
 
This is an example of what I want (see below). I want to have the % of consumption of "Nova 4'' among all participants. I thought I mentioned this already, but maybe it's not clear since I'm new to SAS. Let me know if that's the case. Thanks 
 
Screen Shot 2022-11-02 at 11.49.36 AM.png
 
 
 
PaigeMiller
Diamond | Level 26

Lots of questions. The reason I ask is because it seems like you are working very hard to do something, and I'm sure there are easier methods.

 

  1. What is wrong with the data set I provided earlier, which I named WANT1? It looks to me like this is EXACTLY what you are asking for, except for item #2 below, and that the columns are in a different order (which is easily fixed).
  2. Surely you know that an Excel file is not a SAS data set, we cannot produce a SAS data set that has variables named 1 or 2 or 3 or 4, SAS variable names cannot begin with a number, and there can't be sub-columns named Total Calories and Percentage; sub-columns don't exist in SAS. So in the SAS data set that you say you want, what should the variables be named?
  3. You say you want this as an output data set, but its rarely a good idea to ask for a wide data set (especially when it looks like you want a report, despite what you said). What would you do with this SAS data set (what analysis, what further computations, what report) will you do with this wide data set?

 

Please do not ignore any of the above. Explain each item above.

--
Paige Miller
SJ12
Calcite | Level 5

Hello,

  1. I never said there was something wrong with the data set WANT1. I just didn’t mind keeping the longer synthax, since I understood well every step of it and because I wanted to make sure that participants with a total of 0 kcals were deleted and weren’t taken in account in the average. Which line in the data set WANT1 takes that into account? 
  2. “there can't be sub-columns named Total Calories and Percentage; sub-columns don't exist in SAS”. Not sure I understand here. I thought they existed since we are able to obtain those columns through proc tabulate? 

This is what I want (have the total % for each nova category): 

 

SJ12_0-1667404023431.png

  1. I want what is above. I just wanted to have the totals first and THEN be able to calculate different percentages from that data set, instead of having them directly in the original data set. I just wanted to have 2 steps for it. My boss asked to do it this way. 

Hope it’s more clear. Thanks.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 656 views
  • 0 likes
  • 3 in conversation