Hello,
I want to use the last dataset below called nova to add a total row (so I can obtain total kcals for each nova category (total_nova1 to total_nova4)) or directly use a function to obtain the percentage of kcals per category for all participants (to divide the kcals of each category by the grand total). For example, I want to have the % of kcals for total_nova4 for all participants. I'm having difficulty using proc tabulate as I can't use class nova or var kcals like I did in the first proc tabulate in the synthax since those variables don't exist in my new dataset nova (I don't know how to identify the variables). The numbers in the last dataset (nova) represent kcals (calories). total_nova1 to total_nova4 represent nova 1-4. Please let me know if I am unclear. I am still new to SAS. Thanks
Synthax:
You don't show us exactly what you want after you create NOVA. It's not clear to me from reading your words what is next. I have simplified your code and calculated percents in data set WANT1, from which you can do PROC TABULATE and make it look any way you want.
proc summary data=test;
    class id nova;
    var kcals;
    output out=test1 sum=kcals_sum;
run; 
proc transpose data=test1(where=(_type_ in (1 3))) prefix=nova out=test1a;
    by id;
    id nova;
    var kcals_sum;
run;
data want1;
    merge test1a test1(where=(_type_ in (0 2)));
    by id;
    array nn nova1-nova4;
    array pct pct1-pct4;
    do i=1 to dim(nn);
        pct(i)=nn(i)/kcals_sum;
    end;
    drop _type_ _freq_ _name_ i;
run;
The reason why I used the longer synthax is because I wanted to delete participants that have consumed 0 kcals in total (total of 0 per id). I don't want their total to be included in the average when calculating percentages.
As to the result I want, you can see my reply to ballardw. Thanks!
Your example shows making 4 sets from one where you intentionally exclude variables. Is your real data something where you want to add the "Nova" back from a case where each data set is supposed to be from one "nova"? Or are you actually doing all of that to your existing data?
Is the purpose of this to make the output data set from tabulate? The output structure of the data set is still very likely to require post processing as that is just the nature of the output sets created by tabulate.
Perhaps you shouldn't be using tabulate at all but you have not managed to describe what it is that you actually want.
Suggestion: Show what you expect the final result of this to look like. Or at least a representation with indications of what goes into the report.
Hi, I would want something like this (see below), but as a dataset output and not results (only the "all participants" percentage would be actually needed here):
proc tabulate data=test;
class id nova;
var kcals;
table id all='All participants',
(nova all='Total calories per participant')*
kcals=' '*(sum='Total calories' rowpctsum='Percentage')
/ misstext=' ';
run;
OR something that would just indicate: % of kcals from Nova 4 for all participants as a result, for example.
The reason why I didn’t do the proc tabulate directly is because I needed to delete participants that have consumed 0 kcals in total (total of 0 per id) (I don't want their total to be included in the average), explaining why I used that synthax.
@SJ12 wrote:
Hi, I would want something like this (see below), but as a dataset output and not results (only the "all participants" percentage would be actually needed here):
proc tabulate data=test;
class id nova;
var kcals;
table id all='All participants',
(nova all='Total calories per participant')*
kcals=' '*(sum='Total calories' rowpctsum='Percentage')
/ misstext=' ';
run;
OR something that would just indicate: % of kcals from Nova 4 for all participants as a result, for example.
The reason why I didn’t do the proc tabulate directly is because I needed to delete participants that have consumed 0 kcals in total (total of 0 per id) (I don't want their total to be included in the average), explaining why I used that synthax.
Now, provide the names of every single variable that should be in output. You cannot have repeats of names, so "total calories" , "percentage" ,"1","2","3","4" .
And provide an example data set that we can test code with. I believe this has been asked multiple times.
IF you have ID with 0 kcals, then set them to missing. Then there is nothing to "sum" and would not get included in the percent calculations.
Hi, the data set was provided in the question. Here is it again:
Lots of questions. The reason I ask is because it seems like you are working very hard to do something, and I'm sure there are easier methods.
Please do not ignore any of the above. Explain each item above.
Hello,
This is what I want (have the total % for each nova category):
Hope it’s more clear. Thanks.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.
