Solved: Use of output from proc freq

ANKH1 · Posted 12-05-2023 08:01 PM

Hi,
I ran the following proc and got the data below.

proc freq data=ds1;
table var1;
by time group;
run;

data output;
input time group var1 count percentage;
datalines;
0	1	1	3	42.8
0	1	2	4	57.14
0	2	1	7	58.3
0	2	2	5	41.6
3	1	1	2	28.6
3	1	2	5	71.5
3	2	1	1	50
3	2	2	1	50
6	1	2	3	100
6	2	1	2	100
9	1	2	2	100
9	2	1	2	66.6
9	2	2	2	33.3

;
run;

Var1 is a categorical variable (1=yes, 2=no, 3=unknown). My question is if there is a way to use proc freq or if a data step is what is needed to get the desired output. These are the points to consider:
1) This is data grouped by window and by group. The percentage for each category of var1 is calculated in the above proc freq by: (count of category/sum of counts with categories with answers)*100.
2) However, we need to calculate the percentages by taking into account that the sample is 12. That is for the first row, the percentage should be calculated by (count of category/12)*100= 25%.Second row the percentage should be 33.33%.
3) A third category should be created to account for the unknown in the sample size of 12 (12-(3+4)=5). That means for the first group, the percentages should be 1=25%, 2=33.3%, 3= 41.6%. But since since this step is just one before other data wrangling, new rows have to be created (all rows were var1=3):

data want;
input time group var1 count percentage;
datalines;
0	1	1	3	25
0	1	2	4	33.33333333
0	1	3	5	41.66666667
0	2	1	7	58.33333333
0	2	2	5	41.66666667
0	2	3	0	0
3	1	1	2	16.66666667
3	1	2	5	41.66666667
3	1	3	5	41.66666667
3	2	1	1	8.333333333
3	2	2	1	8.333333333
3	2	3	10	83.33333333
6	1	2	3	25
6	1	3	9	75
6	2	1	2	16.66666667
6	2	3	10	83.33333333
9	1	2	2	16.66666667
9	1	3	10	83.33333333
9	2	1	2	16.66666667
9	2	2	2	16.66666667
9	2	3	8	66.66666667
;
run;

mkeintz · Posted 12-05-2023 10:33 PM

PROC FREQ is not going to create observations that are not already there, which is what you are asking for in creating VAR1=3 frequencies. 
 
You will need a DATA step:
 
data want (drop=_:);
set output;
by time group;
percentage=100*(count/12);
output;

retain _remaining 12;
if first.group then _remaining=12-count;
else _remaining=_remaining-count;

if last.group then do;
var1=3;
count=_remaining;
percentage=100*(count/12);
output;
end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 12-05-2023 10:33 PM

PROC FREQ is not going to create observations that are not already there, which is what you are asking for in creating VAR1=3 frequencies. 
 
You will need a DATA step:
 
data want (drop=_:);
set output;
by time group;
percentage=100*(count/12);
output;

retain _remaining 12;
if first.group then _remaining=12-count;
else _remaining=_remaining-count;

if last.group then do;
var1=3;
count=_remaining;
percentage=100*(count/12);
output;
end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ANKH1 · Posted 12-06-2023 10:00 AM

Thank you so much!

ANKH1 · Posted 12-14-2023 07:27 PM

Hi, I was presented with another scenario of data.

data output;
input time group var1 count percentage;
datalines;
0	1	1	3	42.8
0	1	2	4	57.14
0	2	1	5	45.45
0	2	2	5	45.45
0	2	3	1	9.09
3	1	1	2	28.6
3	1	2	5	71.5
3	2	1	1	50
3	2	2	1	50
6	1	2	3	100
6	2	1	2	100
9	1	2	2	100
9	2	1	2	66.6
9	2	2	2	33.3
;
run;

The difference is that some datasets will have data for var1=3 (see above). How can you add the counts for the same var1=3? Right now if I run the code as it is I get data want as this:

data out;
input time group var1 count percentage;
datalines;
0	1	1	3	25
0	1	2	4	33.33333333
0	1	3	5	41.66666667
0	2	1	5	41.66666667
0	2	2	5	41.66666667
0	2	3	1	8.333333333
0	2	3	1	8.333333333
3	1	1	2	16.66666667
3	1	2	5	41.66666667
3	1	3	5	41.66666667
3	2	1	1	8.333333333
3	2	2	1	8.333333333
3	2	3	10	83.33333333
6	1	2	3	25
6	1	3	9	75
6	2	1	2	16.66666667
6	2	3	10	83.33333333
9	1	2	2	16.66666667
9	1	3	10	83.33333333
9	2	1	2	16.66666667
9	2	2	2	16.66666667
9	2	3	8	66.66666667
;
run;

But what I need is this:

data want;
input time group var1 count percentage;
datalines;
0	1	1	3	25
0	1	2	4	33.33333333
0	1	3	5	41.66666667
0	2	1	5	41.66666667
0	2	2	5	41.66666667
0	2	3	2	16.66666667
3	1	1	2	16.66666667
3	1	2	5	41.66666667
3	1	3	5	41.66666667
3	2	1	1	8.333333333
3	2	2	1	8.333333333
3	2	3	10	83.33333333
6	1	2	3	25
6	1	3	9	75
6	2	1	2	16.66666667
6	2	3	10	83.33333333
9	1	2	2	16.66666667
9	1	3	10	83.33333333
9	2	1	2	16.66666667
9	2	2	2	16.66666667
9	2	3	8	66.66666667
;
run;

Can the code provided be modified? Or how can data want be accomplished?

mkeintz · Posted 12-14-2023 10:27 PM

Assuming the data are sorted by TIME/GROUP/VAR1, then if you already have a VAR1=3 frequency, it will be the observation in which last.group=1.

In such cases you don't want to do the extra OUTPUT statement.

So, in the DO group that checks for last.group=1, just change that IF test from

IF last.group then do:

to a test that only works when last.group=1 and var1 is not already equal to 3.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ANKH1 · Posted 12-19-2023 02:39 PM

Hi! Sorry for not getting back earlier. I deleted the second output, but the issue is that it only reports the 3's that were originally in the dataset and not the 3's derived from counting the observations from 1 and 2.

Use of output from proc freq

Re: Use of output from proc freq

Re: Use of output from proc freq

Re: Use of output from proc freq

Re: Use of output from proc freq

Re: Use of output from proc freq

Re: Use of output from proc freq

SAS Innovate 2025: Call for Content

Classroom Training Available!