BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ANKH1
Pyrite | Level 9

Hi,
I ran the following proc and got the data below.

proc freq data=ds1;
table var1;
by time group;
run;

data output;
input time group var1 count percentage;
datalines;
0	1	1	3	42.8
0	1	2	4	57.14
0	2	1	7	58.3
0	2	2	5	41.6
3	1	1	2	28.6
3	1	2	5	71.5
3	2	1	1	50
3	2	2	1	50
6	1	2	3	100
6	2	1	2	100
9	1	2	2	100
9	2	1	2	66.6
9	2	2	2	33.3

;
run;

Var1 is a categorical variable (1=yes, 2=no, 3=unknown). My question is if there is a way to use proc freq or if a data step is what is needed to get the desired output. These are the points to consider:
1) This is data grouped by window and by group. The percentage for each category of var1 is calculated in the above proc freq by: (count of category/sum of counts with categories with answers)*100.
2) However, we need to calculate the percentages by taking into account that the sample is 12. That is for the first row, the percentage should be calculated by (count of category/12)*100= 25%.Second row the percentage should be 33.33%.
3) A third category should be created to account for the unknown in the sample size of 12 (12-(3+4)=5). That means for the first group, the percentages should be 1=25%, 2=33.3%, 3= 41.6%. But since since this step is just one before other data wrangling, new rows have to be created (all rows were var1=3):

data want;
input time group var1 count percentage;
datalines;
0	1	1	3	25
0	1	2	4	33.33333333
0	1	3	5	41.66666667
0	2	1	7	58.33333333
0	2	2	5	41.66666667
0	2	3	0	0
3	1	1	2	16.66666667
3	1	2	5	41.66666667
3	1	3	5	41.66666667
3	2	1	1	8.333333333
3	2	2	1	8.333333333
3	2	3	10	83.33333333
6	1	2	3	25
6	1	3	9	75
6	2	1	2	16.66666667
6	2	3	10	83.33333333
9	1	2	2	16.66666667
9	1	3	10	83.33333333
9	2	1	2	16.66666667
9	2	2	2	16.66666667
9	2	3	8	66.66666667
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star
PROC FREQ is not going to create observations that are not already there, which is what you are asking for in creating VAR1=3 frequencies. 
 
You will need a DATA step:
 
data want (drop=_:);
set output;
by time group;
percentage=100*(count/12);
output;

retain _remaining 12;
if first.group then _remaining=12-count;
else _remaining=_remaining-count;

if last.group then do;
var1=3;
count=_remaining;
percentage=100*(count/12);
output;
end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
mkeintz
PROC Star
PROC FREQ is not going to create observations that are not already there, which is what you are asking for in creating VAR1=3 frequencies. 
 
You will need a DATA step:
 
data want (drop=_:);
set output;
by time group;
percentage=100*(count/12);
output;

retain _remaining 12;
if first.group then _remaining=12-count;
else _remaining=_remaining-count;

if last.group then do;
var1=3;
count=_remaining;
percentage=100*(count/12);
output;
end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ANKH1
Pyrite | Level 9

Thank you so much!

ANKH1
Pyrite | Level 9

Hi, I was presented with another scenario of data.

data output;
input time group var1 count percentage;
datalines;
0	1	1	3	42.8
0	1	2	4	57.14
0	2	1	5	45.45
0	2	2	5	45.45
0	2	3	1	9.09
3	1	1	2	28.6
3	1	2	5	71.5
3	2	1	1	50
3	2	2	1	50
6	1	2	3	100
6	2	1	2	100
9	1	2	2	100
9	2	1	2	66.6
9	2	2	2	33.3
;
run;

The difference is that some datasets will have data for var1=3 (see above). How can you add the counts for the same var1=3? Right now if I run the code as it is I get data want as this:

data out;
input time group var1 count percentage;
datalines;
0	1	1	3	25
0	1	2	4	33.33333333
0	1	3	5	41.66666667
0	2	1	5	41.66666667
0	2	2	5	41.66666667
0	2	3	1	8.333333333
0	2	3	1	8.333333333
3	1	1	2	16.66666667
3	1	2	5	41.66666667
3	1	3	5	41.66666667
3	2	1	1	8.333333333
3	2	2	1	8.333333333
3	2	3	10	83.33333333
6	1	2	3	25
6	1	3	9	75
6	2	1	2	16.66666667
6	2	3	10	83.33333333
9	1	2	2	16.66666667
9	1	3	10	83.33333333
9	2	1	2	16.66666667
9	2	2	2	16.66666667
9	2	3	8	66.66666667
;
run;

But what I need is this:

data want;
input time group var1 count percentage;
datalines;
0	1	1	3	25
0	1	2	4	33.33333333
0	1	3	5	41.66666667
0	2	1	5	41.66666667
0	2	2	5	41.66666667
0	2	3	2	16.66666667
3	1	1	2	16.66666667
3	1	2	5	41.66666667
3	1	3	5	41.66666667
3	2	1	1	8.333333333
3	2	2	1	8.333333333
3	2	3	10	83.33333333
6	1	2	3	25
6	1	3	9	75
6	2	1	2	16.66666667
6	2	3	10	83.33333333
9	1	2	2	16.66666667
9	1	3	10	83.33333333
9	2	1	2	16.66666667
9	2	2	2	16.66666667
9	2	3	8	66.66666667
;
run;

Can the code provided be modified? Or how can data want be accomplished? 

mkeintz
PROC Star

Assuming the data are sorted by TIME/GROUP/VAR1, then if you already have a VAR1=3 frequency, it will be the observation in which last.group=1. 

 

In such cases you don't want to do the extra OUTPUT statement.

 

So, in the DO group that checks for last.group=1, just change that IF test from

   

IF last.group then do:

to a test that only works when last.group=1 and var1 is not already equal to 3.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ANKH1
Pyrite | Level 9

Hi! Sorry for not getting back earlier. I deleted the second output, but the issue is that it only reports the 3's that were originally in the dataset and not the 3's derived from counting the observations from 1 and 2. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2341 views
  • 0 likes
  • 2 in conversation