Solved: PROC FREQ cumulative descending percent by group

confooseddesi89 · Posted 10-12-2023 04:51 PM

Hi everyone,

I have a dataset where each row has the following variables: Ppt_ID (ID number), child_gender (Boy or Girl), and Data_days (Days of data). See below sample data:

Data Days;
Input Ppt_ID child_gender $  Data_days;
datalines;
1	Boy	22.6429
2	Boy	22.4286
3	Girl	15.28571
4	Boy	14.5714
5	Girl	17.21429
6	Boy	18.5714
7	Boy	21.5
8	Girl	14.42857
9	Boy	16.7143
10	Boy	20.2857
11	Boy	15.3571
12	Boy	12.8571
13	Girl	20.21429
14	Girl	20.07143
15	Boy	15.2143
16	Boy	19.0714
17	Girl	15.57143
18	Boy	13.8571
19	Boy	15.9286
20	Boy	17.5
21	Girl	17.57143
22	Boy	14.7143
23	Girl	17
24	Girl	15.14286
25	Girl	16.21429
26	Girl	16.71429
27	Boy	14
28	Girl	17.14286
29	Girl	14.35714
30	Girl	13.35714
31	Boy	14.5714
32	Boy	16.7143
33	Girl	15.64286
34	Girl	16.35714
35	Girl	13.35714
36	Girl	16.28571
37	Boy	16.4286
38	Girl	15.78571
39	Girl	16
40	Boy	15.5714
;
run;

I want to output the cumulative frequencies and percents by each child_gender, but in descending order within each group according to the outcome variable, Data_days. The code I have to produce frequencies by child_gender by descending Data_days is below:

proc sort data=Days;by child_gender;run;
proc freq data=Days NOPRINT;by child_gender;table Data_days/ out=gender_days;run;
proc sort data=gender_days;by child_gender descending Data_days;run;
proc print data=gender_days;run;

The code produces the following, where Data_days is in descending order, by child_gender:

child_gender	Data_days	COUNT	PERCENT
Boy	22.6429	1	5
Boy	22.4286	1	5
Boy	21.5	1	5
Boy	20.2857	1	5
Boy	19.0714	1	5
Boy	18.5714	1	5
Boy	17.5	1	5
Boy	16.7143	2	10
Boy	16.4286	1	5
Boy	15.9286	1	5
Boy	15.5714	1	5
Boy	15.3571	1	5
Boy	15.2143	1	5
Boy	14.7143	1	5
Boy	14.5714	2	10
Boy	14	1	5
Boy	13.8571	1	5
Boy	12.8571	1	5
Girl	20.2143	1	5
Girl	20.0714	1	5
Girl	17.5714	1	5
Girl	17.2143	1	5
Girl	17.1429	1	5
Girl	17	1	5
Girl	16.7143	1	5
Girl	16.3571	1	5
Girl	16.2857	1	5
Girl	16.2143	1	5
Girl	16	1	5
Girl	15.7857	1	5
Girl	15.6429	1	5
Girl	15.5714	1	5
Girl	15.2857	1	5
Girl	15.1429	1	5
Girl	14.4286	1	5
Girl	14.3571	1	5
Girl	13.3571	2	10

What I want to output is the following, where the cumulative count and percent are calculated beginning at the highest Data_days value, again by child_gender - that is, each cell is summing up the Count (or Percent) across all the previous rows and the current row:

child_gender	Data_days	COUNT	PERCENT	Cumulative_count	Cumulative_percent
Boy	22.6429	1	5	1	5
Boy	22.4286	1	5	2	10
Boy	21.5	1	5	3	15
Boy	20.2857	1	5	4	20
Boy	19.0714	1	5	5	25
Boy	18.5714	1	5	6	30
Boy	17.5	1	5	7	35
Boy	16.7143	2	10	9	45
Boy	16.4286	1	5	10	50
Boy	15.9286	1	5	11	55
Boy	15.5714	1	5	12	60
Boy	15.3571	1	5	13	65
Boy	15.2143	1	5	14	70
Boy	14.7143	1	5	15	75
Boy	14.5714	2	10	17	85
Boy	14	1	5	18	90
Boy	13.8571	1	5	19	95
Boy	12.8571	1	5	20	100
Girl	20.2143	1	5	1	5
Girl	20.0714	1	5	2	10
Girl	17.5714	1	5	3	15
Girl	17.2143	1	5	4	20
Girl	17.1429	1	5	5	25
Girl	17	1	5	6	30
Girl	16.7143	1	5	7	35
Girl	16.3571	1	5	8	40
Girl	16.2857	1	5	9	45
Girl	16.2143	1	5	10	50
Girl	16	1	5	11	55
Girl	15.7857	1	5	12	60
Girl	15.6429	1	5	13	65
Girl	15.5714	1	5	14	70
Girl	15.2857	1	5	15	75
Girl	15.1429	1	5	16	80
Girl	14.4286	1	5	17	85
Girl	14.3571	1	5	18	90
Girl	13.3571	2	10	20	100

Could someone help me with this? Thank you.

Rick_SAS · Posted 10-13-2023 07:09 AM

If I understand the question, I think you want to add a DATA step that uses BY group processing on the gender variable. Use the FIRST.child_gender indicator variable to set the cumulative totals to 0 at the start of each by group, then increment the cumulative variable for each observation. You might want to read about BY-group processing and the FIRST.variable indicator variable in the article, "How to use FIRST.variable and LAST.variable in a BY-group analysis in SAS."

Here's one solution:

data Want;
set gender_days;
by child_gender;
if first.child_gender then do;
   Cumulative_count = 0;
   Cumulative_percent = 0;
end;
Cumulative_count + count;
Cumulative_percent + percent;
run;

proc print; run;

View solution in original post

PaigeMiller · Posted 10-12-2023 05:45 PM

There are a couple of things that don't feel right here. PROC FREQ is not normally a tool I would choose for continuous variables such as your variable DATA_DAYS; in fact I think it is the wrong tool. Another things that doesn't feel right is that your cumulative percents are always multiples of 5. Now that could happen if your data is really 20 girls and 20 boys, but is your real data like that, 20 girls and 20 boys, and never some other numbers?

In any event, this seems like a job for PROC RANK in which you can rank the numbers in DATA_DAYS from 1 to n, from which it is easy to calculate the cumulative percents.

--
Paige Miller

ballardw · Posted 10-12-2023 05:55 PM

If you think that is really useful:

proc sort data=days;
   by child_gender data_days;
run;

proc freq data=days noprint;
   by child_gender;
   tables data_days/ outcum out=want;
run;

Makes a data set as requested so some of the variable names are different.

Rick_SAS · Posted 10-13-2023 07:09 AM

If I understand the question, I think you want to add a DATA step that uses BY group processing on the gender variable. Use the FIRST.child_gender indicator variable to set the cumulative totals to 0 at the start of each by group, then increment the cumulative variable for each observation. You might want to read about BY-group processing and the FIRST.variable indicator variable in the article, "How to use FIRST.variable and LAST.variable in a BY-group analysis in SAS."

Here's one solution:

data Want;
set gender_days;
by child_gender;
if first.child_gender then do;
   Cumulative_count = 0;
   Cumulative_percent = 0;
end;
Cumulative_count + count;
Cumulative_percent + percent;
run;

proc print; run;

PROC FREQ cumulative descending percent by group

Re: PROC FREQ cumulative descending percent by group

Re: PROC FREQ cumulative descending percent by group

Re: PROC FREQ cumulative descending percent by group

Re: PROC FREQ cumulative descending percent by group

PROC FREQ cumulative descending percent by group

Re: PROC FREQ cumulative descending percent by group

Re: PROC FREQ cumulative descending percent by group

Re: PROC FREQ cumulative descending percent by group

Re: PROC FREQ cumulative descending percent by group

Registration is open

SAS Training: Just a Click Away