BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

Hi everyone,

I have a dataset where each row has the following variables: Ppt_ID (ID number), child_gender (Boy or Girl), and Data_days (Days of data). See below sample data:

 

Data Days;
Input Ppt_ID child_gender $  Data_days;
datalines;
1	Boy	22.6429
2	Boy	22.4286
3	Girl	15.28571
4	Boy	14.5714
5	Girl	17.21429
6	Boy	18.5714
7	Boy	21.5
8	Girl	14.42857
9	Boy	16.7143
10	Boy	20.2857
11	Boy	15.3571
12	Boy	12.8571
13	Girl	20.21429
14	Girl	20.07143
15	Boy	15.2143
16	Boy	19.0714
17	Girl	15.57143
18	Boy	13.8571
19	Boy	15.9286
20	Boy	17.5
21	Girl	17.57143
22	Boy	14.7143
23	Girl	17
24	Girl	15.14286
25	Girl	16.21429
26	Girl	16.71429
27	Boy	14
28	Girl	17.14286
29	Girl	14.35714
30	Girl	13.35714
31	Boy	14.5714
32	Boy	16.7143
33	Girl	15.64286
34	Girl	16.35714
35	Girl	13.35714
36	Girl	16.28571
37	Boy	16.4286
38	Girl	15.78571
39	Girl	16
40	Boy	15.5714
;
run;

 

I want to output the cumulative frequencies and percents by each child_gender, but in descending order within each group according to the outcome variable, Data_days. The code I have to produce frequencies by child_gender by descending Data_days is below:

 

proc sort data=Days;by child_gender;run;
proc freq data=Days NOPRINT;by child_gender;table Data_days/ out=gender_days;run;
proc sort data=gender_days;by child_gender descending Data_days;run; proc print data=gender_days;run;

 

The code produces the following, where Data_days is in descending order, by child_gender:

child_gender Data_days COUNT PERCENT
Boy 22.6429 1 5
Boy 22.4286 1 5
Boy 21.5 1 5
Boy 20.2857 1 5
Boy 19.0714 1 5
Boy 18.5714 1 5
Boy 17.5 1 5
Boy 16.7143 2 10
Boy 16.4286 1 5
Boy 15.9286 1 5
Boy 15.5714 1 5
Boy 15.3571 1 5
Boy 15.2143 1 5
Boy 14.7143 1 5
Boy 14.5714 2 10
Boy 14 1 5
Boy 13.8571 1 5
Boy 12.8571 1 5
Girl 20.2143 1 5
Girl 20.0714 1 5
Girl 17.5714 1 5
Girl 17.2143 1 5
Girl 17.1429 1 5
Girl 17 1 5
Girl 16.7143 1 5
Girl 16.3571 1 5
Girl 16.2857 1 5
Girl 16.2143 1 5
Girl 16 1 5
Girl 15.7857 1 5
Girl 15.6429 1 5
Girl 15.5714 1 5
Girl 15.2857 1 5
Girl 15.1429 1 5
Girl 14.4286 1 5
Girl 14.3571 1 5
Girl 13.3571 2 10

 

What I want to output is the following, where the cumulative count and percent are calculated beginning at the highest Data_days value, again by child_gender - that is, each cell is summing up the Count (or Percent) across all the previous rows and the current row:

child_gender Data_days COUNT PERCENT Cumulative_count Cumulative_percent
Boy 22.6429 1 5 1 5
Boy 22.4286 1 5 2 10
Boy 21.5 1 5 3 15
Boy 20.2857 1 5 4 20
Boy 19.0714 1 5 5 25
Boy 18.5714 1 5 6 30
Boy 17.5 1 5 7 35
Boy 16.7143 2 10 9 45
Boy 16.4286 1 5 10 50
Boy 15.9286 1 5 11 55
Boy 15.5714 1 5 12 60
Boy 15.3571 1 5 13 65
Boy 15.2143 1 5 14 70
Boy 14.7143 1 5 15 75
Boy 14.5714 2 10 17 85
Boy 14 1 5 18 90
Boy 13.8571 1 5 19 95
Boy 12.8571 1 5 20 100
Girl 20.2143 1 5 1 5
Girl 20.0714 1 5 2 10
Girl 17.5714 1 5 3 15
Girl 17.2143 1 5 4 20
Girl 17.1429 1 5 5 25
Girl 17 1 5 6 30
Girl 16.7143 1 5 7 35
Girl 16.3571 1 5 8 40
Girl 16.2857 1 5 9 45
Girl 16.2143 1 5 10 50
Girl 16 1 5 11 55
Girl 15.7857 1 5 12 60
Girl 15.6429 1 5 13 65
Girl 15.5714 1 5 14 70
Girl 15.2857 1 5 15 75
Girl 15.1429 1 5 16 80
Girl 14.4286 1 5 17 85
Girl 14.3571 1 5 18 90
Girl 13.3571 2 10 20 100

 

Could someone help me with this? Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

If I understand the question, I think you want to add a DATA step that uses BY group processing on the gender variable. Use the FIRST.child_gender indicator variable to set the cumulative totals to 0 at the start of each by group, then increment the cumulative variable for each observation. You might want to read about BY-group processing and the FIRST.variable indicator variable in the article, "How to use FIRST.variable and LAST.variable in a BY-group analysis in SAS."

 

Here's one solution:

 

data Want;
set gender_days;
by child_gender;
if first.child_gender then do;
   Cumulative_count = 0;
   Cumulative_percent = 0;
end;
Cumulative_count + count;
Cumulative_percent + percent;
run;

proc print; run;

View solution in original post

3 REPLIES 3
PaigeMiller
Diamond | Level 26

There are a couple of things that don't feel right here. PROC FREQ is not normally a tool I would choose for continuous variables such as your variable DATA_DAYS; in fact I think it is the wrong tool. Another things that doesn't feel right is that your cumulative percents are always multiples of 5. Now that could happen if your data is really 20 girls and 20 boys, but is your real data like that, 20 girls and 20 boys, and never some other numbers?

 

In any event, this seems like a job for PROC RANK in which you can rank the numbers in DATA_DAYS from 1 to n, from which it is easy to calculate the cumulative percents.

--
Paige Miller
ballardw
Super User

If you think that is really useful:

proc sort data=days;
   by child_gender data_days;
run;

proc freq data=days noprint;
   by child_gender;
   tables data_days/ outcum out=want;
run;

Makes a data set as requested so some of the variable names are different.

Rick_SAS
SAS Super FREQ

If I understand the question, I think you want to add a DATA step that uses BY group processing on the gender variable. Use the FIRST.child_gender indicator variable to set the cumulative totals to 0 at the start of each by group, then increment the cumulative variable for each observation. You might want to read about BY-group processing and the FIRST.variable indicator variable in the article, "How to use FIRST.variable and LAST.variable in a BY-group analysis in SAS."

 

Here's one solution:

 

data Want;
set gender_days;
by child_gender;
if first.child_gender then do;
   Cumulative_count = 0;
   Cumulative_percent = 0;
end;
Cumulative_count + count;
Cumulative_percent + percent;
run;

proc print; run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 931 views
  • 2 likes
  • 4 in conversation