Hi everyone,
I have a dataset where each row has the following variables: Ppt_ID
(ID number), child_gender
(Boy or Girl), and Data_days
(Days of data). See below sample data:
Data Days;
Input Ppt_ID child_gender $ Data_days;
datalines;
1 Boy 22.6429
2 Boy 22.4286
3 Girl 15.28571
4 Boy 14.5714
5 Girl 17.21429
6 Boy 18.5714
7 Boy 21.5
8 Girl 14.42857
9 Boy 16.7143
10 Boy 20.2857
11 Boy 15.3571
12 Boy 12.8571
13 Girl 20.21429
14 Girl 20.07143
15 Boy 15.2143
16 Boy 19.0714
17 Girl 15.57143
18 Boy 13.8571
19 Boy 15.9286
20 Boy 17.5
21 Girl 17.57143
22 Boy 14.7143
23 Girl 17
24 Girl 15.14286
25 Girl 16.21429
26 Girl 16.71429
27 Boy 14
28 Girl 17.14286
29 Girl 14.35714
30 Girl 13.35714
31 Boy 14.5714
32 Boy 16.7143
33 Girl 15.64286
34 Girl 16.35714
35 Girl 13.35714
36 Girl 16.28571
37 Boy 16.4286
38 Girl 15.78571
39 Girl 16
40 Boy 15.5714
;
run;
I want to output the cumulative frequencies and percents by each child_gender
, but in descending order within each group according to the outcome variable, Data_days
. The code I have to produce frequencies by child_gender
by descending Data_days
is below:
proc sort data=Days;by child_gender;run;
proc freq data=Days NOPRINT;by child_gender;table Data_days/ out=gender_days;run;
proc sort data=gender_days;by child_gender descending Data_days;run;
proc print data=gender_days;run;
The code produces the following, where Data_days
is in descending order, by child_gender
:
child_gender | Data_days | COUNT | PERCENT |
Boy | 22.6429 | 1 | 5 |
Boy | 22.4286 | 1 | 5 |
Boy | 21.5 | 1 | 5 |
Boy | 20.2857 | 1 | 5 |
Boy | 19.0714 | 1 | 5 |
Boy | 18.5714 | 1 | 5 |
Boy | 17.5 | 1 | 5 |
Boy | 16.7143 | 2 | 10 |
Boy | 16.4286 | 1 | 5 |
Boy | 15.9286 | 1 | 5 |
Boy | 15.5714 | 1 | 5 |
Boy | 15.3571 | 1 | 5 |
Boy | 15.2143 | 1 | 5 |
Boy | 14.7143 | 1 | 5 |
Boy | 14.5714 | 2 | 10 |
Boy | 14 | 1 | 5 |
Boy | 13.8571 | 1 | 5 |
Boy | 12.8571 | 1 | 5 |
Girl | 20.2143 | 1 | 5 |
Girl | 20.0714 | 1 | 5 |
Girl | 17.5714 | 1 | 5 |
Girl | 17.2143 | 1 | 5 |
Girl | 17.1429 | 1 | 5 |
Girl | 17 | 1 | 5 |
Girl | 16.7143 | 1 | 5 |
Girl | 16.3571 | 1 | 5 |
Girl | 16.2857 | 1 | 5 |
Girl | 16.2143 | 1 | 5 |
Girl | 16 | 1 | 5 |
Girl | 15.7857 | 1 | 5 |
Girl | 15.6429 | 1 | 5 |
Girl | 15.5714 | 1 | 5 |
Girl | 15.2857 | 1 | 5 |
Girl | 15.1429 | 1 | 5 |
Girl | 14.4286 | 1 | 5 |
Girl | 14.3571 | 1 | 5 |
Girl | 13.3571 | 2 | 10 |
What I want to output is the following, where the cumulative count and percent are calculated beginning at the highest Data_days
value, again by child_gender
- that is, each cell is summing up the Count (or Percent) across all the previous rows and the current row:
child_gender | Data_days | COUNT | PERCENT | Cumulative_count | Cumulative_percent |
Boy | 22.6429 | 1 | 5 | 1 | 5 |
Boy | 22.4286 | 1 | 5 | 2 | 10 |
Boy | 21.5 | 1 | 5 | 3 | 15 |
Boy | 20.2857 | 1 | 5 | 4 | 20 |
Boy | 19.0714 | 1 | 5 | 5 | 25 |
Boy | 18.5714 | 1 | 5 | 6 | 30 |
Boy | 17.5 | 1 | 5 | 7 | 35 |
Boy | 16.7143 | 2 | 10 | 9 | 45 |
Boy | 16.4286 | 1 | 5 | 10 | 50 |
Boy | 15.9286 | 1 | 5 | 11 | 55 |
Boy | 15.5714 | 1 | 5 | 12 | 60 |
Boy | 15.3571 | 1 | 5 | 13 | 65 |
Boy | 15.2143 | 1 | 5 | 14 | 70 |
Boy | 14.7143 | 1 | 5 | 15 | 75 |
Boy | 14.5714 | 2 | 10 | 17 | 85 |
Boy | 14 | 1 | 5 | 18 | 90 |
Boy | 13.8571 | 1 | 5 | 19 | 95 |
Boy | 12.8571 | 1 | 5 | 20 | 100 |
Girl | 20.2143 | 1 | 5 | 1 | 5 |
Girl | 20.0714 | 1 | 5 | 2 | 10 |
Girl | 17.5714 | 1 | 5 | 3 | 15 |
Girl | 17.2143 | 1 | 5 | 4 | 20 |
Girl | 17.1429 | 1 | 5 | 5 | 25 |
Girl | 17 | 1 | 5 | 6 | 30 |
Girl | 16.7143 | 1 | 5 | 7 | 35 |
Girl | 16.3571 | 1 | 5 | 8 | 40 |
Girl | 16.2857 | 1 | 5 | 9 | 45 |
Girl | 16.2143 | 1 | 5 | 10 | 50 |
Girl | 16 | 1 | 5 | 11 | 55 |
Girl | 15.7857 | 1 | 5 | 12 | 60 |
Girl | 15.6429 | 1 | 5 | 13 | 65 |
Girl | 15.5714 | 1 | 5 | 14 | 70 |
Girl | 15.2857 | 1 | 5 | 15 | 75 |
Girl | 15.1429 | 1 | 5 | 16 | 80 |
Girl | 14.4286 | 1 | 5 | 17 | 85 |
Girl | 14.3571 | 1 | 5 | 18 | 90 |
Girl | 13.3571 | 2 | 10 | 20 | 100 |
Could someone help me with this? Thank you.
If I understand the question, I think you want to add a DATA step that uses BY group processing on the gender variable. Use the FIRST.child_gender indicator variable to set the cumulative totals to 0 at the start of each by group, then increment the cumulative variable for each observation. You might want to read about BY-group processing and the FIRST.variable indicator variable in the article, "How to use FIRST.variable and LAST.variable in a BY-group analysis in SAS."
Here's one solution:
data Want;
set gender_days;
by child_gender;
if first.child_gender then do;
Cumulative_count = 0;
Cumulative_percent = 0;
end;
Cumulative_count + count;
Cumulative_percent + percent;
run;
proc print; run;
There are a couple of things that don't feel right here. PROC FREQ is not normally a tool I would choose for continuous variables such as your variable DATA_DAYS; in fact I think it is the wrong tool. Another things that doesn't feel right is that your cumulative percents are always multiples of 5. Now that could happen if your data is really 20 girls and 20 boys, but is your real data like that, 20 girls and 20 boys, and never some other numbers?
In any event, this seems like a job for PROC RANK in which you can rank the numbers in DATA_DAYS from 1 to n, from which it is easy to calculate the cumulative percents.
If you think that is really useful:
proc sort data=days; by child_gender data_days; run; proc freq data=days noprint; by child_gender; tables data_days/ outcum out=want; run;
Makes a data set as requested so some of the variable names are different.
If I understand the question, I think you want to add a DATA step that uses BY group processing on the gender variable. Use the FIRST.child_gender indicator variable to set the cumulative totals to 0 at the start of each by group, then increment the cumulative variable for each observation. You might want to read about BY-group processing and the FIRST.variable indicator variable in the article, "How to use FIRST.variable and LAST.variable in a BY-group analysis in SAS."
Here's one solution:
data Want;
set gender_days;
by child_gender;
if first.child_gender then do;
Cumulative_count = 0;
Cumulative_percent = 0;
end;
Cumulative_count + count;
Cumulative_percent + percent;
run;
proc print; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.