Hi there,
I was wondering if someone could help me with some coding.
I have a dataset that I need help grouping. The highlighted column (BR_grp) is the required column. This is created by looking at the cumulative sum for "size_total", where a new group is created once the cumulative sum has reached 2.5%.
I have also named the groups in "BR_grp" according to the corresponding customer age (it is preferable to have it like this however I am open to other solutions). I have tried creating a counter with multiple arrays etc and nothing seems to work.
I did create the variable "cum_size1" which keeps "size_total" variable if the cumulative size is less than 2.5% and 0 for anything greater... maybe that might help?
I've attached an excel sheet below.
Thank you 🙂
data have;
infile cards expandtabs;
input CUST_AGE Good Bad size_total :percent8.;
format size_total percent8.2;
cards;
. 859 151 4.00%
18 4 3 0.03%
19 14 7 0.08%
20 52 16 0.27%
21 107 13 0.47%
22 152 28 0.71%
23 261 51 1.23%
24 390 54 1.76%
25 379 85 1.84%
26 511 80 2.34%
27 748 108 3.39%
28 791 142 3.69%
29 828 146 3.85%
30 969 180 4.55%
31 1112 158 5.02%
32 1104 177 5.07%
33 1110 169 5.06%
34 1064 189 4.96%
35 1163 215 5.45%
36 1144 164 5.17%
37 970 157 4.46%
38 826 122 3.75%
39 716 146 3.41%
40 694 127 3.25%
41 620 90 2.81%
42 599 64 2.62%
43 542 81 2.46%
44 426 70 1.96%
45 415 75 1.94%
46 343 73 1.65%
47 353 51 1.60%
48 371 42 1.63%
49 351 32 1.52%
50 244 27 1.07%
51 208 34 0.96%
52 200 31 0.91%
53 184 20 0.81%
54 176 10 0.74%
55 141 14 0.61%
56 137 17 0.61%
57 121 13 0.53%
58 78 11 0.35%
59 60 6 0.26%
60 49 20 0.27%
61 24 3 0.11%
62 15 8 0.09%
63 28 3 0.12%
64 13 0 0.05%
65 14 0 0.06%
66 28 1 0.11%
67 12 0 0.05%
68 11 0 0.04%
69 16 5 0.08%
70 5 1 0.02%
71 10 0 0.04%
72 12 3 0.06%
73 1 0 0.00%
74 2 0 0.01%
75 2 0 0.01%
76 . 1 .
77 3 0 0.01%
79 3 0 0.01%
80 1 1 0.01%
;
data temp;
set have;
if missing(CUST_AGE) then delete;
if cum>0.025 then do;group+1;cum=0;end;
cum+size_total;
run;
proc sql;
select *,min(CUST_AGE) as min,max(CUST_AGE) as max
from temp
group by group
order by 1;
quit;
Many of us will not (or cannot) download Excel files, as they are a security threat. The proper way to provide your example data is via these instructions (or by creating the equivalent SAS data step code by typing it in yourself). We also can't use screen captures of data to write programs.
First, I would just create consecutive group numbers rather than trying to create 18_23 in one step. It seems you should be able to do that from your cum_size1 variable. Then ... use PROC SUMMARY to find the minimum and max age in each group, and from there you can create group names like 18_23.
Please explain why 43 is in a group by itself and not 43_44.
From what I see, 43 satisfies the > 2.5% rule on its own, but 44 does not. So from 44 on, it should be
44-45
46-47
48-49
50-52
53-??
I'm not following your logic
43 is 2.46% so it is not >2.5%
Oh, you're right. Damn those small digits in the picture, I read 2.64. (maybe I also get old-age dyslexia)
Since here is the SAS Communities and not the Excel support forum, you should supply data in SAS style, which means a data step with datalines, posted into a code box. All this is described in your welcome email (if the communities moderators told us the truth).
Since your group name is dependent on the first and last member of a group, you will probably need to a double DO loop, in the first you determine the start and end of a group, in the second you reread all these values and set the new variable, and OUTPUT.
Or you read through your dataset and create a CNTLIN dataset (only one pass is needed) for PROC FORMAT that formats the existing age groups into the cumulative age groups.
You might even be able to trick one of the STAT procedures into creating your groups.
What do you intend to do later on with the new group? SAS statistical procedures will honor formatted values in CLASS or similar statements, so the format is probably the way to go.
Once we have usable source data and know where you want to go with this, we can come up with code suggestions.
data have;
infile cards expandtabs;
input CUST_AGE Good Bad size_total :percent8.;
format size_total percent8.2;
cards;
. 859 151 4.00%
18 4 3 0.03%
19 14 7 0.08%
20 52 16 0.27%
21 107 13 0.47%
22 152 28 0.71%
23 261 51 1.23%
24 390 54 1.76%
25 379 85 1.84%
26 511 80 2.34%
27 748 108 3.39%
28 791 142 3.69%
29 828 146 3.85%
30 969 180 4.55%
31 1112 158 5.02%
32 1104 177 5.07%
33 1110 169 5.06%
34 1064 189 4.96%
35 1163 215 5.45%
36 1144 164 5.17%
37 970 157 4.46%
38 826 122 3.75%
39 716 146 3.41%
40 694 127 3.25%
41 620 90 2.81%
42 599 64 2.62%
43 542 81 2.46%
44 426 70 1.96%
45 415 75 1.94%
46 343 73 1.65%
47 353 51 1.60%
48 371 42 1.63%
49 351 32 1.52%
50 244 27 1.07%
51 208 34 0.96%
52 200 31 0.91%
53 184 20 0.81%
54 176 10 0.74%
55 141 14 0.61%
56 137 17 0.61%
57 121 13 0.53%
58 78 11 0.35%
59 60 6 0.26%
60 49 20 0.27%
61 24 3 0.11%
62 15 8 0.09%
63 28 3 0.12%
64 13 0 0.05%
65 14 0 0.06%
66 28 1 0.11%
67 12 0 0.05%
68 11 0 0.04%
69 16 5 0.08%
70 5 1 0.02%
71 10 0 0.04%
72 12 3 0.06%
73 1 0 0.00%
74 2 0 0.01%
75 2 0 0.01%
76 . 1 .
77 3 0 0.01%
79 3 0 0.01%
80 1 1 0.01%
;
data temp;
set have;
if missing(CUST_AGE) then delete;
if cum>0.025 then do;group+1;cum=0;end;
cum+size_total;
run;
proc sql;
select *,min(CUST_AGE) as min,max(CUST_AGE) as max
from temp
group by group
order by 1;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.