BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ResegoM
Calcite | Level 5

Hi there,

 

I was wondering if someone could help me with some coding.
I have a dataset that I need help grouping. The highlighted column (BR_grp) is the required column. This is created by looking at the cumulative sum for "size_total", where a new group is created once the cumulative sum has reached 2.5%.

I have also named the groups in "BR_grp" according to the corresponding customer age (it is preferable to have it like this however I am open to other solutions). I have tried creating a counter with multiple arrays etc and nothing seems to work.

I did create the variable "cum_size1" which keeps "size_total" variable if the cumulative size is less than 2.5% and 0 for anything greater... maybe that might help?
I've attached an excel sheet below.
Thank you 🙂

 

ResegoM_0-1649940122952.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
infile cards expandtabs;
input CUST_AGE	Good	Bad	size_total :percent8.;
format size_total percent8.2;
cards;
.	859	151	4.00%
18	4	3	0.03%
19	14	7	0.08%
20	52	16	0.27%
21	107	13	0.47%
22	152	28	0.71%
23	261	51	1.23%
24	390	54	1.76%
25	379	85	1.84%
26	511	80	2.34%
27	748	108	3.39%
28	791	142	3.69%
29	828	146	3.85%
30	969	180	4.55%
31	1112	158	5.02%
32	1104	177	5.07%
33	1110	169	5.06%
34	1064	189	4.96%
35	1163	215	5.45%
36	1144	164	5.17%
37	970	157	4.46%
38	826	122	3.75%
39	716	146	3.41%
40	694	127	3.25%
41	620	90	2.81%
42	599	64	2.62%
43	542	81	2.46%
44	426	70	1.96%
45	415	75	1.94%
46	343	73	1.65%
47	353	51	1.60%
48	371	42	1.63%
49	351	32	1.52%
50	244	27	1.07%
51	208	34	0.96%
52	200	31	0.91%
53	184	20	0.81%
54	176	10	0.74%
55	141	14	0.61%
56	137	17	0.61%
57	121	13	0.53%
58	78	11	0.35%
59	60	6	0.26%
60	49	20	0.27%
61	24	3	0.11%
62	15	8	0.09%
63	28	3	0.12%
64	13	0	0.05%
65	14	0	0.06%
66	28	1	0.11%
67	12	0	0.05%
68	11	0	0.04%
69	16	5	0.08%
70	5	1	0.02%
71	10	0	0.04%
72	12	3	0.06%
73	1	0	0.00%
74	2	0	0.01%
75	2	0	0.01%
76	.	1	.
77	3	0	0.01%
79	3	0	0.01%
80	1	1	0.01%
;

data temp;
 set have;
if missing(CUST_AGE) then delete;
if cum>0.025 then do;group+1;cum=0;end;
cum+size_total;
run;

proc sql;
select *,min(CUST_AGE) as min,max(CUST_AGE) as max
 from temp
  group by group
order by 1;
quit;

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Many of us will not (or cannot) download Excel files, as they are a security threat. The proper way to provide your example data is via these instructions (or by creating the equivalent SAS data step code by typing it in yourself). We also can't use screen captures of data to write programs.

 

First, I would just create consecutive group numbers rather than trying to create 18_23 in one step. It seems you should be able to do that from your cum_size1 variable. Then ... use PROC SUMMARY to find the minimum and max age in each group, and from there you can create group names like 18_23.

 

Please explain why 43 is in a group by itself and not 43_44.

--
Paige Miller
PaigeMiller
Diamond | Level 26

I'm not following your logic

 

43 is 2.46% so it is not >2.5%

--
Paige Miller
ResegoM
Calcite | Level 5
Hi ,
I am a little new here so I will definitely type out code for my data step the next time I may have a query.
For now I did see a solution below that worked.
Thank you for taking the time to help 🙂
Kurt_Bremser
Super User

Since here is the SAS Communities and not the Excel support forum, you should supply data in SAS style, which means a data step with datalines, posted into a code box. All this is described in your welcome email (if the communities moderators told us the truth).

 

Since your group name is dependent on the first and last member of a group, you will probably need to a double DO loop, in the first you determine the start and end of a group, in the second you reread all these values and set the new variable, and OUTPUT.

 

Or you read through your dataset and create a CNTLIN dataset (only one pass is needed) for PROC FORMAT that formats the existing age groups into the cumulative age groups.

 

You might even be able to trick one of the STAT procedures into creating your groups.

 

What do you intend to do later on with the new group? SAS statistical procedures will honor formatted values in CLASS or similar statements, so the format is probably the way to go.

 

Once we have usable source data and know where you want to go with this, we can come up with code suggestions.

Ksharp
Super User
data have;
infile cards expandtabs;
input CUST_AGE	Good	Bad	size_total :percent8.;
format size_total percent8.2;
cards;
.	859	151	4.00%
18	4	3	0.03%
19	14	7	0.08%
20	52	16	0.27%
21	107	13	0.47%
22	152	28	0.71%
23	261	51	1.23%
24	390	54	1.76%
25	379	85	1.84%
26	511	80	2.34%
27	748	108	3.39%
28	791	142	3.69%
29	828	146	3.85%
30	969	180	4.55%
31	1112	158	5.02%
32	1104	177	5.07%
33	1110	169	5.06%
34	1064	189	4.96%
35	1163	215	5.45%
36	1144	164	5.17%
37	970	157	4.46%
38	826	122	3.75%
39	716	146	3.41%
40	694	127	3.25%
41	620	90	2.81%
42	599	64	2.62%
43	542	81	2.46%
44	426	70	1.96%
45	415	75	1.94%
46	343	73	1.65%
47	353	51	1.60%
48	371	42	1.63%
49	351	32	1.52%
50	244	27	1.07%
51	208	34	0.96%
52	200	31	0.91%
53	184	20	0.81%
54	176	10	0.74%
55	141	14	0.61%
56	137	17	0.61%
57	121	13	0.53%
58	78	11	0.35%
59	60	6	0.26%
60	49	20	0.27%
61	24	3	0.11%
62	15	8	0.09%
63	28	3	0.12%
64	13	0	0.05%
65	14	0	0.06%
66	28	1	0.11%
67	12	0	0.05%
68	11	0	0.04%
69	16	5	0.08%
70	5	1	0.02%
71	10	0	0.04%
72	12	3	0.06%
73	1	0	0.00%
74	2	0	0.01%
75	2	0	0.01%
76	.	1	.
77	3	0	0.01%
79	3	0	0.01%
80	1	1	0.01%
;

data temp;
 set have;
if missing(CUST_AGE) then delete;
if cum>0.025 then do;group+1;cum=0;end;
cum+size_total;
run;

proc sql;
select *,min(CUST_AGE) as min,max(CUST_AGE) as max
 from temp
  group by group
order by 1;
quit;
ResegoM
Calcite | Level 5
Thank you,
I typed this out and it worked perfectly 🙂

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2332 views
  • 1 like
  • 4 in conversation