Solved: group sum by non-unique identifier

CHL0320 · Posted 12-12-2022 03:21 PM

Hi Community,

I have a dataset that one id with multiple consecutive rows. It starts with the same but not unique identifier (character variable). I want to create a new dataset to collapse other valuess with sum base on the identifier. I have sample dataset as below. The first set of identifiers (in green) and the secone set of identifier (in red) will be in different rows. Thank you for the suggestion and helps.

data I have is

id	identifier	a
1	1.	7
1	1.a1	1
1	1.a2	9
1	2.	8
1	3.	4
1	4.	9
1	1.	2
1	1.a1	6
1	1.a2	4
1	2.	3
1	2.a1	1
1	2.a2	1
1	3.	9
1	3.a1	6
1	4.	4

Data want is

id	sum_a
1	17
1	8
1	4
1	8
1	12
1	5
1	15
1	4

FreelanceReinh · Posted 12-12-2022 04:04 PM

Hi @CHL0320,

If your real "HAVE" dataset is grouped appropriately (similar to your sample data) and the leading number in the identifier has only one or two digits (so that the first two characters of identifier characterize the contiguous observations to be aggregated), then this should work:

proc summary data=have;
by id identifier notsorted;
format identifier $2.;
var a;
output out=want(drop=_: identifier) sum=sum_a;
run;

(assuming that the sum_a value of the fourth observation in your "WANT" dataset should read 9, not 8).

View solution in original post

PaigeMiller · Posted 12-12-2022 03:43 PM

It seems to me that if you create a unique identifier, then doing the math is simple.

/* UNTESTED CODE */

data have1;
    set have;
    identifier1=scan(identifier,1,'.');
run;
proc summary data=have1 nway;
    class id identifier1;
    var a;
    output out=want sum=sum_a;
run;

The code is untested as we need data in SAS data sets (not screen captures, not file attachments). From now on, please provide data as working SAS data step code, which you can type in yourself, or you can create by following these instructions.

--
Paige Miller

FreelanceReinh · Posted 12-12-2022 04:04 PM

Hi @CHL0320,

If your real "HAVE" dataset is grouped appropriately (similar to your sample data) and the leading number in the identifier has only one or two digits (so that the first two characters of identifier characterize the contiguous observations to be aggregated), then this should work:

proc summary data=have;
by id identifier notsorted;
format identifier $2.;
var a;
output out=want(drop=_: identifier) sum=sum_a;
run;

(assuming that the sum_a value of the fourth observation in your "WANT" dataset should read 9, not 8).

PaigeMiller · Posted 12-12-2022 05:13 PM

I would point out that the solution offered by @FreelanceReinh fails if the IDENTIFIER variable has more than two digits before the dot. Perhaps this never happens and isn't worth worrying about, but I thought I would mention it.

--
Paige Miller

CHL0320 · Posted 12-12-2022 05:49 PM

Thank you so much for the points. It is my pleasure to learn from the masters like you.

group sum by non-unique identifier

Re: group sum by non-unique identifier

Re: group sum by non-unique identifier

Re: group sum by non-unique identifier

Re: group sum by non-unique identifier

Re: group sum by non-unique identifier

Catch up on SAS Innovate 2026

group sum by non-unique identifier

Re: group sum by non-unique identifier

Re: group sum by non-unique identifier

Re: group sum by non-unique identifier

Re: group sum by non-unique identifier

Re: group sum by non-unique identifier

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away