Hi Community,
I have a dataset that one id with multiple consecutive rows. It starts with the same but not unique identifier (character variable). I want to create a new dataset to collapse other valuess with sum base on the identifier. I have sample dataset as below. The first set of identifiers (in green) and the secone set of identifier (in red) will be in different rows. Thank you for the suggestion and helps.
data I have is
id | identifier | a |
1 | 1. | 7 |
1 | 1.a1 | 1 |
1 | 1.a2 | 9 |
1 | 2. | 8 |
1 | 3. | 4 |
1 | 4. | 9 |
1 | 1. | 2 |
1 | 1.a1 | 6 |
1 | 1.a2 | 4 |
1 | 2. | 3 |
1 | 2.a1 | 1 |
1 | 2.a2 | 1 |
1 | 3. | 9 |
1 | 3.a1 | 6 |
1 | 4. | 4 |
Data want is
id | sum_a |
1 | 17 |
1 | 8 |
1 | 4 |
1 | 8 |
1 | 12 |
1 | 5 |
1 | 15 |
1 | 4 |
Hi @CHL0320,
If your real "HAVE" dataset is grouped appropriately (similar to your sample data) and the leading number in the identifier has only one or two digits (so that the first two characters of identifier characterize the contiguous observations to be aggregated), then this should work:
proc summary data=have;
by id identifier notsorted;
format identifier $2.;
var a;
output out=want(drop=_: identifier) sum=sum_a;
run;
(assuming that the sum_a value of the fourth observation in your "WANT" dataset should read 9, not 8).
It seems to me that if you create a unique identifier, then doing the math is simple.
/* UNTESTED CODE */
data have1;
set have;
identifier1=scan(identifier,1,'.');
run;
proc summary data=have1 nway;
class id identifier1;
var a;
output out=want sum=sum_a;
run;
The code is untested as we need data in SAS data sets (not screen captures, not file attachments). From now on, please provide data as working SAS data step code, which you can type in yourself, or you can create by following these instructions.
Hi @CHL0320,
If your real "HAVE" dataset is grouped appropriately (similar to your sample data) and the leading number in the identifier has only one or two digits (so that the first two characters of identifier characterize the contiguous observations to be aggregated), then this should work:
proc summary data=have;
by id identifier notsorted;
format identifier $2.;
var a;
output out=want(drop=_: identifier) sum=sum_a;
run;
(assuming that the sum_a value of the fourth observation in your "WANT" dataset should read 9, not 8).
I would point out that the solution offered by @FreelanceReinh fails if the IDENTIFIER variable has more than two digits before the dot. Perhaps this never happens and isn't worth worrying about, but I thought I would mention it.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.