BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
CHL0320
Obsidian | Level 7

Hi Community,

I have a dataset that one id with multiple consecutive rows. It starts with the same but not unique identifier (character variable). I want to create a new dataset to collapse other valuess with sum base on the identifier. I have sample dataset as below. The first set of identifiers (in green) and the secone set of identifier (in red) will be in different rows. Thank you for the suggestion and helps. 

 

data I have is 

id identifier a
1 1. 7
1 1.a1 1
1 1.a2 9
1 2. 8
1 3. 4
1 4. 9
1 1. 2
1 1.a1 6
1 1.a2 4
1 2. 3
1 2.a1 1
1 2.a2 1
1 3. 9
1 3.a1 6
1 4. 4

Data want is 

id sum_a
1 17
1 8
1 4
1 8
1 12
1 5
1 15
1 4
1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @CHL0320,

 

If your real "HAVE" dataset is grouped appropriately (similar to your sample data) and the leading number in the identifier has only one or two digits (so that the first two characters of identifier characterize the contiguous observations to be aggregated), then this should work:

proc summary data=have;
by id identifier notsorted;
format identifier $2.;
var a;
output out=want(drop=_: identifier) sum=sum_a;
run;

(assuming that the sum_a value of the fourth observation in your "WANT" dataset should read 9, not 8).

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

It seems to me that if you create a unique identifier, then doing the math is simple.

 

/* UNTESTED CODE */

data have1;
    set have;
    identifier1=scan(identifier,1,'.');
run;
proc summary data=have1 nway;
    class id identifier1;
    var a;
    output out=want sum=sum_a;
run;

 

The code is untested as we need data in SAS data sets (not screen captures, not file attachments). From now on, please provide data as working SAS data step code, which you can type in yourself, or you can create by following these instructions.

--
Paige Miller
FreelanceReinh
Jade | Level 19

Hi @CHL0320,

 

If your real "HAVE" dataset is grouped appropriately (similar to your sample data) and the leading number in the identifier has only one or two digits (so that the first two characters of identifier characterize the contiguous observations to be aggregated), then this should work:

proc summary data=have;
by id identifier notsorted;
format identifier $2.;
var a;
output out=want(drop=_: identifier) sum=sum_a;
run;

(assuming that the sum_a value of the fourth observation in your "WANT" dataset should read 9, not 8).

PaigeMiller
Diamond | Level 26

I would point out that the solution offered by @FreelanceReinh fails if the IDENTIFIER variable has more than two digits before the dot. Perhaps this never happens and isn't worth worrying about, but I thought I would mention it.

--
Paige Miller
CHL0320
Obsidian | Level 7
Thank you so much for the points. It is my pleasure to learn from the masters like you.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 511 views
  • 1 like
  • 3 in conversation