BookmarkSubscribeRSS Feed
Hesper
Calcite | Level 5

Hello there, I recently get stuck on how to do data aggregation, any input would be appreciated.

 

The table I would like to do aggregation has multiple lines of info for users with different IDs.

I would like to summarize the data into only one line of all desired info for each member. However, different rules in choosing the right value for different variables makes it hard for me.

 

Example: Original table and desired outcome for the table is listed below (only one member is shown here):

ID

Name

DOB

Event1

Time1-A

Time1-B

Event2

Time2-A

Time2-B

Event3

Time3-A

Time3-B

Time3-C

123

JaneD

ABC

CCC

1/1/2017

1/3/2017

.

.

.

.

.

.

.

345

JaneD

ABC

NNN

3/3/2017

3/7/2017

.

.

.

QQQ

4/3/2017

3/5/2017

3/9/2017

456

JaneD

ABC

.

.

.

NNN

11/26/2016

2/9/2017

NNN

5/17/2017

4/7/2017

5/20/2017

 

to:

Name

DOB

Event1

Time1

Event2

Time2

Event3

Time3

JaneD

ABC

CCC

1/3/2017

NNN

2/9/2017

QQQ

4/3/2017

 

Rules are:

  • Event123 has to be selected based on the priority of CCC>QQQ>NNN.
  • TimeABC etc. have to match Event and always choose the max value between A&B.
  • Event Category overrule the max value of time.

I tried proc summary but it won't always have the right match between Event and max time.

Can anyone help on what aggregation method might be useful in summarizing info when too many variables are involved?

Thanks!

3 REPLIES 3
Shmuel
Garnet | Level 18

1) Break the row into separated events either by proc transpose or by a data step:

    

data temp;
  set have; 
        keep ID NAME DOB EventX PRIORITY TimeX;
        array ev Evwnt1-Event3;
        array tm Time_1A Time_1B Time_2A Time_2B TIme_3A  TIme_3B;
        do i=1 to 3;
             EventX = ev(i);
/* add PRIORITY variable */
if not missing(EventX) then do; TimeX = tm(i); output;
end; end; run;

Proc sort data=temp; by Name Priority TimeX; run;

   How does Time_3C of Enent3 play with the rules ? 

 

2) Add a new variable, the PRIORITY code, assigned according to event code.
    You can add it in same above step, either using a format or a CASE statement.

 

3) Sort the TEMP dataset by Name PRIORITY and TimeX.

 

4) Select desired event per Name ant transpose into desired row format, using tool

    depending on output type: Dataset or Report.

    

Reeza
Super User

1. Post more test cases that cover all scenarios. Otheriwse we're going to code something and you're going to come back and say it doesn't work. 

2. Convert CCC/QQQ/NNN to numbers. CCC = 3 QQQ = 2 NNN = 1. Then you can take the max as need and convert at the end. 

3. Look at ARRAYS -> 3, one for each series, Event1-3, TIMEA1-3, TIMEB1-3 so you can loop through as necessary. 

 

Can you more clearly define define how the rules are implemented for dates, I didn't quite get that. 

 

As mentioned on your original post, a data step is likely warranted. 

 

Another option is to transpose your data and deal with it in a long form and then transposing back. 

Ksharp
Super User

The following could get you a start.

 


data want;
 set have;
 array x{*} event:;
 array y{*} time1:;
 array z{*} time2:;
 c=whichc('CCC',of x{*});
 q=whichc('QQQ',of x{*});
 n=whichc('NNN',of x{*});

if c then do;
 want_event=x{c};
 want_time=max(y{c},z{c});
end;
else if q then do;
 want_event=x{q};
 want_time=max(y{q},z{q});
end;
else if n then do;
 want_event=x{n};
 want_time=max(y{n},z{n});
end;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 922 views
  • 0 likes
  • 4 in conversation