SAS aggregation when too many variables with different priority rules are included

Reply
Learner
Posts: 1

SAS aggregation when too many variables with different priority rules are included

Hello there, I recently get stuck on how to do data aggregation, any input would be appreciated.

 

The table I would like to do aggregation has multiple lines of info for users with different IDs.

I would like to summarize the data into only one line of all desired info for each member. However, different rules in choosing the right value for different variables makes it hard for me.

 

Example: Original table and desired outcome for the table is listed below (only one member is shown here):

ID

Name

DOB

Event1

Time1-A

Time1-B

Event2

Time2-A

Time2-B

Event3

Time3-A

Time3-B

Time3-C

123

JaneD

ABC

CCC

1/1/2017

1/3/2017

.

.

.

.

.

.

.

345

JaneD

ABC

NNN

3/3/2017

3/7/2017

.

.

.

QQQ

4/3/2017

3/5/2017

3/9/2017

456

JaneD

ABC

.

.

.

NNN

11/26/2016

2/9/2017

NNN

5/17/2017

4/7/2017

5/20/2017

 

to:

Name

DOB

Event1

Time1

Event2

Time2

Event3

Time3

JaneD

ABC

CCC

1/3/2017

NNN

2/9/2017

QQQ

4/3/2017

 

Rules are:

  • Event123 has to be selected based on the priority of CCC>QQQ>NNN.
  • TimeABC etc. have to match Event and always choose the max value between A&B.
  • Event Category overrule the max value of time.

I tried proc summary but it won't always have the right match between Event and max time.

Can anyone help on what aggregation method might be useful in summarizing info when too many variables are involved?

Thanks!

Trusted Advisor
Posts: 1,479

Re: SAS aggregation when too many variables with different priority rules are included

1) Break the row into separated events either by proc transpose or by a data step:

    

data temp;
  set have; 
        keep ID NAME DOB EventX PRIORITY TimeX;
        array ev Evwnt1-Event3;
        array tm Time_1A Time_1B Time_2A Time_2B TIme_3A  TIme_3B;
        do i=1 to 3;
             EventX = ev(i);
/* add PRIORITY variable */
if not missing(EventX) then do; TimeX = tm(i); output;
end; end; run;

Proc sort data=temp; by Name Priority TimeX; run;

   How does Time_3C of Enent3 play with the rules ? 

 

2) Add a new variable, the PRIORITY code, assigned according to event code.
    You can add it in same above step, either using a format or a CASE statement.

 

3) Sort the TEMP dataset by Name PRIORITY and TimeX.

 

4) Select desired event per Name ant transpose into desired row format, using tool

    depending on output type: Dataset or Report.

    

Super User
Posts: 19,167

Re: SAS aggregation when too many variables with different priority rules are included

1. Post more test cases that cover all scenarios. Otheriwse we're going to code something and you're going to come back and say it doesn't work. 

2. Convert CCC/QQQ/NNN to numbers. CCC = 3 QQQ = 2 NNN = 1. Then you can take the max as need and convert at the end. 

3. Look at ARRAYS -> 3, one for each series, Event1-3, TIMEA1-3, TIMEB1-3 so you can loop through as necessary. 

 

Can you more clearly define define how the rules are implemented for dates, I didn't quite get that. 

 

As mentioned on your original post, a data step is likely warranted. 

 

Another option is to transpose your data and deal with it in a long form and then transposing back. 

Super User
Posts: 9,875

Re: SAS aggregation when too many variables with different priority rules are included

The following could get you a start.

 


data want;
 set have;
 array x{*} event:;
 array y{*} time1:;
 array z{*} time2:;
 c=whichc('CCC',of x{*});
 q=whichc('QQQ',of x{*});
 n=whichc('NNN',of x{*});

if c then do;
 want_event=x{c};
 want_time=max(y{c},z{c});
end;
else if q then do;
 want_event=x{q};
 want_time=max(y{q},z{q});
end;
else if n then do;
 want_event=x{n};
 want_time=max(y{n},z{n});
end;
run;
Ask a Question
Discussion stats
  • 3 replies
  • 155 views
  • 0 likes
  • 4 in conversation