DATA Step, Macro, Functions and more

data pick

Reply
Frequent Contributor
Posts: 87

data pick

Hi,

I've a source dataset like table a.  I need my resultant data to look like table b. The logic is, pick the record with higher priority # ,  but if the priority #'s are same I need to pick record where dst = 'N'. I wrote below code but it works only in the first acct and not for second account.

proc sort data = a;

by acct dst descending priority; run;

data b;

set a;

by acct dst descending priority;

if first.acct;

run;

Source - a

acct

zip

timezone

gmtoffset

dst

dst_start

dst_end

priority

areacode

9999

Mountain

-7

N

3/9/2014

11/2/2014

4

928

9999

Mountain

-7

N

3/9/2014

11/2/2014

4

928

9999

86033

Mountain

-7

Y

3/9/2014

11/2/2014

4

.

1111

00801

Atlantic

-4

N

3/9/2014

11/2/2014

1

.

1111

Mountain

-7

Y

3/9/2014

11/2/2014

4

720

Result - b

acct

zip

timezone

gmtoffset

dst

dst_start

dst_end

priority

areacode

9999

86033

Mountain

-7

Y

3/9/2014

11/2/2014

4

928

1111

Mountain

-7

Y

3/9/2014

11/2/2014

4

720

Super User
Posts: 17,784

Re: data pick

proc sort data = a;

by acct  priority dst; run;

data b;

set a;

by acct priority dst;

if first.acct;

run;

Frequent Contributor
Posts: 87

Re: data pick

Reeza, Thanks!

wouldn't that select below record? I want the record with Mountain timezone

acct

zip

timezone

gmtoffset

dst

dst_start

dst_end

priority

areacode

1111

00801

Atlantic

-4

N

3/9/2014

11/2/2014

1

.

Super User
Posts: 17,784

Re: data pick

Forgot the descending option:

proc sort data = a;

by acct  descending priority dst; run;

data b;

set a;

by acct descending priority dst;

if first.acct;

run;

Respected Advisor
Posts: 4,644

Re: data pick

Your example shows that you want the records with dst="Y", not "N".

One simple way to achieve this is to give a slight priority boost to records with dst="Y". I assume that there is at most one record with dst="Y" per acct and that priority # are integers:

proc sql;

create table b as

select * from a

group by acct

having priority+(dst="Y")/2 = max(priority+(dst="Y")/2);

select * from b;

quit;

PG

PG
Frequent Contributor
Posts: 87

Re: data pick

It is not safe to assume that there will be at most one record with dst="Y". I might have all records in an account that have dst='N'

Respected Advisor
Posts: 4,644

Re: data pick

Your problem is not stated very clearly, so I made some assumptions. For every acct you want:

1) the record with the highest priority

2) in case of a tie in priority, the record with dst = "N"

3) in case of a tie in priority and dst, the record with timezone="Mountain"

4) in case of a tie in priority, dst and timezone, any one of the tied records

proc sql;

create table b as

select unique * from a

group by acct

having priority+(dst="N")/2+(timezone="Mountain")/4 =

  max(priority+(dst="N")/2+(timezone="Mountain")/4);

select * from b;

quit;

PG

PG
Ask a Question
Discussion stats
  • 6 replies
  • 233 views
  • 0 likes
  • 3 in conversation