Solved: Apply logic based on waterfall for unique ids

sasuser101 · Posted 06-01-2017 08:43 PM

Hi

i have a sample dataset where i need to apply logic to unique_id as per code it has in waterfall manner.

For instance if for unique ID 1, Code A is found which gets Yes in preference then other instance of unique ID 1 will get "no" in Preference column

for unique ID 2 Code B is found which gets Yes in preference then other instance of unique ID 2 will get "no" in Preference column

for unique ID 3 Code E is found which gets Yes in preference then other instance of unique ID 3 will get "no" in Preference column

so waterfall for Code follows the order as A,B,E,D,C.

So A gets first prefernce if found for unique id,

B gets second and so on

How can i apply this waterfall for unique id. So in nutshell if i find a code for unique id as per Waterfall it should get Yes in preference and rest all other instance of same unique id get No in preference.

UNIQUE_ID CODE Preference

1 A Yes

1 B no

2 B Yes

2 C no

3 C no

3 D no

3 E Yes

mkeintz · Posted 06-02-2017 12:21 AM

Read all the 'Yes' values, determining the prime yes (i.e. A=yes supersedes B=yes >> E=yes >> C=yes >> D=yes). Then read all the cases turning everything to No except the prime yes.

In other words if an ID starts out with 3 Yes's and 2 No's, the 3 Yes's are read first [see the "where=(preference='Yes')" parameter] to determine the prime yes. Then all 5 Yes's and No's are read, assigning No to all but the prime yes:

data have;
  input UNIQUE_ID        CODE :$1.          Preference :$3.;
datalines;
1 A Yes
1 B no
2 B Yes
2 C no
3 C no
3 D no
3 E Yes
run;

data want (drop=prime_rank);
  set have (in=inyes where=(preference='Yes'))
      have (in=inkeep);
  by unique_id;
  if first.unique_id then prime_rank=.;
  retain prime_rank;

  if inyes then do;
    prime_rank=min(prime_rank,findc('ABEDC',code));
    delete;  /* Delete this case, it will be read again, and output below */
  end;
  if NOT(code=CHAR('ABEDC',prime_rank) and preference='Yes') then preference='No';
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

Reeza · Posted 06-01-2017 10:42 PM

Create a custom format that maps the order.

ABEDC

Then A->1

B->2

E->3

D->4

C->5

Sort by the new variable and use BY groups to identify the first.

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#p0e9b2d12lpyjkn1b0y1...

mkeintz · Posted 06-02-2017 12:21 AM

Read all the 'Yes' values, determining the prime yes (i.e. A=yes supersedes B=yes >> E=yes >> C=yes >> D=yes). Then read all the cases turning everything to No except the prime yes.

In other words if an ID starts out with 3 Yes's and 2 No's, the 3 Yes's are read first [see the "where=(preference='Yes')" parameter] to determine the prime yes. Then all 5 Yes's and No's are read, assigning No to all but the prime yes:

data have;
  input UNIQUE_ID        CODE :$1.          Preference :$3.;
datalines;
1 A Yes
1 B no
2 B Yes
2 C no
3 C no
3 D no
3 E Yes
run;

data want (drop=prime_rank);
  set have (in=inyes where=(preference='Yes'))
      have (in=inkeep);
  by unique_id;
  if first.unique_id then prime_rank=.;
  retain prime_rank;

  if inyes then do;
    prime_rank=min(prime_rank,findc('ABEDC',code));
    delete;  /* Delete this case, it will be read again, and output below */
  end;
  if NOT(code=CHAR('ABEDC',prime_rank) and preference='Yes') then preference='No';
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Apply logic based on waterfall for unique ids

Re: Apply logic based on waterfall for unique ids

Re: Apply logic based on waterfall for unique ids

Re: Apply logic based on waterfall for unique ids

Apply logic based on waterfall for unique ids

Re: Apply logic based on waterfall for unique ids

Re: Apply logic based on waterfall for unique ids

Re: Apply logic based on waterfall for unique ids

Click image to register for webinar

Classroom Training Available!