Hi
i have a sample dataset where i need to apply logic to unique_id as per code it has in waterfall manner.
For instance if for unique ID 1, Code A is found which gets Yes in preference then other instance of unique ID 1 will get "no" in Preference column
for unique ID 2 Code B is found which gets Yes in preference then other instance of unique ID 2 will get "no" in Preference column
for unique ID 3 Code E is found which gets Yes in preference then other instance of unique ID 3 will get "no" in Preference column
so waterfall for Code follows the order as A,B,E,D,C.
So A gets first prefernce if found for unique id,
B gets second and so on
How can i apply this waterfall for unique id. So in nutshell if i find a code for unique id as per Waterfall it should get Yes in preference and rest all other instance of same unique id get No in preference.
UNIQUE_ID CODE Preference
1 A Yes
1 B no
2 B Yes
2 C no
3 C no
3 D no
3 E Yes
Read all the 'Yes' values, determining the prime yes (i.e. A=yes supersedes B=yes >> E=yes >> C=yes >> D=yes). Then read all the cases turning everything to No except the prime yes.
In other words if an ID starts out with 3 Yes's and 2 No's, the 3 Yes's are read first [see the "where=(preference='Yes')" parameter] to determine the prime yes. Then all 5 Yes's and No's are read, assigning No to all but the prime yes:
data have;
input UNIQUE_ID CODE :$1. Preference :$3.;
datalines;
1 A Yes
1 B no
2 B Yes
2 C no
3 C no
3 D no
3 E Yes
run;
data want (drop=prime_rank);
set have (in=inyes where=(preference='Yes'))
have (in=inkeep);
by unique_id;
if first.unique_id then prime_rank=.;
retain prime_rank;
if inyes then do;
prime_rank=min(prime_rank,findc('ABEDC',code));
delete; /* Delete this case, it will be read again, and output below */
end;
if NOT(code=CHAR('ABEDC',prime_rank) and preference='Yes') then preference='No';
run;
Create a custom format that maps the order.
ABEDC
Then A->1
B->2
E->3
D->4
C->5
Sort by the new variable and use BY groups to identify the first.
Read all the 'Yes' values, determining the prime yes (i.e. A=yes supersedes B=yes >> E=yes >> C=yes >> D=yes). Then read all the cases turning everything to No except the prime yes.
In other words if an ID starts out with 3 Yes's and 2 No's, the 3 Yes's are read first [see the "where=(preference='Yes')" parameter] to determine the prime yes. Then all 5 Yes's and No's are read, assigning No to all but the prime yes:
data have;
input UNIQUE_ID CODE :$1. Preference :$3.;
datalines;
1 A Yes
1 B no
2 B Yes
2 C no
3 C no
3 D no
3 E Yes
run;
data want (drop=prime_rank);
set have (in=inyes where=(preference='Yes'))
have (in=inkeep);
by unique_id;
if first.unique_id then prime_rank=.;
retain prime_rank;
if inyes then do;
prime_rank=min(prime_rank,findc('ABEDC',code));
delete; /* Delete this case, it will be read again, and output below */
end;
if NOT(code=CHAR('ABEDC',prime_rank) and preference='Yes') then preference='No';
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.