Hi All,
My data set has following fields
Customer_ID(Primary key) MCG_1 MCG_2 MCG_3 MCG_4 MCG_5
The 5 attributes MCG_1 to MCG_5 are row wise ranks based on customer spends in 5 merchant category groups(Shopping/Restaurants etc) also
containing ties and blanks.
Sample dataset:
Cust_ID,MCG_1,MCG_2,MCG_3,MCG_4,MCG_5,
3589,2,5,1,3,4,
1366,1.5,1.5,4,3,5,
6144,,1,2,4,3
I need to create 5 variables in the dataset which will give me variable names based on their ranks;and both the variable names in case of ties(eg 2nd record below)
How can I do it in Base SAS ?
Intended outcome:
Cust_ID | Rank1 | Rank2 | Rank3 | Rank4 | Rank5 |
3589 | MCG_3 | MCG_1 | MCG_4 | MCG_5 | MCG_2 |
1366 | MCG_1/MCG_2 | MCG_4 | MCG_3 | MCG_5 | |
6144 | MCG_2 | MCG_3 | MCG_5 | MCG_4 |
Thanks,
Shalini
I think that the following will do what you want:
data want (drop=MCG_1-MCG_5);
set have;
array _r(*) MCG_1-MCG_5;
array rank(5) $11.;
call missing (of rank(*));
do _n_=1 to dim(_r);
if not missing(_r(_n_)) then do;
if int(_r(_n_)) eq _r(_n_) then rank(_r(_n_))=vname(_r(_n_));
else if missing(rank(int(_r(_n_)))) then rank(int(_r(_n_)))=
vname(_r(int(_r(_n_))));
else rank(int(_r(_n_)))=catx('/',rank(int(_r(_n_))),
vname(_r(_n_)));
end;
end;
run;
Thanks a lot Arthur for your help..
When I increased the attributes from 5 to 11, the code provides incorrect output in case of ties.
For eg:
Cust_ID,MCG_1,MCG_2,MCG_3,MCG_4,MCG_5,MCG_6,MCG_7,MCG_8,MCG_9,MCG_10,MCG_11,
3478,2.5,2.5,2.5,9.5,9.5,8,,2.5,5.5,7,5.5
the code gives below output:
Rank1-(blank), Rank2- MCG_2/MCG_2/MCG_3/MCG_8
Rank3-(blank), Rank4- (blank),
Rank5 - MCG_5/MCG_11, Rank6 - (blank),
Rank7 - MCG_10, Rank8 - MCG_6,
Rank9 -MCG_9/MCG_5 , Rank10 - (blank), Rank11 - (blank)
The required outcome is:
Rank1 - (MCG_1/MCG_2/MCG_3/MCG_8 , Rank2- (blank),
Rank3-(blank), Rank4- (blank),
Rank5 - MCG_9/MCG_11, Rank6 - (blank),
Rank7 - MCG_10, Rank8 - MCG_6,
Rank9 -MCG_4/MCG_5 , Rank10 - (blank), Rank11 - (blank)
What can be done to achieve the required outcome?
Restructure data using PROC TRANSPOSE and then use PROC RANK TIES=LOW to convert your ranks into integers that you can use as array indexes.
data have ;
infile cards dsd truncover ;
input Cust_ID MCG_1 - MCG_11 ;
cards;
3589,2,5,1,3,4,
1366,1.5,1.5,4,3,5,
6144,,1,2,4,3
3478,2.5,2.5,2.5,9.5,9.5,8,.,2.5,5.5,7,5.5
run;
proc transpose data=have out=vertical ;
by cust_id notsorted;
var mcg_: ;
run;
proc rank data=vertical ties=low out=vertical2;
by cust_id notsorted;
var col1;
ranks rank;
run;
data want ;
set vertical2 ;
by cust_id notsorted;
array r $100 rank1-rank11 ;
retain rank1-rank11;
keep cust_id rank1-rank11;
if first.cust_id then call missing(of rank1-rank11);
if rank then r(rank)=catx('/',r(rank),_name_);
if last.cust_id ;
run;
data _null_;
set want ;
put (_all_) (=/);
run;
Cust_ID=3589
rank1=MCG_3
rank2=MCG_1
rank3=MCG_4
rank4=MCG_5
rank5=MCG_2
rank6=
rank7=
rank8=
rank9=
rank10=
rank11=
Cust_ID=1366
rank1=MCG_1/MCG_2
rank2=
rank3=MCG_4
rank4=MCG_3
rank5=MCG_5
rank6=
rank7=
rank8=
rank9=
rank10=
rank11=
Cust_ID=6144
rank1=MCG_2
rank2=MCG_3
rank3=MCG_5
rank4=MCG_4
rank5=
rank6=
rank7=
rank8=
rank9=
rank10=
rank11=
Cust_ID=3478
rank1=MCG_1/MCG_2/MCG_3/MCG_8
rank2=
rank3=
rank4=
rank5=MCG_9/MCG_11
rank6=
rank7=MCG_10
rank8=MCG_6
rank9=MCG_4/MCG_5
rank10=
rank11=
Thanks a lot..Both the codes worked for me
HI..
For the above query,I have a slight modification in the requirement in case of ties.
Sample dataset:
Cust_ID,MCG_1,MCG_2,MCG_3,MCG_4,MCG_5,MCG_6,MCG_7,MCG_8,MCG_9,MCG_10,MCG_11,
Sp_1, Sp_2, Sp_3, Sp_4, Sp_5, Sp_6, Sp_7, Sp_8, Sp_9, Sp_10, Sp_11,
3478,2.5,2.5,2.5,9.5,9.5,8,.,2.5,5.5,7,5.5,1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,
3479,1.5,3.5,3.5,8.5,8.5,10,11,1.5,6,6,6,15000,14000,12500,11000,9000,8000,7500,15000,6000,5000,4000
The 11 attributes MCG_1 to MCG_11 are row wise ranks based on customer RFM scores in 11 merchant category groups(Shopping/Restaurants etc) and sp_1 to sp_11 their respective spends($)
The codes shared by Tom & Arthur helped me create 11 new variables which gave me variable names based on their ranks;and both the variable names in case of ties.
In case of ties,instead of tied variable names separted by '/ ' I now want the variable name of the Max value of corresponding tied spends.
Earlier outcome was:
Cust_ID | rank1 | rank2 | rank3 | rank4 | rank5 | rank6 | rank7 | rank8 | rank9 | rank10 | rank11 |
3478 | MCG_1/MCG_2/MCG_3/MCG_8 | MCG_9/MCG_11 | MCG_10 | MCG_6 | MCG_4/MCG_5 | ||||||
3479 | MCG_1/MCG_8 | MCG_2/MCG_3 | MCG_9/MCG_10/MCG_11 | MCG_4/MCG_5 | MCG_6 | MCG_7 |
Required outcome:
Cust_ID | rank1 | rank2 | rank3 | rank4 | rank5 | rank6 | rank7 | rank8 | rank9 | rank10 | rank11 |
3478 | sp_8 | sp_3 | sp_2 | sp_1 | sp_11 | sp_9 | MCG_10 | MCG_6 | sp_5 | sp_4 | |
3479 | sp_1 | sp_8 | sp_2 | sp_3 | sp_9 | sp_10 | sp_11 | sp_4 | sp_5 | MCG_6 | MCG_7 |
In 2nd record,MCG(1 & 😎 as well as Sp variable is tied for rank 1.Either of the variable name sp_1 or sp_8 will be acceptable in rank1 & rank2 values.
Can this be done in SAS?
The 'earlier outcome' & 'required outcome' tables have 20+ columns & may not appear totally on the screen.Hence,I am attaching a csv file
'Reqd_Outcome.csv' for your reference.
You would have to test the following, but I think it does what you want:
data have;
infile cards dlm=',';
input Cust_ID MCG_1-MCG_11;
cards;
3478,2.5,2.5,2.5,9.5,9.5,8,.,2.5,5.5,7,5.5
3479,1.5,3.5,3.5,8.5,8.5,10,11,1.5,6,6,6
;
data want (drop=MCG: counter: _: i j);
set have;
array r(*) MCG_1-MCG_11;
array rank(11) $70.;
array counter(11);
array _rank(11);
call missing (of rank(*));
call missing (of counter(*));
call missing (of _rank(*));
do i=1 to 11;
if not missing(r(i)) then do;
do j=1 to 11;
if r(i)=r(j) then counter(i)+1;
end;
if counter(i) eq 1 then _rank(i)=r(i);
else _rank(i)=round(r(i))-(int(counter(i)/2));
end;
end;
do i=1 to dim(r);
if not missing(r(i)) then do;
if r(i) eq _rank(i) then rank(r(i))=vname(r(i));
else rank(_rank(i))=catx('/',rank(_rank(i)),vname(r(i)));
end;
end;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.