Hi All,
My data set has following fields
Customer_ID(Primary key) MCG_1 MCG_2 MCG_3 MCG_4 MCG_5
The 5 attributes MCG_1 to MCG_5 are row wise ranks based on customer spends in 5 merchant category groups(Shopping/Restaurants etc) also
containing ties and blanks.
Sample dataset:
Cust_ID,MCG_1,MCG_2,MCG_3,MCG_4,MCG_5,
3589,2,5,1,3,4,
1366,1.5,1.5,4,3,5,
6144,,1,2,4,3
I need to create 5 variables in the dataset which will give me variable names based on their ranks;and both the variable names in case of ties(eg 2nd record below)
How can I do it in Base SAS ?
Intended outcome:
Cust_ID | Rank1 | Rank2 | Rank3 | Rank4 | Rank5 |
3589 | MCG_3 | MCG_1 | MCG_4 | MCG_5 | MCG_2 |
1366 | MCG_1/MCG_2 | MCG_4 | MCG_3 | MCG_5 | |
6144 | MCG_2 | MCG_3 | MCG_5 | MCG_4 |
Thanks,
Shalini
I think that the following will do what you want:
data want (drop=MCG_1-MCG_5);
set have;
array _r(*) MCG_1-MCG_5;
array rank(5) $11.;
call missing (of rank(*));
do _n_=1 to dim(_r);
if not missing(_r(_n_)) then do;
if int(_r(_n_)) eq _r(_n_) then rank(_r(_n_))=vname(_r(_n_));
else if missing(rank(int(_r(_n_)))) then rank(int(_r(_n_)))=
vname(_r(int(_r(_n_))));
else rank(int(_r(_n_)))=catx('/',rank(int(_r(_n_))),
vname(_r(_n_)));
end;
end;
run;
Thanks a lot Arthur for your help..
When I increased the attributes from 5 to 11, the code provides incorrect output in case of ties.
For eg:
Cust_ID,MCG_1,MCG_2,MCG_3,MCG_4,MCG_5,MCG_6,MCG_7,MCG_8,MCG_9,MCG_10,MCG_11,
3478,2.5,2.5,2.5,9.5,9.5,8,,2.5,5.5,7,5.5
the code gives below output:
Rank1-(blank), Rank2- MCG_2/MCG_2/MCG_3/MCG_8
Rank3-(blank), Rank4- (blank),
Rank5 - MCG_5/MCG_11, Rank6 - (blank),
Rank7 - MCG_10, Rank8 - MCG_6,
Rank9 -MCG_9/MCG_5 , Rank10 - (blank), Rank11 - (blank)
The required outcome is:
Rank1 - (MCG_1/MCG_2/MCG_3/MCG_8 , Rank2- (blank),
Rank3-(blank), Rank4- (blank),
Rank5 - MCG_9/MCG_11, Rank6 - (blank),
Rank7 - MCG_10, Rank8 - MCG_6,
Rank9 -MCG_4/MCG_5 , Rank10 - (blank), Rank11 - (blank)
What can be done to achieve the required outcome?
Restructure data using PROC TRANSPOSE and then use PROC RANK TIES=LOW to convert your ranks into integers that you can use as array indexes.
data have ;
infile cards dsd truncover ;
input Cust_ID MCG_1 - MCG_11 ;
cards;
3589,2,5,1,3,4,
1366,1.5,1.5,4,3,5,
6144,,1,2,4,3
3478,2.5,2.5,2.5,9.5,9.5,8,.,2.5,5.5,7,5.5
run;
proc transpose data=have out=vertical ;
by cust_id notsorted;
var mcg_: ;
run;
proc rank data=vertical ties=low out=vertical2;
by cust_id notsorted;
var col1;
ranks rank;
run;
data want ;
set vertical2 ;
by cust_id notsorted;
array r $100 rank1-rank11 ;
retain rank1-rank11;
keep cust_id rank1-rank11;
if first.cust_id then call missing(of rank1-rank11);
if rank then r(rank)=catx('/',r(rank),_name_);
if last.cust_id ;
run;
data _null_;
set want ;
put (_all_) (=/);
run;
Cust_ID=3589
rank1=MCG_3
rank2=MCG_1
rank3=MCG_4
rank4=MCG_5
rank5=MCG_2
rank6=
rank7=
rank8=
rank9=
rank10=
rank11=
Cust_ID=1366
rank1=MCG_1/MCG_2
rank2=
rank3=MCG_4
rank4=MCG_3
rank5=MCG_5
rank6=
rank7=
rank8=
rank9=
rank10=
rank11=
Cust_ID=6144
rank1=MCG_2
rank2=MCG_3
rank3=MCG_5
rank4=MCG_4
rank5=
rank6=
rank7=
rank8=
rank9=
rank10=
rank11=
Cust_ID=3478
rank1=MCG_1/MCG_2/MCG_3/MCG_8
rank2=
rank3=
rank4=
rank5=MCG_9/MCG_11
rank6=
rank7=MCG_10
rank8=MCG_6
rank9=MCG_4/MCG_5
rank10=
rank11=
Thanks a lot..Both the codes worked for me
HI..
For the above query,I have a slight modification in the requirement in case of ties.
Sample dataset:
Cust_ID,MCG_1,MCG_2,MCG_3,MCG_4,MCG_5,MCG_6,MCG_7,MCG_8,MCG_9,MCG_10,MCG_11,
Sp_1, Sp_2, Sp_3, Sp_4, Sp_5, Sp_6, Sp_7, Sp_8, Sp_9, Sp_10, Sp_11,
3478,2.5,2.5,2.5,9.5,9.5,8,.,2.5,5.5,7,5.5,1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,
3479,1.5,3.5,3.5,8.5,8.5,10,11,1.5,6,6,6,15000,14000,12500,11000,9000,8000,7500,15000,6000,5000,4000
The 11 attributes MCG_1 to MCG_11 are row wise ranks based on customer RFM scores in 11 merchant category groups(Shopping/Restaurants etc) and sp_1 to sp_11 their respective spends($)
The codes shared by Tom & Arthur helped me create 11 new variables which gave me variable names based on their ranks;and both the variable names in case of ties.
In case of ties,instead of tied variable names separted by '/ ' I now want the variable name of the Max value of corresponding tied spends.
Earlier outcome was:
Cust_ID | rank1 | rank2 | rank3 | rank4 | rank5 | rank6 | rank7 | rank8 | rank9 | rank10 | rank11 |
3478 | MCG_1/MCG_2/MCG_3/MCG_8 | MCG_9/MCG_11 | MCG_10 | MCG_6 | MCG_4/MCG_5 | ||||||
3479 | MCG_1/MCG_8 | MCG_2/MCG_3 | MCG_9/MCG_10/MCG_11 | MCG_4/MCG_5 | MCG_6 | MCG_7 |
Required outcome:
Cust_ID | rank1 | rank2 | rank3 | rank4 | rank5 | rank6 | rank7 | rank8 | rank9 | rank10 | rank11 |
3478 | sp_8 | sp_3 | sp_2 | sp_1 | sp_11 | sp_9 | MCG_10 | MCG_6 | sp_5 | sp_4 | |
3479 | sp_1 | sp_8 | sp_2 | sp_3 | sp_9 | sp_10 | sp_11 | sp_4 | sp_5 | MCG_6 | MCG_7 |
In 2nd record,MCG(1 & 😎 as well as Sp variable is tied for rank 1.Either of the variable name sp_1 or sp_8 will be acceptable in rank1 & rank2 values.
Can this be done in SAS?
The 'earlier outcome' & 'required outcome' tables have 20+ columns & may not appear totally on the screen.Hence,I am attaching a csv file
'Reqd_Outcome.csv' for your reference.
You would have to test the following, but I think it does what you want:
data have;
infile cards dlm=',';
input Cust_ID MCG_1-MCG_11;
cards;
3478,2.5,2.5,2.5,9.5,9.5,8,.,2.5,5.5,7,5.5
3479,1.5,3.5,3.5,8.5,8.5,10,11,1.5,6,6,6
;
data want (drop=MCG: counter: _: i j);
set have;
array r(*) MCG_1-MCG_11;
array rank(11) $70.;
array counter(11);
array _rank(11);
call missing (of rank(*));
call missing (of counter(*));
call missing (of _rank(*));
do i=1 to 11;
if not missing(r(i)) then do;
do j=1 to 11;
if r(i)=r(j) then counter(i)+1;
end;
if counter(i) eq 1 then _rank(i)=r(i);
else _rank(i)=round(r(i))-(int(counter(i)/2));
end;
end;
do i=1 to dim(r);
if not missing(r(i)) then do;
if r(i) eq _rank(i) then rank(r(i))=vname(r(i));
else rank(_rank(i))=catx('/',rank(_rank(i)),vname(r(i)));
end;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.