DATA Step, Macro, Functions and more

Function that outputs attribute name

Reply
Occasional Contributor
Posts: 5

Function that outputs attribute name

Hi All,

My data set has following fields

Customer_ID(Primary key)  MCG_1  MCG_2  MCG_3   MCG_4   MCG_5

The 5 attributes MCG_1 to MCG_5 are row wise ranks based on customer spends in 5 merchant category groups(Shopping/Restaurants etc) also

containing ties and blanks.


Sample dataset:


Cust_ID,MCG_1,MCG_2,MCG_3,MCG_4,MCG_5,

3589,2,5,1,3,4,

1366,1.5,1.5,4,3,5,

6144,,1,2,4,3

I need to create 5 variables in the dataset which will give me variable names based on their ranks;and both the variable names in case of ties(eg 2nd record below)

How can I do it in Base SAS ?

Intended outcome:

Cust_IDRank1Rank2Rank3Rank4Rank5
3589MCG_3MCG_1MCG_4MCG_5MCG_2
1366MCG_1/MCG_2 MCG_4MCG_3MCG_5
6144MCG_2MCG_3MCG_5MCG_4



Thanks,

Shalini

PROC Star
Posts: 7,492

Re: Function that outputs attribute name

Posted in reply to Shalini_K

I think that the following will do what you want:

data want (drop=MCG_1-MCG_5);

  set have;

  array _r(*) MCG_1-MCG_5;

  array rank(5) $11.;

  call missing (of rank(*));

  do _n_=1 to dim(_r);

    if not missing(_r(_n_)) then do;

      if int(_r(_n_)) eq _r(_n_) then rank(_r(_n_))=vname(_r(_n_));

      else if missing(rank(int(_r(_n_)))) then rank(int(_r(_n_)))=

       vname(_r(int(_r(_n_))));

      else  rank(int(_r(_n_)))=catx('/',rank(int(_r(_n_))),

       vname(_r(_n_)));

    end;

  end;

run;

Occasional Contributor
Posts: 5

Re: Function that outputs attribute name

Thanks a lot Arthur for your help..

When I increased the attributes from 5 to 11, the code provides incorrect output in case of ties.

For eg:

Cust_ID,MCG_1,MCG_2,MCG_3,MCG_4,MCG_5,MCG_6,MCG_7,MCG_8,MCG_9,MCG_10,MCG_11,

3478,2.5,2.5,2.5,9.5,9.5,8,,2.5,5.5,7,5.5

the code gives below output:

Rank1-(blank), Rank2- MCG_2/MCG_2/MCG_3/MCG_8

Rank3-(blank), Rank4- (blank),

Rank5 - MCG_5/MCG_11, Rank6 - (blank),

Rank7 - MCG_10,  Rank8 - MCG_6,

Rank9 -MCG_9/MCG_5 , Rank10 - (blank),  Rank11 - (blank)

The required outcome is:

Rank1 - (MCG_1/MCG_2/MCG_3/MCG_8 , Rank2- (blank),

Rank3-(blank), Rank4- (blank),

Rank5 - MCG_9/MCG_11, Rank6 - (blank),

Rank7 - MCG_10,  Rank8 - MCG_6, 

Rank9 -MCG_4/MCG_5 , Rank10 - (blank),  Rank11 - (blank)

What can be done to achieve the required outcome?

Super User
Super User
Posts: 7,079

Re: Function that outputs attribute name

Posted in reply to Shalini_K

Restructure data using PROC TRANSPOSE and then use PROC RANK TIES=LOW to convert your ranks into integers that you can use as array indexes.

data have ;

  infile cards dsd truncover ;

  input Cust_ID MCG_1 - MCG_11 ;

cards;

3589,2,5,1,3,4,

1366,1.5,1.5,4,3,5,

6144,,1,2,4,3

3478,2.5,2.5,2.5,9.5,9.5,8,.,2.5,5.5,7,5.5

run;

proc transpose data=have out=vertical ;

  by cust_id notsorted;

  var mcg_: ;

run;

proc rank data=vertical ties=low out=vertical2;

  by cust_id notsorted;

  var col1;

  ranks rank;

run;

data want ;

  set vertical2 ;

  by cust_id notsorted;

  array r $100 rank1-rank11 ;

  retain rank1-rank11;

  keep cust_id rank1-rank11;

  if first.cust_id then call missing(of rank1-rank11);

  if rank then r(rank)=catx('/',r(rank),_name_);

  if last.cust_id ;

run;

data _null_;

  set want ;

  put (_all_) (=/);

run;

Cust_ID=3589

rank1=MCG_3

rank2=MCG_1

rank3=MCG_4

rank4=MCG_5

rank5=MCG_2

rank6=

rank7=

rank8=

rank9=

rank10=

rank11=

Cust_ID=1366

rank1=MCG_1/MCG_2

rank2=

rank3=MCG_4

rank4=MCG_3

rank5=MCG_5

rank6=

rank7=

rank8=

rank9=

rank10=

rank11=

Cust_ID=6144

rank1=MCG_2

rank2=MCG_3

rank3=MCG_5

rank4=MCG_4

rank5=

rank6=

rank7=

rank8=

rank9=

rank10=

rank11=

Cust_ID=3478

rank1=MCG_1/MCG_2/MCG_3/MCG_8

rank2=

rank3=

rank4=

rank5=MCG_9/MCG_11

rank6=

rank7=MCG_10

rank8=MCG_6

rank9=MCG_4/MCG_5

rank10=

rank11=

Occasional Contributor
Posts: 5

Re: Function that outputs attribute name

Thanks a lot..Both the codes worked for me

Occasional Contributor
Posts: 5

Re: Function that outputs attribute name

HI..

For the above query,I have a slight modification in the requirement in case of ties.

Sample dataset:

Cust_ID,MCG_1,MCG_2,MCG_3,MCG_4,MCG_5,MCG_6,MCG_7,MCG_8,MCG_9,MCG_10,MCG_11,

Sp_1, Sp_2, Sp_3, Sp_4, Sp_5, Sp_6, Sp_7, Sp_8, Sp_9, Sp_10, Sp_11,

3478,2.5,2.5,2.5,9.5,9.5,8,.,2.5,5.5,7,5.5,1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,

3479,1.5,3.5,3.5,8.5,8.5,10,11,1.5,6,6,6,15000,14000,12500,11000,9000,8000,7500,15000,6000,5000,4000


The 11 attributes MCG_1 to MCG_11 are row wise ranks based on customer RFM scores in 11 merchant category groups(Shopping/Restaurants etc) and sp_1 to sp_11 their respective spends($)

The codes shared by Tom & Arthur helped me create 11 new variables which gave me variable names based on their ranks;and both the variable names in case of ties.

In case of ties,instead of  tied variable names separted by '/ ' I now want the variable name of the Max value of corresponding tied spends.

Earlier outcome was:

Cust_IDrank1rank2rank3rank4rank5rank6rank7rank8rank9rank10rank11
3478MCG_1/MCG_2/MCG_3/MCG_8MCG_9/MCG_11MCG_10MCG_6MCG_4/MCG_5
3479MCG_1/MCG_8MCG_2/MCG_3MCG_9/MCG_10/MCG_11MCG_4/MCG_5MCG_6MCG_7


Required outcome:

Cust_IDrank1rank2rank3rank4rank5rank6rank7rank8rank9rank10rank11
3478sp_8sp_3sp_2sp_1sp_11sp_9MCG_10MCG_6sp_5sp_4
3479sp_1sp_8sp_2sp_3sp_9sp_10sp_11sp_4sp_5MCG_6MCG_7


In 2nd record,MCG(1 & 8) as well as Sp variable is tied for rank 1.Either of the variable name sp_1 or sp_8 will be acceptable in rank1 & rank2 values.

Can this be done in SAS?




Occasional Contributor
Posts: 5

Re: Function that outputs attribute name

Posted in reply to Shalini_K

The 'earlier outcome' & 'required outcome' tables have 20+ columns & may not appear totally on the screen.Hence,I am attaching a csv file

'Reqd_Outcome.csv' for your reference.

Attachment
PROC Star
Posts: 7,492

Re: Function that outputs attribute name

Posted in reply to Shalini_K

You would have to test the following, but I think it does what you want:

data have;

  infile cards dlm=',';

  input Cust_ID MCG_1-MCG_11;

  cards;

3478,2.5,2.5,2.5,9.5,9.5,8,.,2.5,5.5,7,5.5

3479,1.5,3.5,3.5,8.5,8.5,10,11,1.5,6,6,6

;

data want (drop=MCG: counter: _: i j);

  set have;

  array r(*) MCG_1-MCG_11;

  array rank(11) $70.;

  array counter(11);

  array _rank(11);

  call missing (of rank(*));

  call missing (of counter(*));

  call missing (of _rank(*));

  do i=1 to 11;

    if not missing(r(i)) then do;

      do j=1 to 11;

        if r(i)=r(j) then counter(i)+1;

      end;

      if counter(i) eq 1 then _rank(i)=r(i);

      else _rank(i)=round(r(i))-(int(counter(i)/2));

    end;

  end;

  do i=1 to dim(r);

    if not missing(r(i)) then do;

      if r(i) eq _rank(i) then rank(r(i))=vname(r(i));

      else  rank(_rank(i))=catx('/',rank(_rank(i)),vname(r(i)));

    end;

  end;

run;

Ask a Question
Discussion stats
  • 7 replies
  • 450 views
  • 6 likes
  • 3 in conversation