DATA Step, Macro, Functions and more

Permutation and concatnation

Reply
Contributor
Posts: 38

Permutation and concatnation

Greetings,

 

I am trying to create all possible permutations of last name. Sample data below

 

Data One_;

Input Id $2. name $30.;

cards;

01 Sun Set

02 Road-Track

03 Peppermint

04 Talenti Cherry Gelato

05 Very Berry Talenti Gelato Test

06 O'Malley

07 Test,Test

;

run;

 

The number of words in last name is not constant. Below is my code to create the permutation.

 

Data two_; set one_;

nwords=countw(name, " -");

run;

 

proc sql noprint;

select max(nwords) into :max_words

from two; quit; %put &max_words;

 

%macro obsn;

%global nobs;

%let dsid = %sysfunc(open(two));

%if &dsid %then %do;

%let nobs =%sysfunc(attrn(&dsid,nobs));

%let rc = %sysfunc(close(&dsid));

%end; %mend obsn; %obsn; %put &nobs;

 

data Two (drop=i);

set Two;

array zName {&max_words.} $30;

do i=1 to &max_words.;

zName{i}=scan(name, i,' -');

end; run;

 

data Out1; set two;

do i=1 to &nobs;

do k=1 to nwords;

nperm=perm(nwords,k);

if i=_n_ then output;

end; end; run;

 

data Out2; set Out1;

array zName {&max_words.} $30;

do j=1 to nperm;

rc=lexperk(j,k, of zName(*));

if rc<0 then delete;

output; end; run;

 

I am stuck after this. Based on Out2 data I need to create the new last name varaible using CATX function. I need to concatnate the zname(n) in data Out2 varaibels based on the values in Varaible K. For e.g if K=1 then LN=catx(zname1), K=4 then LN=catx(of zname1-zname4). So basically zname(n) should be based on the K value for each records.

 

Thanks in advance.

RSVS

 

Super User
Posts: 10,497

Re: Permutation and concatnation

If you can provide an example of the desired output for your example data it will help. If you think your example generates too  many case to do by hand, use a smaller example data set.

 

I have a suspicion that you may be making this more complicated than needed.

Contributor
Posts: 38

Re: Permutation and concatnation

I think so too. I tried ALLCOMB too but I may be missing something and is not working since the number of variables created from the old name variable is not constant. Meaning the old name may be 'Tom Smith' or 'Tom' or 'Tom Smith Part'. So below is the sample data

 

Data Have:

 

ID    Name

----------------

01   Tom Smith

02   Tom

03   Tom Smith Part

 

Data Want:

 

ID    Name                       z Name

---------------------------------------------------

01   Tom Smith                Tom

01   Tom Smith                Smith

01   Tom Smith                Tom Smith

01   Tom Smith                 Smith Tom

02   Pam                           Pam

03   Tim Smith Part          Tim

03   Tim Smith Part          Smith

03   Tim Smith Part          Part

03   Tim Smith Part          Tim Smith

03   Tim Smith Part          Tim Part

03   Tim Smith Part          Smith Part

03   Tim Smith Part          Smith Tim

03   Tim Smith Part          Part Tim

03   Tim Smith Part           Part Smith

 

Thanks.

Super User
Posts: 10,497

Re: Permutation and concatnation

Why no Tim Smith Part,  Part Smith Tim and Part Tim Smith for z_name in the output example? Is there a rule to only consider 1 and 2 "Names"?

Contributor
Posts: 38

Re: Permutation and concatnation

Sorry forgot to include

 

Tim Smith Part,  Part Smith Tim and Part Tim Smith

 

Thanks.

Super User
Posts: 10,497

Re: Permutation and concatnation

Here's one approach

/* example assuming all of the "names" are 10 characters or less. 
if not so increase 10 to appropriate size */
data want;
   set have;
   length t1 - t5 e1 - e5 $ 10;
   array t t1 - t5 ;
   array e e1 - e5 ; /* this holds the result*/
   wordcount = countw(name);
   /* assign each word to a variable, this is a base list of names words as we
   will need to process it 1, 2, 3, 4 or 5 words at a time*/
   do i = 1 to wordcount;
      t[i] = scan(name,i);
   end;
   /* iterate through 1 , 2, 3, 4 or 5 word permutations*/
   do words = 1 to wordcount;
      nperm = perm(wordcount,words); /* number of permutations of words */
      do j = 1 to nperm; 
         /* reset the result variables*/
         call missing (of e[*]);
         /* select a permutation*/
         call lexperk(j,words, of t[*]);
/*         put words + 2 J +2 t[*];*/
         /* copy the result into the result holding variables
            the first WORDS number of values are the ones to keep*/
         do m=1 to words;
            e[m]=t[m];
         end;
         output;
      end; 
   end; 
   drop t: wordcount words i j m nperm;
run;

Change the Have dataset to yours. The variables E1 through E5 could be combined using one of the CAT functions if needed. Note, permutations of N,K can get very large quickly.

 

Contributor
Posts: 38

Re: Permutation and concatnation

Thanks for the solution. It works wonderfully.

Super User
Posts: 17,819

Re: Permutation and concatnation

@rsva please mark the question answered, if a solution meets your needs.
Super User
Posts: 17,819

Re: Permutation and concatnation

Look at the call allcomb() routine. The documentation has examples of generating all possible combinations.
Super User
Posts: 9,676

Re: Permutation and concatnation

It is a bit more complicated.

How many words are there for an obs ?

I assume there are at most three words in an obs .

 

 

data have;
input ID $   Name & $40.;
cards;
01   Tom Smith
02   Tom
03   Tom Smith Part
;
run;
data want;
 set have;
 array x{99} $ 32 _TEMPORARY_;
 length zName $ 100;
 call missing(of x{*});
 n=countw(Name);
 
 do i=1 to n;
  x{i}=scan(name,i);
 end;


do i=1 to n;

if i eq 1 then do;
 do i1=1 to n;
  zName=x{i1};output;
 end;
end;

if i eq 2 then do; 
 do i1=1 to n;
  do i2=1 to n;
   if i1 ne i2 then do;zName=catx(' ',x{i1},x{i2});output;end;
  end;
 end;
end;

if i eq 3 then do; 
 do i1=1 to n;
  do i2=1 to n;
   do i3=1 to n;
   if i1 ne i2 and i1 ne i3 and i2 ne i3 then do;zName=catx(' ',x{i1},x{i2},x{i3});output;end;
  end;
 end;
end;
end;

end;

if n gt 3 then putlog "ERROR: There are more than three words.";
drop n i i1-i3;
run;
Ask a Question
Discussion stats
  • 9 replies
  • 295 views
  • 1 like
  • 4 in conversation