BookmarkSubscribeRSS Feed
rsva
Fluorite | Level 6

Greetings,

 

I am trying to create all possible permutations of last name. Sample data below

 

Data One_;

Input Id $2. name $30.;

cards;

01 Sun Set

02 Road-Track

03 Peppermint

04 Talenti Cherry Gelato

05 Very Berry Talenti Gelato Test

06 O'Malley

07 Test,Test

;

run;

 

The number of words in last name is not constant. Below is my code to create the permutation.

 

Data two_; set one_;

nwords=countw(name, " -");

run;

 

proc sql noprint;

select max(nwords) into :max_words

from two; quit; %put &max_words;

 

%macro obsn;

%global nobs;

%let dsid = %sysfunc(open(two));

%if &dsid %then %do;

%let nobs =%sysfunc(attrn(&dsid,nobs));

%let rc = %sysfunc(close(&dsid));

%end; %mend obsn; %obsn; %put &nobs;

 

data Two (drop=i);

set Two;

array zName {&max_words.} $30;

do i=1 to &max_words.;

zName{i}=scan(name, i,' -');

end; run;

 

data Out1; set two;

do i=1 to &nobs;

do k=1 to nwords;

nperm=perm(nwords,k);

if i=_n_ then output;

end; end; run;

 

data Out2; set Out1;

array zName {&max_words.} $30;

do j=1 to nperm;

rc=lexperk(j,k, of zName(*));

if rc<0 then delete;

output; end; run;

 

I am stuck after this. Based on Out2 data I need to create the new last name varaible using CATX function. I need to concatnate the zname(n) in data Out2 varaibels based on the values in Varaible K. For e.g if K=1 then LN=catx(zname1), K=4 then LN=catx(of zname1-zname4). So basically zname(n) should be based on the K value for each records.

 

Thanks in advance.

RSVS

 

9 REPLIES 9
ballardw
Super User

If you can provide an example of the desired output for your example data it will help. If you think your example generates too  many case to do by hand, use a smaller example data set.

 

I have a suspicion that you may be making this more complicated than needed.

rsva
Fluorite | Level 6

I think so too. I tried ALLCOMB too but I may be missing something and is not working since the number of variables created from the old name variable is not constant. Meaning the old name may be 'Tom Smith' or 'Tom' or 'Tom Smith Part'. So below is the sample data

 

Data Have:

 

ID    Name

----------------

01   Tom Smith

02   Tom

03   Tom Smith Part

 

Data Want:

 

ID    Name                       z Name

---------------------------------------------------

01   Tom Smith                Tom

01   Tom Smith                Smith

01   Tom Smith                Tom Smith

01   Tom Smith                 Smith Tom

02   Pam                           Pam

03   Tim Smith Part          Tim

03   Tim Smith Part          Smith

03   Tim Smith Part          Part

03   Tim Smith Part          Tim Smith

03   Tim Smith Part          Tim Part

03   Tim Smith Part          Smith Part

03   Tim Smith Part          Smith Tim

03   Tim Smith Part          Part Tim

03   Tim Smith Part           Part Smith

 

Thanks.

ballardw
Super User

Why no Tim Smith Part,  Part Smith Tim and Part Tim Smith for z_name in the output example? Is there a rule to only consider 1 and 2 "Names"?

rsva
Fluorite | Level 6

Sorry forgot to include

 

Tim Smith Part,  Part Smith Tim and Part Tim Smith

 

Thanks.

ballardw
Super User

Here's one approach

/* example assuming all of the "names" are 10 characters or less. 
if not so increase 10 to appropriate size */
data want;
   set have;
   length t1 - t5 e1 - e5 $ 10;
   array t t1 - t5 ;
   array e e1 - e5 ; /* this holds the result*/
   wordcount = countw(name);
   /* assign each word to a variable, this is a base list of names words as we
   will need to process it 1, 2, 3, 4 or 5 words at a time*/
   do i = 1 to wordcount;
      t[i] = scan(name,i);
   end;
   /* iterate through 1 , 2, 3, 4 or 5 word permutations*/
   do words = 1 to wordcount;
      nperm = perm(wordcount,words); /* number of permutations of words */
      do j = 1 to nperm; 
         /* reset the result variables*/
         call missing (of e[*]);
         /* select a permutation*/
         call lexperk(j,words, of t[*]);
/*         put words + 2 J +2 t[*];*/
         /* copy the result into the result holding variables
            the first WORDS number of values are the ones to keep*/
         do m=1 to words;
            e[m]=t[m];
         end;
         output;
      end; 
   end; 
   drop t: wordcount words i j m nperm;
run;

Change the Have dataset to yours. The variables E1 through E5 could be combined using one of the CAT functions if needed. Note, permutations of N,K can get very large quickly.

 

rsva
Fluorite | Level 6

Thanks for the solution. It works wonderfully.

Reeza
Super User
@rsva please mark the question answered, if a solution meets your needs.
Reeza
Super User
Look at the call allcomb() routine. The documentation has examples of generating all possible combinations.
Ksharp
Super User

It is a bit more complicated.

How many words are there for an obs ?

I assume there are at most three words in an obs .

 

 

data have;
input ID $   Name & $40.;
cards;
01   Tom Smith
02   Tom
03   Tom Smith Part
;
run;
data want;
 set have;
 array x{99} $ 32 _TEMPORARY_;
 length zName $ 100;
 call missing(of x{*});
 n=countw(Name);
 
 do i=1 to n;
  x{i}=scan(name,i);
 end;


do i=1 to n;

if i eq 1 then do;
 do i1=1 to n;
  zName=x{i1};output;
 end;
end;

if i eq 2 then do; 
 do i1=1 to n;
  do i2=1 to n;
   if i1 ne i2 then do;zName=catx(' ',x{i1},x{i2});output;end;
  end;
 end;
end;

if i eq 3 then do; 
 do i1=1 to n;
  do i2=1 to n;
   do i3=1 to n;
   if i1 ne i2 and i1 ne i3 and i2 ne i3 then do;zName=catx(' ',x{i1},x{i2},x{i3});output;end;
  end;
 end;
end;
end;

end;

if n gt 3 then putlog "ERROR: There are more than three words.";
drop n i i1-i3;
run;

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1167 views
  • 1 like
  • 4 in conversation