Greetings,
I am trying to create all possible permutations of last name. Sample data below
Data One_;
Input Id $2. name $30.;
cards;
01 Sun Set
02 Road-Track
03 Peppermint
04 Talenti Cherry Gelato
05 Very Berry Talenti Gelato Test
06 O'Malley
07 Test,Test
;
run;
The number of words in last name is not constant. Below is my code to create the permutation.
Data two_; set one_;
nwords=countw(name, " -");
run;
proc sql noprint;
select max(nwords) into :max_words
from two; quit; %put &max_words;
%macro obsn;
%global nobs;
%let dsid = %sysfunc(open(two));
%if &dsid %then %do;
%let nobs =%sysfunc(attrn(&dsid,nobs));
%let rc = %sysfunc(close(&dsid));
%end; %mend obsn; %obsn; %put &nobs;
data Two (drop=i);
set Two;
array zName {&max_words.} $30;
do i=1 to &max_words.;
zName{i}=scan(name, i,' -');
end; run;
data Out1; set two;
do i=1 to &nobs;
do k=1 to nwords;
nperm=perm(nwords,k);
if i=_n_ then output;
end; end; run;
data Out2; set Out1;
array zName {&max_words.} $30;
do j=1 to nperm;
rc=lexperk(j,k, of zName(*));
if rc<0 then delete;
output; end; run;
I am stuck after this. Based on Out2 data I need to create the new last name varaible using CATX function. I need to concatnate the zname(n) in data Out2 varaibels based on the values in Varaible K. For e.g if K=1 then LN=catx(zname1), K=4 then LN=catx(of zname1-zname4). So basically zname(n) should be based on the K value for each records.
Thanks in advance.
RSVS
If you can provide an example of the desired output for your example data it will help. If you think your example generates too many case to do by hand, use a smaller example data set.
I have a suspicion that you may be making this more complicated than needed.
I think so too. I tried ALLCOMB too but I may be missing something and is not working since the number of variables created from the old name variable is not constant. Meaning the old name may be 'Tom Smith' or 'Tom' or 'Tom Smith Part'. So below is the sample data
Data Have:
ID Name
----------------
01 Tom Smith
02 Tom
03 Tom Smith Part
Data Want:
ID Name z Name
---------------------------------------------------
01 Tom Smith Tom
01 Tom Smith Smith
01 Tom Smith Tom Smith
01 Tom Smith Smith Tom
02 Pam Pam
03 Tim Smith Part Tim
03 Tim Smith Part Smith
03 Tim Smith Part Part
03 Tim Smith Part Tim Smith
03 Tim Smith Part Tim Part
03 Tim Smith Part Smith Part
03 Tim Smith Part Smith Tim
03 Tim Smith Part Part Tim
03 Tim Smith Part Part Smith
Thanks.
Why no Tim Smith Part, Part Smith Tim and Part Tim Smith for z_name in the output example? Is there a rule to only consider 1 and 2 "Names"?
Sorry forgot to include
Tim Smith Part, Part Smith Tim and Part Tim Smith
Thanks.
Here's one approach
/* example assuming all of the "names" are 10 characters or less.
if not so increase 10 to appropriate size */
data want;
set have;
length t1 - t5 e1 - e5 $ 10;
array t t1 - t5 ;
array e e1 - e5 ; /* this holds the result*/
wordcount = countw(name);
/* assign each word to a variable, this is a base list of names words as we
will need to process it 1, 2, 3, 4 or 5 words at a time*/
do i = 1 to wordcount;
t[i] = scan(name,i);
end;
/* iterate through 1 , 2, 3, 4 or 5 word permutations*/
do words = 1 to wordcount;
nperm = perm(wordcount,words); /* number of permutations of words */
do j = 1 to nperm;
/* reset the result variables*/
call missing (of e[*]);
/* select a permutation*/
call lexperk(j,words, of t[*]);
/* put words + 2 J +2 t[*];*/
/* copy the result into the result holding variables
the first WORDS number of values are the ones to keep*/
do m=1 to words;
e[m]=t[m];
end;
output;
end;
end;
drop t: wordcount words i j m nperm;
run;
Change the Have dataset to yours. The variables E1 through E5 could be combined using one of the CAT functions if needed. Note, permutations of N,K can get very large quickly.
Thanks for the solution. It works wonderfully.
It is a bit more complicated.
How many words are there for an obs ?
I assume there are at most three words in an obs .
data have;
input ID $ Name & $40.;
cards;
01 Tom Smith
02 Tom
03 Tom Smith Part
;
run;
data want;
set have;
array x{99} $ 32 _TEMPORARY_;
length zName $ 100;
call missing(of x{*});
n=countw(Name);
do i=1 to n;
x{i}=scan(name,i);
end;
do i=1 to n;
if i eq 1 then do;
do i1=1 to n;
zName=x{i1};output;
end;
end;
if i eq 2 then do;
do i1=1 to n;
do i2=1 to n;
if i1 ne i2 then do;zName=catx(' ',x{i1},x{i2});output;end;
end;
end;
end;
if i eq 3 then do;
do i1=1 to n;
do i2=1 to n;
do i3=1 to n;
if i1 ne i2 and i1 ne i3 and i2 ne i3 then do;zName=catx(' ',x{i1},x{i2},x{i3});output;end;
end;
end;
end;
end;
end;
if n gt 3 then putlog "ERROR: There are more than three words.";
drop n i i1-i3;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.