Hello Experts,
I want to create an n-gram model. I created a 3-gram model, and the code is below.
data test;
sen = "The cow jumps over the moon";
run;
data test3;
set test;
nitems=countw(sen);
length combo $ 100;
if nitems >1;
do i=1 to nitems;
combo = scan(sen,i);
output;
do j=i+1 to nitems;
combo = catx('', scan(sen,i), scan(sen,j));
output;
do k=j+1 to nitems;
combo = catx('', scan(sen,i), scan(sen,j), scan(sen,k));
output;
end;
end;
end;
run;
Thanks in Advance
I don't see a question here. But if you are asking how to correct your program to get all three-word combinations, here are a few small changes:
data test3;
set test;
nitems=countw(sen);
length combo $ 100;
if nitems >2;
do i=1 to nitems-2;
combo = scan(sen,i);
*output;
do j=i+1 to nitems-1;
combo = catx('', scan(sen,i), scan(sen,j));
*output;
do k=j+1 to nitems;
combo = catx('', scan(sen,i), scan(sen,j), scan(sen,k));
output;
end;
end;
end;
run;
I thought ngrams were CONTIGUOUS words, but you are apparently trying to get all word COMBINATIONS (i.e. even if the "ngram" elements are not contiguous). Is that really your intention? (And you say you want "3grams", but you're also outputting single words and pairs of words.) If combinations are what you really want, I'd suggest using the CALL ALLCOMBI routine (code untested):
data want (drop=ix_:);
set have;
item_count=countw(sen);
if item_count<3 then delete;
length combo $80;
array items{20} $12 _temporary_;
do I=1 to item_count;
items{I}=scan(sen,I,' ');
end;
array ix3 {*} ix_1-ix_3;
array ix2 {*} ix_1-ix_2;
array ix1 {*} ix_1-ix_1;
do combosize=1 to 3;
ncomb=comb(item_count,combosize);
ix_1=.;
do c=1 to ncomb;
select (combosize);
when (1) call allcombi(item_count,combosize, of ix1{*});
when (2) call allcombi(item_count,combosize, of ix2{*});
when (3) call allcombi(item_count,combosize, of ix3{*});
end;
combo=' ';
do I=1 to combosize;
combo=catx(' ',combo,items{ix3{I}});
end;
output;
end;
end;
run;
But if you only want what is typically defined as ngrams, it's a lot simpler:
data want;
set have;
item_count=countw(sen);
length gram $36;
if item_count>=3 then do g=1 to item_count;
gram=scan(sen,g,' '); /*1-gram*/
output;
if g=item_count then leave;
gram=catx(' ',gram,scan(sen,g+1,' ')); /*bi-gram*/
output;
if g=item_count-1 then continue; /*no tri-gram starts here; CONTINUE rather than LEAVE so the last word still gets its 1-gram*/
gram=catx(' ',gram,scan(sen,g+2,' ')); /*tri-gram*/
output;
end;
run;
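The contiguous approach above can also be written with nested loops so that the maximum n-gram length is a single number rather than repeated code. This is only a sketch, not tested against your data: it assumes the same `have` data set and `sen` variable, and the names `max_n`, `start`, and `w` are made up for illustration.

```sas
data want_ngrams (drop=w);
  set have;
  length gram $200;
  max_n = 3;                             /* hypothetical setting: longest n-gram wanted */
  item_count = countw(sen, ' ');
  do n = 1 to min(max_n, item_count);    /* n-gram size */
    do start = 1 to item_count - n + 1;  /* position of the first word */
      gram = ' ';
      do w = start to start + n - 1;     /* append n contiguous words */
        gram = catx(' ', gram, scan(sen, w, ' '));
      end;
      output;
    end;
  end;
run;
```

For a six-word sentence and `max_n = 3` this should write 6 + 5 + 4 = 15 rows. Unlike the step above, it groups the output by n-gram size rather than by starting word, and it does not skip sentences with fewer than three words.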
The program I suggested, after correcting a typographical error, generates singles, doubles, and triples with no change to the coding logic. So I do not understand why you got pairs and not triples (the error I corrected is not related to the size of the combinations).
You have now changed your requirement from generating triples (a fixed size) to a variable combination size. You can modify the program to accommodate a larger fixed size. You can do combos of up to 10 words by:
(1) adding an ARRAY statement for each size up to 10,
(2) changing "do combosize=1 to 3" to "do combosize=1 to min(10,item_count)",
(3) adding a "when" statement for each additional size, and
(4) changing combo=catx(' ',combo,items{ix3{I}});
to combo=catx(' ',combo,items{ix10{I}});
But in the end, this program is not meant to accommodate ANY size.
A macro-ized version:
data have;
sen='the cow jumps over the moon';
run;
%macro want(max=10);
%local max /*maximum combination size*/
S /*combo size index */ ;
data want (drop=ix_:);
set have;
item_count=countw(sen);
if item_count<3 then delete;
length combo $%eval(100+&max*13);
array items{&max} $12 _temporary_;
do I=1 to min(item_count,&max);
items{I}=scan(sen,I,' ');
end;
%do S=1 %to &max;
array ix&S {*} ix_1-ix_&S ;
%end;
do combosize=1 to item_count;
ncomb=comb(item_count,combosize);
ix_1=.;
do c=1 to ncomb;
select (combosize);
%do s=1 %to &max ;
when(&S) call allcombi(item_count,combosize,of ix&S{*});
%end;
end;
combo=' ';
do I=1 to combosize;
combo=catx(' ',combo,items{ix&max{I}});
end;
output;
end;
end;
run;
%mend;
%want(max=15);
As I said earlier, the program is not meant for a variable combination size, which is why it now has to be macro-ized.