Hello Experts,
I want to create an n-gram model. I have created a 3-gram model, and the code is below.
data test;
sen = "The cow jumps over the moon";
run;
data test3;
set test;
nitems=countw(sen);
length combo $ 100;
if nitems >1;
do i=1 to nitems;
combo = scan(sen,i);
output;
do j=i+1 to nitems;
combo = catx('', scan(sen,i), scan(sen,j));
output;
do k=j+1 to nitems;
combo = catx('', scan(sen,i), scan(sen,j), scan(sen,k));
output;
end;
end;
end;
run;
Thanks in Advance
I don't see a question here. But if you are asking how to correct your program to get all three-word combinations, here are a few small changes:
data test3;
set test;
nitems=countw(sen);
length combo $ 100;
if nitems >2;
do i=1 to nitems-2;
combo = scan(sen,i);
*output;
do j=i+1 to nitems-1;
combo = catx('', scan(sen,i), scan(sen,j));
*output;
do k=j+1 to nitems;
combo = catx('', scan(sen,i), scan(sen,j), scan(sen,k));
output;
end;
end;
end;
run;
I thought n-grams were CONTIGUOUS words, but you are apparently trying to get all word COMBINATIONS (i.e. even if the "ngram" elements are not contiguous). Is that really your intention? (And you say you want "3grams", but you're also outputting single words and pairs of words.) If combinations are what you really want, I'd suggest using the CALL ALLCOMBI routine (code untested):
data want (drop=ix_:);
set have;
item_count=countw(sen);
if item_count<3 then delete;
length combo $80;
array items{20} $12 _temporary_;
do I=1 to item_count;
items{I}=scan(sen,I,' ');
end;
array ix3 {*} ix_1-ix_3;
array ix2 {*} ix_1-ix_2;
array ix1 {*} ix_1-ix_1;
do combosize=1 to 3;
ncomb=comb(item_count,combosize);
ix_1=.;
do c=1 to ncomb;
select (combosize);
when (1) call allcombi(item_count,combosize, of ix1{*});
when (2) call allcombi(item_count,combosize, of ix2{*});
when (3) call allcombi(item_count,combosize, of ix3{*});
end;
combo=' ';
do I=1 to combosize;
combo=catx(' ',combo,items{ix3{I}});
end;
output;
end;
end;
run;
But if you only want what is typically defined as n-grams, it's a lot simpler:
data want;
set have;
item_count=countw(sen);
length gram $36;
if item_count>=3 then do g=1 to item_count;
gram=scan(sen,g,' '); /*1-gram*/
output;
if g=item_count then leave;
gram=catx(' ',gram,scan(sen,g+1,' ')); /*bi-gram*/
output;
if g=item_count-1 then continue; /*no tri-gram starts here, but the last word still needs its 1-gram*/
gram=catx(' ',gram,scan(sen,g+2,' ')); /*tri-gram*/
output;
end;
run;
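For contiguous n-grams of other sizes, the same idea can be written with a loop over the gram size instead of one block per size. This is only a minimal sketch: maxn is a hypothetical cap I am introducing, the dataset and variable names (have, sen) are the ones used above, and unlike the step above it does not skip sentences with fewer than three words.

```sas
/* Sketch: contiguous n-grams of sizes 1..maxn (maxn is an assumed parameter) */
data have;
  sen = 'The cow jumps over the moon';
run;

data ngrams (keep=sen n start gram);
  set have;
  length gram $200;
  maxn = 3;                               /* hypothetical maximum gram size */
  item_count = countw(sen, ' ');
  do n = 1 to min(maxn, item_count);      /* gram size */
    do start = 1 to item_count - n + 1;   /* position of the first word */
      gram = ' ';
      do w = start to start + n - 1;      /* append n consecutive words */
        gram = catx(' ', gram, scan(sen, w, ' '));
      end;
      output;
    end;
  end;
run;
```

For the six-word sample sentence this should write 6 + 5 + 4 = 15 rows (six 1-grams, five bi-grams, four tri-grams).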
The program I suggested, after correcting a typographical error, generates singles, doubles, and triples with no coding logic change. So I do not understand why you got pairs and not triples (the error I corrected is not related to the size of the combinations).
You have now changed your requirement from generating triples (a fixed size) to a variable combination size. You can modify the program to accommodate a larger fixed size. You can do combos of up to 10 words by:
(1) adding an array statement for each size up to 10,
(2) changing "do combosize=1 to 3" to "do combosize=1 to min(10,item_count)",
(3) adding a "when" statement for each additional size,
(4) changing combo=catx(' ',combo,items{ix3{I}});
to combo=catx(' ',combo,items{ix10{I}});
But in the end, this program is not meant to accommodate ANY size.
A macro-ized version:
data have;
sen='the cow jumps over the moon';
run;
%macro want(max=10);
%local max /*maximum combination size*/
S /*combo size index */ ;
data want (drop=ix_:);
set have;
item_count=countw(sen);
if item_count<3 then delete;
length combo $%eval(100+&max*13);
array items{&max} $12 _temporary_;
do I=1 to min(item_count,&max);
items{I}=scan(sen,I,' ');
end;
%do S=1 %to &max;
array ix&S {*} ix_1-ix_&S ;
%end;
do combosize=1 to item_count;
ncomb=comb(item_count,combosize);
ix_1=.;
do c=1 to ncomb;
select (combosize);
%do s=1 %to &max ;
when(&S) call allcombi(item_count,combosize,of ix&S{*});
%end;
end;
combo=' ';
do I=1 to combosize;
combo=catx(' ',combo,items{ix&max{I}});
end;
output;
end;
end;
run;
%mend;
%want(max=15);
As I said earlier, the program is not meant for a variable combination size, which is why it now has to be macro-ized.
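If the macro feels heavy, one alternative that avoids it is to enumerate subsets with a bitmask instead of CALL ALLCOMBI. This is a different technique from the programs above, offered only as a sketch: maxsize and the 20-word guard are assumptions of mine, and the rows come out in mask order rather than ALLCOMBI order.

```sas
/* Sketch: word combinations via bitmask enumeration (not ALLCOMBI) */
data have;
  sen = 'the cow jumps over the moon';
run;

data want_bits (keep=sen combosize combo);
  set have;
  length combo $200;
  maxsize = 3;                       /* hypothetical cap on combination size */
  item_count = countw(sen, ' ');
  if item_count > 20 then delete;    /* arbitrary guard: 2**n masks grow fast */
  do mask = 1 to 2**item_count - 1;  /* each mask selects one subset of words */
    combo = ' ';
    combosize = 0;
    do i = 1 to item_count;
      if band(mask, 2**(i-1)) > 0 then do;  /* bit i set: keep word i */
        combosize = combosize + 1;
        combo = catx(' ', combo, scan(sen, i, ' '));
      end;
    end;
    if combosize <= maxsize then output;
  end;
run;
```

For the six-word sample with maxsize=3 this should give 6 + 15 + 20 = 41 rows. The trade-off is that every one of the 2**n - 1 masks is visited even when most are discarded, so it only suits short sentences.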