I am creating two-letter bigrams with a DATA step loop and then using several more steps (PROC TRANSPOSE and PROC STDIZE) to arrive at the desired result. I would like to know whether there is a way to cut out these intermediate steps and use just arrays.
I have heard that this is doable in Enterprise Miner / the text mining module, but is there a better way to achieve it efficiently with Base SAS or macros?
Here is what I have tried:
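data test;
  input x $ 1-14;            /* read columns 1-14 as one character value */
  datalines;
test one
test two
test three
;
run;

data bigram(drop=i);
  set test;
  n + 1;                     /* running row identifier */
  length v $2;               /* otherwise v inherits the length of x */
  do i = 1 to lengthn(x) - 1;
    v = substr(x, i, 2);     /* two-character window starting at position i */
    output;
  end;
run;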
x | n | v |
test one | 1 | te |
test one | 1 | es |
test one | 1 | st |
test one | 1 | t |
test one | 1 | o |
test one | 1 | on |
test one | 1 | ne |
test two | 2 | te |
test two | 2 | es |
test two | 2 | st |
test two | 2 | t |
test two | 2 | t |
test two | 2 | tw |
test two | 2 | wo |
test three | 3 | te |
test three | 3 | es |
test three | 3 | st |
test three | 3 | t |
test three | 3 | t |
test three | 3 | th |
test three | 3 | hr |
test three | 3 | re |
test three | 3 | ee |
I am hoping to minimize the number of intermediate steps and keep this computationally efficient when dealing with a huge number of observations.
The unique two-letter bigrams found across the three rows become the columns, and each one is coded 1 if it occurs in the given string and 0 otherwise.
Desired result
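x | te | es | st | t | o | on | ne | tw | wo | th | hr | re | ee |
test one | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
test two | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
test three | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |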
I don't think arrays will work any more efficiently here, because the n-grams are the variable names. If they were part of the data, then yes, an array could work.
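To illustrate the point about names: an array can only be declared once its variable names are known, so a data-driven version still needs a first pass to collect them. The following is a rough sketch, not a tested solution. It assumes the TEST and BIGRAM data sets from the question and text made up only of letters and single spaces. The bg_ prefix, the &bgvars macro variable, and the WANT data set are made-up names, and a space is written as an underscore so that each bigram is a valid variable name.

```sas
/* Sketch only. Assumes TEST and BIGRAM from the question, and text made up
   of letters and single spaces. bg_, &bgvars and WANT are made-up names. */

/* Pass 1: collect the distinct bigrams as valid variable names
   (a space becomes an underscore, so 't ' is stored as bg_t_). */
proc sql noprint;
  select distinct cats('bg_', translate(lowcase(substr(v, 1, 2)), '_', ' '))
    into :bgvars separated by ' '
    from bigram;
quit;

/* Pass 2: one row per string, one 0/1 flag per bigram. */
data want(drop=i j name);
  set test;
  array bg{*} &bgvars;               /* creates the numeric flag variables */
  length name $32;
  do j = 1 to dim(bg);
    bg{j} = 0;                       /* default: bigram not present */
  end;
  do i = 1 to lengthn(x) - 1;
    name = cats('bg_', translate(lowcase(substr(x, i, 2)), '_', ' '));
    do j = 1 to dim(bg);
      if lowcase(vname(bg{j})) = name then bg{j} = 1;
    end;
  end;
run;
```

The inner loop compares every bigram against every variable name, so this replaces the transpose and the missing-value fill but is not necessarily faster on wide data.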
One other possible method:
Counting ordered pairs, there are 26 × 26 = 676 possible two-letter combinations, plus 52 that pair a letter with a leading or trailing space, for 728 in total. Create them all up front, then set each indicator variable to 1 as you find its bigram and leave the rest at 0. If your data is small, though, 728 variables may be overkill.
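The "create all, then flag" idea could be sketched roughly as follows. This is an illustration under assumptions, not a tested solution: it assumes the TEST data set from the question and text containing only the letters a-z and spaces. For simplicity it declares a full 27 x 27 grid (26 letters plus a space, so one slot for space-space goes unused). The bg_ prefix, the &allbg macro variable, and the WANT_ALL data set are made-up names, and an underscore in a name stands for a space.

```sas
/* Sketch only. Assumes TEST from the question and text of a-z plus spaces.
   bg_, &allbg and WANT_ALL are made-up names. */

/* Build all 27 x 27 names: bg_aa, bg_ab, ..., with '_' standing for a space. */
data _null_;
  length names $8000;
  alphabet = 'abcdefghijklmnopqrstuvwxyz_';
  do r = 1 to 27;
    do c = 1 to 27;
      names = catx(' ', names, cats('bg_', char(alphabet, r), char(alphabet, c)));
    end;
  end;
  call symputx('allbg', names);
run;

data want_all(drop=i r c ch1 ch2 alphabet);
  set test;
  retain alphabet 'abcdefghijklmnopqrstuvwxyz';
  length ch1 ch2 $1;
  array bg{27, 27} &allbg;           /* row = first character, column = second */
  do r = 1 to 27;
    do c = 1 to 27;
      bg{r, c} = 0;                  /* default: bigram not present */
    end;
  end;
  do i = 1 to lengthn(x) - 1;
    ch1 = lowcase(char(x, i));
    ch2 = lowcase(char(x, i + 1));
    /* a-z map to 1-26, a space maps to 27, anything else to 0 */
    if ch1 = ' ' then r = 27; else r = index(alphabet, ch1);
    if ch2 = ' ' then c = 27; else c = index(alphabet, ch2);
    if r > 0 and c > 0 then bg{r, c} = 1;
  end;
run;
```

Because each bigram is located by position instead of by name, the flagging is a single pass over each string, at the cost of carrying every possible column whether or not it is ever used.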