09-06-2011 03:48 PM
Create a program that takes two words as input, for example:
The program will find linkage words between the two given:
play/table (playtimes, timestable)
hair/ball (hairpin, pinball)
On Unix, use the dictionary file (/usr/dict/words or /usr/share/dict/words) as your seed data.
I have a solution; I wouldn't call it efficient, but the code is pretty simple. I will share it later so that everyone gets a blank starting point.
EDIT: I have attached a gunzipped version of the dictionary file to this thread for the non-unix folks...
09-09-2011 02:56 AM
OK. It is very interesting.
Thanks to FriedEgg for providing the dictionary. I have saved it; it may be useful in the future.
/* Load the dictionary, one word per line. */
data dictionary;
  infile 'c:\unix-words';
  input words : $100.;
run;

%let first=play;
%let last=table;

data _null_;
  length key _key word1 word2 $ 100;
  /* Load every dictionary word into a hash keyed on the word itself. */
  declare hash ha(hashexp: 20, dataset: 'work.dictionary(rename=(words=key))');
  declare hiter hi('ha');
  ha.definekey('key');
  ha.definedata('key');
  ha.definedone();

  /* Treat each dictionary word in turn as a candidate link. */
  rc = hi.first();
  do while (rc = 0);
    _key = key;
    /* The link qualifies when first+link and link+last are both words. */
    word1 = cats("&first", _key); key = word1; r1 = ha.check();
    word2 = cats(_key, "&last");  key = word2; r2 = ha.check();
    if r1 = 0 and r2 = 0 then do;
      put 'Found:' word1 word2;
      found = 1;
    end;
    rc = hi.next();
  end;
  if not found then put 'Search over. Not Found.';
  stop;
run;
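For readers without SAS, the same search can be sketched in Python. This is only an illustration of the approach above, not part of the original post: the function name `find_links` and the tiny inline word list are made up here, and a set lookup stands in for the hash object's check() method.

```python
def find_links(first, last, words):
    """Return (link, first+link, link+last) wherever both joined words exist."""
    vocab = set(words)  # set membership plays the role of ha.check()
    results = []
    for link in sorted(vocab):  # every dictionary word is a candidate link
        left = first + link     # e.g. "play" + "times"
        right = link + last     # e.g. "times" + "table"
        if left in vocab and right in vocab:
            results.append((link, left, right))
    return results


# Hypothetical miniature dictionary, just enough to exercise the function.
sample = ["playtimes", "timestable", "times", "hairpin",
          "pinball", "pin", "play", "table"]

print(find_links("play", "table", sample))
print(find_links("hair", "ball", sample))
```

To run it against the real dictionary, the word list could be read from /usr/share/dict/words, one stripped line per word, in place of `sample`.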
09-09-2011 02:03 PM
Thanks, Ksharp, for the solution using the hash object. It works well. Here is another solution:
data words(keep=word) word1(keep=word link) word2(keep=word link);
infile '/usr/share/dict/words' truncover;
input word : $45.;
array v[2] $ 45 _temporary_ ("&v1" "&v2");
do i=1 to dim(v);
if index(word,trim(v[i]))>0 then
if count(trim(link),' ')<2 then
array l $ 45 _temporary_;
do j=1 to dim(l);
if length(l)>length(l) then link=l; else link=l;
if i=1 then output word1; else output word2;
proc sql;
create table want as
select a.link, a.word as word1, b.word as word2
from ( select strip(link) as link, word
       from word1
       where link in ( select word from words ) ) a,
     ( select strip(link) as link, word
       from word2
       where link in ( select word from words ) ) b
where a.link = b.link;
quit;
My example will not produce the same results. In my original sample output I made it seem as though the words should follow the pattern a/b -> ac, cb, but that was unintentional, and my solution above allows more linkages to be found. The hash solution can be modified to produce the same results; I will do it later if I find the time.
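One way to read "more linkages" is to let each seed word sit at either end of a dictionary word and treat whatever remains as the link, provided the remainder is itself a word. The Python sketch below follows that reading. It is an assumption about the intent, not a port of the SAS code above, and `remainders`, `find_general_links`, and the sample word list are names and data invented for the illustration.

```python
def remainders(seed, vocab):
    """Map each leftover piece to the words it came from.

    A piece is what remains after removing a non-empty seed from the
    start or the end of a dictionary word; it is kept only if the piece
    is itself in the dictionary.
    """
    found = {}
    for word in vocab:
        if word == seed:
            continue
        pieces = set()
        if word.startswith(seed):
            pieces.add(word[len(seed):])
        if word.endswith(seed):
            pieces.add(word[:-len(seed)])
        for piece in pieces:
            if piece in vocab:
                found.setdefault(piece, []).append(word)
    return found


def find_general_links(first, last, words):
    """Return (link, word1, word2) rows where both seeds share a link."""
    vocab = set(words)
    a = remainders(first, vocab)
    b = remainders(last, vocab)
    rows = []
    for link in sorted(a.keys() & b.keys()):  # links reachable from both seeds
        for w1 in sorted(a[link]):
            for w2 in sorted(b[link]):
                rows.append((link, w1, w2))
    return rows


# Hypothetical miniature dictionary for demonstration only.
sample = ["play", "table", "time", "playtime", "timetable", "down",
          "downplay", "sun", "sundown", "hair", "ball", "pin",
          "hairpin", "pinball"]

print(find_general_links("play", "table", sample))
print(find_general_links("play", "sun", sample))
```

With this reading, play/sun links through "down" (downplay, sundown), which the stricter ac/cb pattern would miss because the link comes before "play" and after "sun".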