Word list search

Hi, thanks a lot for the tips! I have tried PROC SPELL so far, which seems to work quite nicely, except that it delivers the frequencies of words from the lexicon that do NOT appear in the phrase list, whereas I need the words that do appear.

Example: Word file = {"able"; "best"; "courage"}. Every word in the list has an associated emotion rating (I am a psychologist) on a scale from 1 to 7: (able; 5.89); (best; 6.76); (courage; 6.30).

Phrase file = {"i feel at my best because i have been working on important goals for the past three days, and i've been able to complete them"; "i feel a bit disappointed that i wasn't as eager about it as everyone else but it was the best book i've read at least"; "i just hope we get to fix our relationship as best friends, it just takes a little bit of courage"}

I want to check whether each word is contained in each phrase and, if it is, add the word's score to the phrase's score. E.g., "best" and "courage" are both present in phrase 3, so the phrase_score for phrase 3 will be 6.76 + 6.30 = 13.06. Ideally, if a word occurs more than once in the same phrase, the phrase_score would be incremented for each occurrence, but at the moment I am just trying to get this to work for one occurrence per phrase.

Hash looks very interesting. I tried it, but I seem to be doing something wrong, because it produces an output file with 0 observations.
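To make the intended result concrete, here is a minimal sketch of the scoring described above using a DATA step hash lookup. It is not my failing attempt and not necessarily the best approach. The dataset names word_list, phrase_list and solution, the lengths $20 and $200, the loop variable i, and the delimiter list are all assumptions made for this illustration; the three words and phrases are the ones from the example.

```sas
/* Hypothetical inline copy of the example word list (assumed length $20) */
data word_list;
  length word $20 rating 8;
  word = "able";    rating = 5.89; output;
  word = "best";    rating = 6.76; output;
  word = "courage"; rating = 6.30; output;
run;

/* Hypothetical inline copy of the example phrases (assumed length $200) */
data phrase_list;
  length phrase $200;
  phrase = "i feel at my best because i have been working on important goals for the past three days, and i've been able to complete them"; output;
  phrase = "i feel a bit disappointed that i wasn't as eager about it as everyone else but it was the best book i've read at least"; output;
  phrase = "i just hope we get to fix our relationship as best friends, it just takes a little bit of courage"; output;
run;

data solution;
  /* Host variables for the hash key and data; word must match the length in word_list */
  length word $20 rating 8;
  if _n_ = 1 then do;
    /* Load the word list once, keyed by word */
    declare hash h(dataset: 'word_list');
    h.defineKey('word');
    h.defineData('rating');
    h.defineDone();
    call missing(word, rating);
  end;

  set phrase_list;
  phrase_score = 0;

  /* Split the phrase into tokens and look each one up in the hash */
  do i = 1 to countw(phrase, ' ,.;:!?');
    word = scan(phrase, i, ' ,.;:!?');
    /* find() returns 0 on a match and copies the rating into the PDV */
    if h.find() = 0 then phrase_score = phrase_score + rating;
  end;

  keep phrase phrase_score;
run;
```

With the example data this should give 12.65 for phrase 1 ("best" and "able"), 6.76 for phrase 2 and 13.06 for phrase 3. Because every token is looked up, a word that occurs twice in a phrase is counted twice. The comparison is case-sensitive, so mixed-case text would need lowcase() applied first.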
What I have done so far as a test:

/* Count number of observations in ref.word_list and ref.phrase_list */
PROC SQL;
  SELECT COUNT(*), max(length(word)) INTO :WordNo, :Maxl FROM ref.word_list;
  SELECT COUNT(*), max(length(phrase)) INTO :PhraseNo, :Maxlg FROM ref.phrase_list;
QUIT;

/* Load only 15 words from the list of words into a one-dimensional array, for testing purposes */
data ref.solution;
  set ref.phrase_list end=eoflist;
  length word1-word&WordNo $&Maxl;
  array wordz {&WordNo} word1-word&WordNo;
  retain word1-word&WordNo;
  if _n_=1 then do iWord = 1 to 15;
    set ref.word_list;
    wordz{iWord} = word;
  end;

  /* Look for each word in each phrase */
  score=0;
  do iWord = 1 to 15;
    lag(score);
    MatchW = indexw(phrase,wordz{iWord});
    if MatchW > 0 then score+3;
  end;
  keep phrase score;
run;

Dan