About akvinay7

akvinay7 · ‎10-08-2018

Thank you so much, @FreelanceReinh, for your thorough investigation and your welcome! I tried your code out and got the duplicate observation you mentioned, which had me baffled. This made me believe the culprit was something to do with incorrect variable lengths.

I was reading my input from datasets, which I believe may have contained the non-breaking spaces you mentioned. I do not know what kind of encoding they used, but the data was 'messy', to say the least. I identified the 'messiness' using the hexadecimal conversion idea you described and, as you expected, there were many non-'20' entries within the phrases to be matched. I created a script using your code to convert those wrong values to '20', essentially 'cleaning' the dataset. This took some time, as there were a lot of different wrong codes in between.

I also found that there were unnecessary trailing spaces, so I used the TRIM() function. I rewrote the PROC SQL code as follows:

    proc sql;
      create table test as
      select a.*
      from table_one as a, table_two as b
      where find(a.Sentence, TRIM(b.Text)) > 0;
    quit;

This gave me all the right values, with duplicated observations where applicable. I am "accepting" this as the right solution, as it helped me identify, debug and fix the errors. Thank you so much once again for all of your help! I am very happy to be a part of this high-quality community of SAS users!
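For readers following along, the inspect-and-clean steps described above might look roughly like the sketch below. This is not @FreelanceReinh's original code, which is not shown here. The sample data, the dataset name table_two_clean, and the choice of 'A0'x as the offending byte are assumptions for illustration. It also assumes a single-byte session encoding such as Latin1; in a UTF-8 session a non-breaking space is the two-byte sequence 'C2A0'x and would need TRANWRD rather than TRANSLATE.

```sas
/* Hypothetical sample: the second phrase contains a non-breaking */
/* space ('A0'x) instead of a regular space ('20'x).              */
data table_two;
  length Text $20;
  Text = 'New York';                output;
  Text = 'good' || 'A0'x || 'day';  output;
run;

/* Step 1: print each value next to its hexadecimal representation. */
/* A regular space appears as 20; anything else between the words   */
/* is a candidate for cleaning.                                      */
data _null_;
  set table_two;
  put Text= Text $hex40.;
run;

/* Step 2: replace the non-breaking space with a regular space.     */
/* TRANSLATE takes the replacement character before the character   */
/* to be replaced.                                                   */
data table_two_clean;
  set table_two;
  Text = translate(Text, ' ', 'A0'x);
run;
```

If several different wrong codes are present, each would need its own to/from pair in TRANSLATE, which matches the observation that the cleaning took some time.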

akvinay7 · ‎10-08-2018

Hello, I have the following objective that I want to accomplish. I have two tables. One of the tables (Table_one) has a column called 'Sentence', with the following values:

    SENTENCE
    I live in New York
    A bad day
    A very good day

I have another table (Table_two) with a column called 'Text':

    TEXT
    New York
    good day
    very good day

I want to match the phrases in 'Text' against the sentences in 'Sentence' to see whether they are contained in any of the 'Sentence' observations, and output those sentences that do contain the text. I understand that this is not difficult in and of itself, but I have a special case that I could not find much information about online. What I want is a table that looks like this:

    MATCH
    I live in New York
    A very good day
    A very good day

I've tried the following code:

    proc sql;
      create table match as
      select a.*
      from table_one as a, table_two as b
      where find(a.Sentence, b.Text) > 0;
    quit;

What I get is the result below:

    MATCH
    I live in New York
    A very good day

In other words, since the Table_two observations 'good day' and 'very good day' are both contained in the Table_one sentence 'A very good day', they are treated like a single observation and the sentence is returned only once in the output. I would like each phrase to be treated as an individual observation, so that the sentence is output twice, as in my desired output.

I have tried both the FIND() and INDEX() functions, but both give me the same results. Is there any way to avoid the single-observation output and get two separate observations even when several phrases occur in the same sentence? Any help would be greatly appreciated.
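The setup described above can be reproduced with a small, self-contained sketch. The column lengths ($40 and $20) are assumptions, and the query additionally selects b.Text (not in the original query) so that it is visible which phrase produced each row. TRIM() is applied to the phrase so that trailing blanks from the fixed-length column do not take part in the search.

```sas
/* Example data from the question; lengths are assumed. */
data table_one;
  length Sentence $40;
  infile datalines truncover;
  input Sentence $char40.;
  datalines;
I live in New York
A bad day
A very good day
;

data table_two;
  length Text $20;
  infile datalines truncover;
  input Text $char20.;
  datalines;
New York
good day
very good day
;

/* The comma join pairs every sentence with every phrase; the WHERE */
/* clause keeps the pairs in which the phrase occurs in the         */
/* sentence, so a sentence can appear once per matching phrase.     */
proc sql;
  create table match as
  select a.Sentence, b.Text
  from table_one as a, table_two as b
  where find(a.Sentence, trim(b.Text)) > 0;
quit;
```

With clean input, a query of this shape should return 'A very good day' once for 'good day' and once for 'very good day'. If it returns fewer rows, hidden characters in the data are a likely cause.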

Date Last Visited	‎10-19-2018 04:14 PM

Re: Matching text to sentences - Outputting same Matches Separately in...

Matching text to sentences - Outputting same Matches Separately instea...
