I've got a dataset with a bunch of search terms pulled down from Google Analytics. What I want to do is group these terms to find categories I can start classifying them into.
For example, the following may all have been searched for:
To find these groupings I've created a second copy of this dataset, and am attempting to join the two in DI Studio: if 1.search contains 2.search, or 1.search =* (sounds like) 2.search, then join the rows.
It doesn't seem to be working though. I guess I'm expecting output from both tables joined as one like:
search_1         search_2
Breast Cancer    Breast Cancer
Cancer           Breast Cancer
Cancer           Cancer
Cancer           Lung Cancer
Lung Cancer      Lung Cancer
This isn't the ideal final format, but it would at least group the terms somewhat together so I wouldn't have to comb through all these records manually...and make mistakes.
I think the sounds-like operator is working a little, but I'm not getting any of these CANCER examples, which I would expect. After all, I get them when I do a simple filter using contains ('CANCER') on a given dataset.
Any suggestions?
Is it not possible for the second parameter in the CONTAINS statement to be a column? Should this be done via code instead of the DI Studio Join transformation?
I'm not sure if I understand what you are trying to do - building diagnosis groups using sounds like or similar algorithms?
If you get a match table, how will you proceed from there?
There is software specially designed for fuzzy matching and similar techniques to normalize text strings. As an example, take a look at the SAS DataFlux suite of products.
Other thoughts: this approach feels a bit ad hoc, and also a bit exploratory. DI Studio is the tool if you know what to do, and want to do it regularly, in an automated way. Perhaps you should try some more exploratory tools as a start, and then move to DI Studio when you are closer to a solution.
We are in the process of getting SAS DataFlux set up.
In the meantime, we're going to manually put together a table of specific searches and create categories for them (something we'd probably do in a QKB). What I'm doing here in DI Studio is sort of an ad hoc utility to help us on our way.
For this case, I found what I needed. Apparently, in the Join transformation the operand following the CONTAINS operator needs to be within STRIP(). Along with the sounds like operator, I'm able to get some general groupings together that we can manually analyze to determine what categories we may potentially want to create.
Thanks for being on top of responding to questions, Linus!
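For anyone landing here later, here is a rough PROC SQL sketch of the kind of join condition described above. The table and column names (searches, search, grouped) and the sample rows are made up for illustration, and I'm assuming the Join transformation generates a where clause along these lines. As far as I can tell, the reason STRIP() matters is that a character column is padded with trailing blanks to its full length, so without it CONTAINS looks for the term plus all of its padding.

```sas
/* Hypothetical sample data; in practice this would be the Google Analytics extract */
data searches;
  length search $40;
  infile datalines truncover;
  input search $40.;
  datalines;
Breast Cancer
Cancer
Lung Cancer
;
run;

proc sql;
  /* Self-join: pair each term with every term it contains or sounds like */
  create table grouped as
  select a.search as search_1,
         b.search as search_2
  from searches as a, searches as b
  where a.search contains strip(b.search)  /* STRIP() drops the trailing blank padding */
     or a.search =* b.search               /* sounds-like operator */
  order by search_2, search_1;
quit;
```

Since CONTAINS is case sensitive, it may also be worth wrapping both sides in UPCASE() if the search terms come in with mixed case.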