jwhite
Quartz | Level 8

I've got a dataset with a bunch of search terms pulled down from Google Analytics. What I want to do is group these terms to find categories I can start classifying them under.

 

For example, the following may all have been searched for:

  • Breast Cancer
  • Cancer
  • Lung Cancer

To find these groupings I've created a second copy of this dataset, and am attempting to join the two in DI Studio so that rows are joined when 1.search contains 2.search or 1.search =* (sounds like) 2.search.

 

It doesn't seem to be working though. I guess I'm expecting output from both tables joined as one like:

search_1       search_2
Breast Cancer  Breast Cancer
Cancer         Breast Cancer
Cancer         Cancer
Cancer         Lung Cancer
Lung Cancer    Lung Cancer

 

This isn't the ideal final format, but it would at least group the terms somewhat together so I wouldn't have to comb through all these records manually...and make mistakes.

 

I think the sounds-like operator is working a little, but I'm not getting any of these CANCER examples, which I would expect. After all, I get them when I do a simple filter using contains ('CANCER') on a given dataset.

 

Any suggestions?

 

1 ACCEPTED SOLUTION

Accepted Solutions
jwhite
Quartz | Level 8

We are in the process of getting SAS DataFlux set up.

 

In the meantime, we're going to manually put together a table of specific searches and create categories for them (something we'd probably do in a QKB). What I'm doing here in DI Studio is sort of an ad hoc utility to help us on our way.

 

For this case, I found what I needed. Apparently, in the Join transformation the operand following the CONTAINS operator needs to be within STRIP(). Along with the sounds like operator, I'm able to get some general groupings together that we can manually analyze to determine what categories we may potentially want to create.
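
A minimal sketch of why STRIP() may matter here, written in Python rather than SAS. It assumes the cause is that character columns are stored at a fixed width and padded with trailing blanks, so an unstripped operand carries its padding into the comparison; the thread itself does not confirm the cause. The column width and the function name `contains_pairs` are hypothetical.

```python
# Sketch only: mimics fixed-width, blank-padded character columns.
WIDTH = 20  # hypothetical column length, not taken from the original post

terms = ["Breast Cancer", "Cancer", "Lung Cancer"]
padded = [t.ljust(WIDTH) for t in terms]  # values as a fixed-width column would hold them


def contains_pairs(values, strip_operand):
    """Return (search_1, search_2) pairs where search_2 contains search_1."""
    pairs = []
    for s1 in values:
        # The operand after "contains": stripped or left with its padding.
        needle = s1.strip() if strip_operand else s1
        for s2 in values:
            if needle.upper() in s2.upper():
                pairs.append((s1.strip(), s2.strip()))
    return sorted(pairs)


without_strip = contains_pairs(padded, strip_operand=False)
with_strip = contains_pairs(padded, strip_operand=True)

print(without_strip)  # only exact self-matches survive the padding
print(with_strip)     # "Cancer" now also pairs with the longer terms
```

With the padding left on, "Cancer" followed by fourteen blanks is not a substring of "Breast Cancer" followed by seven blanks, so only each term's match with itself is found. Stripping the operand gives the five pairs listed in the original question.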

 

Thanks for being on top of responding to questions, Linus!


4 REPLIES 4
jwhite
Quartz | Level 8

Is it not possible for the second parameter in the CONTAINS statement to be a column? Should this be done via code instead of the DI Studio Join transformation?

LinusH
Tourmaline | Level 20

I'm not sure if I understand what you are trying to do - building diagnosis groups using sounds like or similar algorithms?

If you get a match table, how will you proceed from there?

There is software specially designed for fuzzy logic and similar techniques to normalize text strings. As an example, take a look at the SAS DataFlux suite of products.

Other thoughts: this approach feels a bit ad hoc and exploratory. DI Studio is the tool when you know what you want to do and want to do it regularly, in an automated way. Perhaps you should start with some more exploratory tools, and then move to DI Studio when you are closer to a solution.

Data never sleeps

LinusH
Tourmaline | Level 20
Cheers and good luck!
Data never sleeps
