DI Studio: Join w/ Contains Operator

Frequent Contributor
Posts: 88

I've got a dataset containing a bunch of search terms pulled down from Google Analytics. What I want to do is group these terms to find categories I can start classifying them into.

 

For example, the following may all have been searched for:

  • Breast Cancer
  • Cancer
  • Lung Cancer

To find these groupings I've created a second copy of this dataset and am attempting to join the two in DI Studio: if 1.search contains 2.search, or 1.search =* (sounds like) 2.search, then join the rows.

 

It doesn't seem to be working though. I guess I'm expecting output from both tables joined as one like:

search_1       search_2
Breast Cancer  Breast Cancer
Cancer         Breast Cancer
Cancer         Cancer
Cancer         Lung Cancer
Lung Cancer    Lung Cancer

 

This isn't the ideal final format, but it would at least group the terms somewhat together so I wouldn't have to comb through all these records manually...and make mistakes.
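
A minimal sketch of the intended self-join, written in Python rather than SAS so it can be run anywhere. The function name `containment_pairs` is hypothetical, and the case-insensitive comparison is an assumption (the SAS CONTAINS operator itself is case sensitive). As in the table above, search_1 holds the contained term and search_2 the term that contains it.

```python
def containment_pairs(terms):
    """Pair every term with each term that contains it (a self-join)."""
    pairs = []
    for contained in terms:
        for container in terms:
            # Assumption: compare in upper case so "Cancer" matches "CANCER".
            if contained.upper() in container.upper():
                pairs.append((contained, container))
    # Sort so that rows sharing the same search_1 value end up together.
    return sorted(pairs)


terms = ["Breast Cancer", "Cancer", "Lung Cancer"]
pairs = containment_pairs(terms)

print(f"{'search_1':<15}{'search_2'}")
for search_1, search_2 in pairs:
    print(f"{search_1:<15}{search_2}")
```

Every term matches itself, so each one appears at least once; the short, general terms such as "Cancer" are the ones that pick up extra rows and hint at a category.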

 

I think the sounds-like operator is working a little, but I'm not getting any of these CANCER examples, which I would expect. After all, I get them when I apply a simple filter using contains ('CANCER') to a given dataset.

 

Any suggestions?

 


All Replies
Frequent Contributor
Posts: 88

Re: DI Studio: Join w/ Contains

Is it not possible for the second operand of the CONTAINS operator to be a column? Should this be done in code instead of in the DI Studio Join transformation?

Super User
Posts: 5,254

Re: DI Studio: Join w/ Contains

I'm not sure if I understand what you are trying to do - building diagnosis groups using sounds like or similar algorithms?

If you get a match table, how will you proceed from there?

There is software specifically designed for fuzzy matching and for normalizing text strings. As an example, take a look at the SAS DataFlux suite of products.

Another thought: this approach feels a bit ad hoc and exploratory. DI Studio is the right tool when you know what you want to do and want to run it regularly, in an automated way. Perhaps you should start with more exploratory tools and move to DI Studio when you are closer to a solution.

Solution
‎04-20-2016 04:45 PM
Frequent Contributor
Posts: 88

Re: DI Studio: Join w/ Contains

We are in the process of getting SAS DataFlux set up.

In the meantime, we're going to manually put together a table of specific searches and create categories for them (something we'd probably do in a QKB). What I'm doing here in DI Studio is sort of an ad hoc utility to help us on our way.

 

For this case, I found what I needed. Apparently, in the Join transformation the operand following the CONTAINS operator needs to be wrapped in STRIP(). Combined with the sounds-like operator (=*), this gives me some general groupings that we can analyze manually to decide which categories we may want to create.
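
A likely explanation, which the thread does not confirm, is that a fixed-width character column carries trailing blanks, so the unstripped operand only matches when the two values are identical. The sketch below simulates that in Python; the width of 20 and the helper name `pad` are assumptions for illustration, not anything taken from DI Studio.

```python
WIDTH = 20  # hypothetical column length, chosen only for illustration


def pad(value, width=WIDTH):
    """Mimic a fixed-width character column by right-padding with blanks."""
    return value.ljust(width)


# Values as they would sit in a fixed-width column, trailing blanks included.
stored = [pad(term) for term in ["BREAST CANCER", "CANCER", "LUNG CANCER"]]

# Unstripped operand: the padded needle is as long as the haystack,
# so only identical values can match.
raw = [
    (needle.strip(), haystack.strip())
    for needle in stored
    for haystack in stored
    if needle in haystack
]

# Stripped operand: the padding is removed before the containment test.
fixed = [
    (needle.strip(), haystack.strip())
    for needle in stored
    for haystack in stored
    if needle.strip() in haystack
]

print(len(raw), "matches without stripping")
print(len(fixed), "matches with stripping")
```

Under these assumptions the unstripped comparison returns only the three self-matches, while the stripped one also pairs CANCER with BREAST CANCER and LUNG CANCER. That would also explain why a filter on the literal 'CANCER' works: a literal carries no padding.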

 

Thanks for being on top of responding to questions, Linus!

Super User
Posts: 5,254

Re: DI Studio: Join w/ Contains Operator

Cheers and good luck!
☑ This topic is SOLVED.
