Re: Question about exported data from text parsing in SAS Enterprise Miner

Jonathanzz · Posted 04-17-2019 03:56 AM

I am trying to use the Text Parsing node to keep only the useful words in my dataset, based on a start list that I specified. However, the dataset exported by the node does not change. The results show that only staff, restaurant, swimming pool, and spa have "Y" in the Keep column while all other words have "N", yet the exported dataset is identical to the input.

Original dataset:

review_ID review

0001 I am satisfied with the staff and also the restaurant

0002 The swimming pool and spa are amazing

...

Start list:

staff

restaurant

swimming pool

spa

Expected exported dataset from text parsing:

review_ID review

0001 staff, restaurant

0002 swimming pool, spa

...

Actual exported dataset from text parsing: (not what I want)

review_ID review

0001 I am satisfied with the staff and also the restaurant

0002 The swimming pool and spa are amazing

...

RussAlbright · Posted 04-17-2019 10:41 AM

The Text Parsing node creates an underlying representation in two tables: the Terms table (which you mentioned you saw) and a term-by-document frequency table that we refer to as the parent table. You cannot see the parent table directly unless you look in your workspace project directory.

When you follow the Text Parsing node with a Text Filter node and other Text Mining nodes, they use these representations rather than the original input text in the exported table. So the terms you dropped are in fact being excluded. It is not until you use a Text Cluster node or a Text Topic node that you see a change in the exported table. Even then, the change is a set of added columns holding the numeric representation of each document (taking your dropped terms into account). The raw input text itself is never changed; it is exported as-is.
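To make the two representations concrete, here is a rough Python sketch of the idea using the sample reviews from the question. It is not how Enterprise Miner is implemented: the tokenizer, the `count_terms` helper, the term numbering, and the choice to store only Keep = "Y" terms in the triples are all assumptions made for illustration.

```python
import re
from collections import Counter

reviews = {
    "0001": "I am satisfied with the staff and also the restaurant",
    "0002": "The swimming pool and spa are amazing",
}
start_list = ["staff", "restaurant", "swimming pool", "spa"]

def count_terms(text, multiword_terms):
    """Count terms in one document, treating listed multiword terms as one term."""
    remaining = text.lower()
    counts = Counter()
    for term in multiword_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, remaining))
        if hits:
            counts[term] += hits
            # Blank out the multiword term so its parts are not counted again.
            remaining = re.sub(pattern, " ", remaining)
    counts.update(re.findall(r"[a-z]+", remaining))
    return counts

multiword = [t for t in start_list if " " in t]
doc_counts = {doc: count_terms(text, multiword) for doc, text in reviews.items()}

# Terms table: every distinct term gets a number and a Keep flag.
all_terms = sorted({t for counts in doc_counts.values() for t in counts})
terms_table = {
    term: {"termnum": i, "keep": "Y" if term in start_list else "N"}
    for i, term in enumerate(all_terms, start=1)
}

# Parent table (assumed layout): (termnum, document, frequency) triples,
# here restricted to kept terms.
parent_table = [
    (terms_table[term]["termnum"], doc, freq)
    for doc, counts in doc_counts.items()
    for term, freq in sorted(counts.items())
    if terms_table[term]["keep"] == "Y"
]

for row in parent_table:
    print(row)
```

Note that `reviews` is never modified: the start list only affects the derived tables, which mirrors why the exported text column looks unchanged.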

Russ

Jonathanzz · Posted 04-18-2019 12:41 AM

Thanks, Russ!

However, what if I want to perform link analysis on the review factors that I specified in the start list (e.g. swimming pool, spa, etc.)? How should I do it?

Should I edit the dataset to be like this?

review_ID review

0001 I

0001 am

0001 satisfied

0001 with

.... .....

0002 The

0002 swimming

0002 pool

0002 and

0002 spa

.... .....

and then let the Text Parsing node do its job of ignoring the words that are not in the start list?

RussAlbright · Posted 04-18-2019 09:41 AM

Jonathan,

You can use the parent table in the workspace directory. It is stored as triples of the form

termnum document frequency

In the end, in order to interpret results, you just have to map the termnum back to the term string from the terms table.
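As a minimal sketch of that mapping step, the Python below joins some made-up triples to a made-up terms table and lists the kept terms per document. The table contents, the term numbers, and the `kept_terms_by_document` helper are hypothetical; in practice you would read both tables from the workspace directory.

```python
from collections import defaultdict

# Hypothetical extract of the terms table: termnum -> (term string, Keep flag).
terms = {
    1: ("staff", "Y"),
    2: ("restaurant", "Y"),
    3: ("swimming pool", "Y"),
    4: ("spa", "Y"),
    5: ("satisfied", "N"),
}

# Hypothetical parent-table triples: (termnum, document, frequency).
triples = [
    (1, "0001", 1),
    (2, "0001", 1),
    (5, "0001", 1),
    (3, "0002", 1),
    (4, "0002", 1),
]

def kept_terms_by_document(triples, terms):
    """Map each termnum back to its term string and group kept terms by document."""
    by_doc = defaultdict(list)
    for termnum, doc, freq in triples:
        term, keep = terms[termnum]
        if keep == "Y" and freq > 0:
            by_doc[doc].append(term)
    return {doc: ", ".join(found) for doc, found in by_doc.items()}

result = kept_terms_by_document(triples, terms)
for doc in sorted(result):
    print(doc, result[doc])
```

With this sample data, each document ends up with only its start-list terms, which is the shape of the "expected exported dataset" in the original question. Keeping one row per document and term instead of joining the strings would give a transaction-style layout, which may be a more convenient input for link analysis.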

Russ
