eddieray01
Fluorite | Level 6
If you concatenate multiple text variables into a new derived variable, how does the subsequent text mining process handle it? Does it use the original text variables plus the derived text variable?

If one were to do this, which text mining node accomplishes it?
Accepted Solutions
TWoodfield
SAS Employee

SAS Text Miner only sees Text variables and Target variables (variables with roles Text or Target). Target variables are only seen if they have a level of binary or nominal. If there are two or more Text variables in a SAS data set, the Text Parsing node selects exactly one of the Text variables for analysis and ignores all of the rest. It has no way of knowing how any of the Text variables were created, whether concatenated or filtered or anything else. If there are two or more Text variables, the Text Parsing node uses the following selection rules:

1. Pick the Text variable with the greatest length.

2. If two Text variables tie for having the greatest length, pick the one that comes first in sort order by name. (Example: variable Animals has length 272 and variable Vegetables has length 272; choose Animals because A comes before V.)
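
To preview which variable these rules would favor, the character variables can be ranked by length and then by name. The following is a minimal sketch, not Text Miner's own logic: it assumes it runs in a SAS Code node (so &EM_IMPORT_DATA is defined), the work.* data set names are made up for illustration, and it lists every character variable because roles live in the Enterprise Miner metadata rather than in the data set. PROC SORT orders names case-sensitively, which may differ from the tie-break the node applies.

```sas
/* Sketch: rank character variables by length, then by name. */
/* work.all_vars and work.text_candidates are hypothetical names. */
proc contents data=&EM_IMPORT_DATA noprint
              out=work.all_vars(keep=name type length);
run;

proc sort data=work.all_vars out=work.text_candidates;
   where type=2;                 /* type=2 means character */
   by descending length name;    /* longest first, ties by name */
run;

/* The first row is the likely pick if all of these have role Text. */
proc print data=work.text_candidates(obs=5);
run;
```

Treat the first row only as a hint; setting the Use status explicitly remains the reliable way to control the choice.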

As a best practice, never let the Text Parsing node choose for you. Set the Use status of all Text variables to No except for the one that YOU choose to include in the analysis. 

If you want to concatenate two or more Text variables, use a SAS Code node. Example code:

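data &EM_EXPORT_TRAIN;
   set &EM_IMPORT_DATA;

   /* Assume Text1-Text3 have length 80: 3*80 + 2 separators = 242 */
   attrib NewText length=$242;
   /* CATX strips leading/trailing blanks and joins with a single blank */
   NewText=catx(' ',Text1,Text2,Text3);
run;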

The ATTRIB statement is necessary so that the full concatenation fits. Without it, NewText would default to a length of 200 characters, which is too short for the 242 characters that three 80-character values plus two separators can require.
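
The length difference can be checked with VLENGTH, which reports the declared length of a variable. This is a small sketch with made-up data; the data set name work.length_check and the variables DefaultText and SizedText are hypothetical and only serve the comparison.

```sas
/* Sketch with made-up data: default length versus declared length. */
data work.length_check;
   length Text1-Text3 $80;
   Text1 = 'first field';
   Text2 = 'second field';
   Text3 = 'third field';

   /* No length declared beforehand: CATX gives the new variable $200. */
   DefaultText = catx(' ', Text1, Text2, Text3);

   /* 3 * 80 characters + 2 separating blanks = 242. */
   attrib SizedText length=$242;
   SizedText = catx(' ', Text1, Text2, Text3);

   /* VLENGTH returns the compile-time length of each variable. */
   DefaultLen = vlength(DefaultText);
   SizedLen   = vlength(SizedText);
   put DefaultLen= SizedLen=;
run;
```

The log should show DefaultLen=200 and SizedLen=242. The sample values are kept short on purpose, so the step only compares declared lengths and does not overflow either variable.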

You can attach a Text Parsing node to the SAS Code node and do the analysis using the concatenated variable.

I hope this helps.
