eddieray01
Fluorite | Level 6

Are there any study groups you are aware of that could help in preparing for the exams? I tried the practice exam but am still not grasping some of the concepts, and I was curious whether there are any networks or study groups for these certifications.

Cynthia_sas
SAS Super FREQ
Hi:
We're not aware of any external study groups for Exam 5. However, this is the best place to find any if they exist.

If you have questions about the material in the courses or the Case Studies, you can post them here and we can ask the course instructors for their feedback.
Cynthia
eddieray01
Fluorite | Level 6

Here is a question for the instructors about something I wasn't clear on. If a character variable in the Text Import node exceeds 32,767 characters (~10 pages), will SAS Enterprise Miner:

  1. Flag the variable as truncated? (I'm assuming this is correct.)
  2. Truncate that character variable when it is stored as a text field in the SAS destination data set? (Also assuming this is correct.)
  3. Use the truncated version, not the original, for subsequent text mining and processing downstream? (Also assuming this is correct.)

I was confused about whether "truncated" meant that only the view of the variable in SAS EM is truncated, or that subsequent data processing actually uses the truncated version. I'd appreciate any clarity you can provide on this.

FYI: I did not have any cases like this in the demos/exercises (understandably, given the size), but 2.1 Demo: Using The Import Node says, "I need to keep that Filtered variable that becomes my text location to read entire document for analysis." If that is the case, how does SAS Enterprise Miner know to use the text location (the "filtered" variable) rather than the document text saved in the destination data set when a document exceeds 32,767 characters?

TWoodfield
SAS Employee

SAS Text Miner operates on a SAS table that contains either the text of each document or the location of a decoded text file for each document. An encoded document might be a Microsoft Word document, an Adobe PDF file, or any of the more than 100 document formats supported by the Text Import node. A decoded document is a plain text file, like one you might edit in Notepad. (Various decoded encodings are supported, but in the simplest case the decoded file is compatible with ASCII encoding.)

A decoded document is stored in a SAS data set as a character variable, and a SAS character variable can hold at most 32,767 bytes, which is 32,767 characters in a single-byte encoding such as ASCII. (Note that 32,767 = 2^15 - 1.) When a document exceeds this length, the version stored in the character variable is truncated. If you do nothing, the analysis in SAS Text Miner uses only the truncated document.
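
To make the storage limit concrete, here is a minimal Python sketch (not SAS code, and not how SAS implements this internally) that mimics what happens when a decoded document is stored in a character variable capped at 32,767 bytes. The function name `store_as_sas_char` and the 0/1 flag are illustrative assumptions; the flag is meant to mirror the TRUNCATED variable that the Text Import node creates. The text is encoded (UTF-8 here, as an assumption) before cutting, because the SAS limit is counted in bytes, which equals characters only for single-byte text.

```python
# Minimal sketch: mimic storing a decoded document in a SAS character
# variable. Assumption: UTF-8 encoding; the cap is counted in bytes.
SAS_MAX_CHAR_BYTES = 32_767  # 2**15 - 1


def store_as_sas_char(document: str, encoding: str = "utf-8"):
    """Return (stored_text, truncated_flag) under the 32,767-byte cap."""
    raw = document.encode(encoding)
    if len(raw) <= SAS_MAX_CHAR_BYTES:
        return document, 0  # fits: stored unchanged, flag 0
    # Cut at the byte limit; errors="ignore" drops a partial multibyte
    # character that the cut may leave at the end.
    stored = raw[:SAS_MAX_CHAR_BYTES].decode(encoding, errors="ignore")
    return stored, 1  # too long: stored truncated, flag 1


short_doc = "word " * 100     # 500 characters
long_doc = "word " * 10_000   # 50,000 characters

text, flag = store_as_sas_char(short_doc)
print(len(text), flag)  # 500 0
text, flag = store_as_sas_char(long_doc)
print(len(text), flag)  # 32767 1
```

Only the stored copy is cut; the 50,000-character original still exists wherever it came from, which is why pointing the analysis at the original file matters.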

To analyze the full document, you need to take advantage of the Text Location role. When a variable in SAS Enterprise Miner has the Text Location role, its values are assumed to be the complete pathname of a file that contains the full decoded document, even if that document exceeds 32,767 characters.

If you use the Text Import node, both the variable with the role Text and the variable with the role Text Location are created. If truncation occurs, as indicated by the TRUNCATED variable having a value of 1 (one), and you do nothing, then the analysis will use truncated documents. To avoid truncation, set the role of the text variable to Rejected, and keep the role of the file pathname variable as Text Location.
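
The effect of that role choice can be modeled with a small, hypothetical Python function. It is a conceptual model of the behavior described in this reply, not SAS Enterprise Miner internals: the row keys TEXT, FILTERED, and TRUNCATED are assumptions modeled on the variables named in this thread, and the strings "Text" and "Rejected" simply stand in for the roles you assign in SAS Enterprise Miner.

```python
import tempfile
from pathlib import Path


def text_for_analysis(row: dict, text_role: str) -> str:
    """Conceptual model of which text downstream nodes analyze.

    row: one observation with hypothetical keys "TEXT" (the stored,
         possibly truncated text), "FILTERED" (path to the full decoded
         file, i.e. the Text Location variable), and "TRUNCATED".
    text_role: role of the TEXT variable, "Text" or "Rejected".
    """
    # TRUNCATED only reports that a cut happened; by itself it does not
    # switch the analysis to the full file. The role assignment does.
    if text_role == "Text":
        # Doing nothing: the stored, possibly truncated text is analyzed.
        return row["TEXT"]
    if text_role == "Rejected":
        # TEXT is rejected, so the full file named by Text Location is read.
        return Path(row["FILTERED"]).read_text(encoding="utf-8")
    raise ValueError(f"unexpected role: {text_role!r}")


# Usage: a 50,000-character document that was truncated on import.
full_doc = "word " * 10_000
with tempfile.TemporaryDirectory() as tmp:
    doc_path = Path(tmp) / "doc1.txt"
    doc_path.write_text(full_doc, encoding="utf-8")
    row = {"TEXT": full_doc[:32_767], "FILTERED": str(doc_path), "TRUNCATED": 1}
    print(len(text_for_analysis(row, "Text")))      # 32767
    print(len(text_for_analysis(row, "Rejected")))  # 50000
```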

If you decode your documents using some other tool, then you must manage the creation of the SAS table used by SAS Text Miner yourself. If you create a variable with the Text Location role in a client-server environment, note that the pathname is interpreted on the server computer, not on your client machine, so it must point to a location that the server can access.
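
If you build that table outside the Text Import node, the following Python sketch shows the general idea: one row per decoded .txt file, holding the (possibly truncated) text, the file location, and a truncation flag, written to a CSV file that you would then bring into SAS by whatever means you normally use. Everything here is an assumption for illustration: the function name, the column names, UTF-8 input, and the hypothetical `server_dir` argument, which rewrites each path to the folder as the SAS server sees it (Unix-style separators assumed), because the Text Location path is resolved on the server.

```python
import csv
from pathlib import Path

SAS_MAX_CHAR_BYTES = 32_767  # SAS character-variable limit, in bytes


def build_text_table(doc_dir, out_csv, server_dir=None):
    """Write one row per decoded .txt file in doc_dir to out_csv.

    Columns (hypothetical names): TEXT (possibly truncated document),
    FILTERED (location of the full file), TRUNCATED (1 if TEXT was cut).
    server_dir (hypothetical): the same folder as seen by the SAS server;
    if None, local absolute paths are written instead.
    """
    rows = []
    for path in sorted(Path(doc_dir).glob("*.txt")):
        full_text = path.read_text(encoding="utf-8")  # assumes UTF-8 input
        raw = full_text.encode("utf-8")
        truncated = int(len(raw) > SAS_MAX_CHAR_BYTES)
        # Keep at most 32,767 bytes; drop a partial multibyte character.
        text = raw[:SAS_MAX_CHAR_BYTES].decode("utf-8", errors="ignore")
        if server_dir is not None:
            location = f"{server_dir.rstrip('/')}/{path.name}"
        else:
            location = str(path.resolve())
        rows.append({"TEXT": text, "FILTERED": location, "TRUNCATED": truncated})

    # csv.DictWriter quotes fields as needed, so newlines in TEXT are safe.
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["TEXT", "FILTERED", "TRUNCATED"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For example, `build_text_table("docs", "docs.csv", server_dir="/sasdata/docs")` would record `/sasdata/docs/<name>.txt` as each file's location, which only works if the same files actually exist at that path on the server.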

As an aside, SAS Visual Text Analytics handles things a little differently, because in-memory tables can have character variables of arbitrary length (varchar).

eddieray01
Fluorite | Level 6
Thank you for the clarification; that makes sense.