Are you aware of any study groups that help in preparing for the exams? I tried the practical exam but am still not grasping some of the concepts, and I was curious whether there are any networks or study groups for these certifications.
SAS Text Miner operates on a SAS table that contains either the text of a document or the location of a decoded text document. An encoded text document might be a Microsoft Word document, an Adobe PDF file, or a file in one of over 100 encoding formats supported by the Text Import node. A decoded document is a simple text file, like one that might be edited using Notepad. (Various decoding formats are supported, but in the simplest case, the decoded format is compatible with ASCII encoding.) A decoded document is stored in a SAS data set as a character variable. A SAS character variable can have at most 32,767 characters. (Note that 32,767 = 2^15 - 1.) When a document exceeds 32,767 characters in length, the version stored as a SAS character variable is truncated. If you do nothing, the analysis in SAS Text Miner will use only the truncated document.
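To make the limit concrete, here is a minimal Python sketch of what truncation at 32,767 characters looks like. It is only an illustration of the arithmetic, not how SAS stores data internally; the function name `store_as_char_variable` and the returned flag are made up for this example.

```python
# Illustration only: mimics a fixed-width character column that holds at most
# 2**15 - 1 characters. The names here are hypothetical, not SAS APIs.
SAS_CHAR_MAX = 2**15 - 1  # 32,767


def store_as_char_variable(document: str, limit: int = SAS_CHAR_MAX):
    """Return (stored_text, truncated_flag) for a document of any length."""
    truncated = 1 if len(document) > limit else 0
    return document[:limit], truncated


short_text, short_flag = store_as_char_variable("a" * 1000)
long_text, long_flag = store_as_char_variable("b" * 40000)
print(len(short_text), short_flag)  # 1000 0
print(len(long_text), long_flag)    # 32767 1
```

The second document loses everything past character 32,767, which is the portion an analysis of the stored text variable would never see.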
To perform an analysis on the full document, you need to take advantage of the Text Location role. When a variable in SAS Enterprise Miner has a role of Text Location, it is assumed to contain the complete pathname of the file that contains the complete decoded document, even if it exceeds 32,767 characters.
If you use the Text Import node, both the variable with the role Text and the variable with the role Text Location are created. If truncation occurs, as indicated by the TRUNCATED variable having a value of 1 (one), and you do nothing, then the analysis will use truncated documents. To avoid truncation, set the role of the text variable to Rejected, and keep the role of the file pathname variable as Text Location.
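The rule in the paragraph above can be written out as a small decision function. This is a sketch of the logic only, in Python rather than anything Enterprise Miner runs; the variable names `TEXT` and `FILTERED` are assumptions for illustration (the question mentions a "Filtered" variable), while `TRUNCATED` is the flag described above.

```python
# Sketch of the role choice described above. The default names "TEXT" and
# "FILTERED" are assumptions; only TRUNCATED is named in the explanation.
def choose_roles(rows, text_var="TEXT", location_var="FILTERED"):
    """Return a mapping of variable name to role for a Text Import result."""
    any_truncated = any(row["TRUNCATED"] == 1 for row in rows)
    if any_truncated:
        # Reject the truncated text so the full files are read from disk.
        return {text_var: "Rejected", location_var: "Text Location"}
    # Nothing was cut off, so the roles created by the node can stay.
    return {text_var: "Text", location_var: "Text Location"}


docs = [{"TRUNCATED": 0}, {"TRUNCATED": 1}]
print(choose_roles(docs))
```

In Enterprise Miner itself you would make this change by editing the variable roles by hand; the function just spells out when the change is needed.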
If you decode your documents using some other tool, then you must manage the creation of the SAS table used by SAS Text Miner. If you are creating a variable with the role Text Location, note that the pathname is relative to the server computer when you are in a client-server environment.
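If you go this route, one possible way to prepare the table is to list the decoded files and record a full pathname for each. The Python sketch below assumes the script runs on the server itself (so the paths are valid there) and that the decoded files are `.txt` files in one folder; the column names `doc_id` and `path` and the CSV output are hypothetical choices, and you would still need to load the result into a SAS table and assign the Text Location role yourself.

```python
# Sketch under stated assumptions: runs on the server, decoded documents are
# .txt files in one folder, and the column names are made up for illustration.
import csv
import tempfile
from pathlib import Path


def build_location_rows(doc_dir):
    """Return one row per decoded file, with its absolute pathname."""
    rows = []
    for path in sorted(Path(doc_dir).glob("*.txt")):
        rows.append({"doc_id": path.stem, "path": str(path.resolve())})
    return rows


def write_table(rows, out_csv):
    """Write the rows to a CSV file that could later be imported into SAS."""
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["doc_id", "path"])
        writer.writeheader()
        writer.writerows(rows)


# Small demonstration with two throwaway files.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "report_a.txt").write_text("first document", encoding="utf-8")
    Path(folder, "report_b.txt").write_text("second document", encoding="utf-8")
    table = build_location_rows(folder)
    write_table(table, Path(folder, "locations.csv"))
    print([row["doc_id"] for row in table])
```

Storing absolute paths avoids any ambiguity about which machine or working directory a relative path would be resolved against.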
As an aside, SAS Visual Text Analytics handles things a little differently, because in-memory tables can have character variables of arbitrary length (varchar).
A question for the instructors that I wasn't clear on: if a character variable in the Text Import node exceeds 32,767 characters (~10 pages), SAS Enterprise Miner will:
I was confused about whether "truncated" meant that only the view of the document in SAS EM would be truncated, or that the actual subsequent data processing would use the truncated version. I appreciate any clarity you can provide on this.
FYI: I did not have any cases like this in the demos/exercises (rightfully so, due to size), but 2.1 Demo: Using The Import Node says "I need to keep that Filtered variable that becomes my text location to read entire document for analysis". So, if this is the case, how does SAS Enterprise Miner know, when we say "filtered" variable (>32,767), to use the text location rather than the actual document saved in the destination?