Credit cards are a convenient tool that lets you acquire goods and services without carrying lots of cash or writing checks. The lure of wanting something and getting it instantly, without considering how to pay for it, is also a reason that so many credit cards exist.
People can easily get into trouble, incurring credit card debt at exorbitant interest rates and building a mountain of financial burden that can be extremely difficult to climb out of. Would you put your head in the open mouth of a fierce lion? When it comes to credit, many people have done just that without thinking through the consequences. With credit card agreements often exceeding 15 pages, it's little wonder they are not at the top of your reading list.
Still, by reading the fine print of a credit card agreement before signing up, you can save money when you do have to carry charges over several billing periods.
For this post I downloaded hundreds of credit card agreements from dozens of financial institutions. The data is generally available from the Consumer Financial Protection Bureau.
The documents are PDF files, so to get them into SAS Viya I first made a connection to the main download folder, which contained a subfolder of PDFs for each financial institution. A single institution can have multiple credit card agreements; in this example, over 500 agreements were processed. Once the folder connection was made in Manage Data in SAS Viya, I was able to import all of the credit card agreements into one SAS data set, with each agreement in its own row.
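If you prefer code to the Manage Data interface, a document import like this can also be scripted. The lines below are only a minimal sketch, not the workflow used for this post: the caslib name, folder path, and subfolder are made up, and the document import options (filetype="document") are from my memory of the CASUTIL document import and may differ by Viya release, so check the documentation for your environment.

/* Minimal sketch only -- caslib name, path, and import options are assumptions. */
cas mysess;
caslib agreements datasource=(srctype="path") path="/data/credit_card_agreements";
proc casutil incaslib="agreements" outcaslib="agreements";
   /* Convert the PDFs to text and load them as rows of one CAS table.          */
   /* casdata names a hypothetical subfolder; your release may expect a file or */
   /* a different option for loading a whole directory of documents.            */
   load casdata="BankOfExample" importoptions=(filetype="document")
        casout="cc_agreements" replace;
quit;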
I created a Visual Text Analytics project in SAS Model Studio using the credit card agreement as the text field. To start, I ran a generic pipeline to see what kind of initial results I would get, enabling the standard concepts check box so that predefined NLP (Natural Language Processing) concepts would be tagged in the documents. Browsing through the documents for things to follow up on, I noticed that the APR rates varied quite a bit, and I thought it might be interesting to see what APR rates were available for different types of credit cards. While browsing the documents in the Concepts node, I also noticed that the data contained a mix of banks and credit unions. To limit unintentional bias in my testing, I wanted to see if I could separate the banks from the credit unions in the project.
I ended up accumulating several custom concepts while fine-tuning the APR information.
I started by searching the documents for “APR” and “rate”. I took one of the predefined concepts, nlpPercent, and wrote a CONCEPT_RULE to match any percentage within six tokens of the terms "APR" or "rate". I then noticed that the documents also contained percentages related to the terms "margin" and "accrue". Because these terms also affect interest rate charges, I ended up creating the following concept, which I called _RATES_. The element wrapped in _c{} is the value returned upon a rule match, giving me a list of the desired percentages in each document.
_RATES_
CONCEPT_RULE:(DIST_6,"_c{nlpPercent}",(OR,"APR@", "rate@"))
CONCEPT_RULE:(DIST_6,"_c{nlpPercent}",(OR,"margin@", "accrue@"))
I could use the score code from the Concepts node to create a table of matching rates for each document (that is, each credit card agreement). In a post-processing step, this output table could then be used to create a custom report of the frequency or ranges of rates in the document collection.
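As a sketch of that post-processing step, the PROC FREQ step below tallies how often each extracted rate appears. The table and column names (concept_out, _concept_, _match_text_) are placeholders, not the actual names from this project; substitute whatever your Concepts node output table contains.

/* Placeholder names -- point this at your actual Concepts node output table. */
proc freq data=casuser.concept_out order=freq;
   where upcase(_concept_) = "_RATES_";   /* keep only the _RATES_ matches */
   tables _match_text_ / nocum;           /* frequency of each extracted percentage */
run;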
Here is a word cloud of keywords, created in SAS Visual Analytics from the Concepts node output table.
I thought of a way to separate the credit union documents from the bank documents in Visual Text Analytics without having to pre-process the documents.
To do this, I started with the basic CLASSIFIER rule type to identify documents containing the string APR. I then created an empty _CREDIT_UNION_DOCS_ concept to use as a landing place. By cleverly using the export modifier with the string “credit union” (all inside the square brackets), any document matching APR that also contained “credit union” would magically appear under the previously empty _CREDIT_UNION_DOCS_ concept!
_CREDITUNION_
CLASSIFIER:[export=_CREDIT_UNION_DOCS_: credit union]:APR
The results below show 72 matched documents. Notice that the _CREDIT_UNION_DOCS_ concept doesn’t contain any rule code, since its only purpose is to receive the documents exported from the _CREDITUNION_ rule.
Having this information in a custom concept lets me do further evaluation against other rule matches for this subset of financial institutions.
In my next rule I wanted to figure out what the APR rates were just for credit unions. Using the CONCEPT_RULE type, I combined the previous _CREDIT_UNION_DOCS_ “landing pad” concept with the _RATES_ concept to obtain the APR rates for credit unions only. The Boolean AND operator lets the rule reference both concepts, and the _c{} label determines which concept's match is returned; in this case, the credit union documents.
_RATES_FOR_CUDOCS_
CONCEPT_RULE:(AND,"_c{_CREDIT_UNION_DOCS_}","_RATES_")
Of the 72 credit union documents, it turns out that 38 matched the rate percentages generated by the _RATES_ concept. Instead of selecting the _CREDIT_UNION_DOCS_ match (the string "credit union") to be highlighted, I could have written the rule to highlight the actual rates by wrapping the _RATES_ portion in the _c{} label instead. That would, perhaps, have provided more useful results in my rule match.
In the next example I will reconstruct the rule to see what the results look like. I would expect to still have 38 matches.
This is what the rule looks like after changing the context being returned.
CONCEPT_RULE:(AND,"_c{_RATES_}","_CREDIT_UNION_DOCS_")
I like these results better than the previous rule, although some of the APR rates seem to be surprisingly high.
This graph from the results of the Concepts node shows the number of documents matched per concept:
The additional _BINDING_ rule below can be used as a starting point to further explore this document collection on your own. Remember that you can use the concept rule output table in SAS Visual Analytics to create custom reports, or score new documents with the rules you build in this node (a rough sketch of scoring appears after the _BINDING_ example).
I added this final rule to identify language reflecting liability for charges and found some interesting information. For this example, I wanted to capture a larger section of text within a match.
First, the PREDICATE_RULE syntax in the first line below selects all text between the matched terms, so if ‘binding’ and ‘heirs’ appear anywhere in the same document, everything between the two terms is highlighted. This entire span will be returned if you score new documents using this rule.
Second, the CONCEPT_RULE syntax below returns the term ‘notice’ if it falls within six tokens of a variation of the word demand (demands, demanded, demanding) or the word pay (pays, paid, paying).
_BINDING_
PREDICATE_RULE:(start, end):(AND, "_start{binding}", "_end{heirs}")
CONCEPT_RULE:(DIST_6, "_c{notice}", (OR, "demand@", "pay@"))
Some matched text from this rule may reflect surprising and unexpected contract terms, pointing out the importance of reading and understanding agreements before signing.
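As promised, here is a rough sketch of what scoring new documents outside the pipeline might look like. The score code exported from the Concepts node is built around the textRuleScore.applyConcept CAS action; the table, column, and compiled model names below are placeholders, and the exact action parameters come from the score code you export, so treat this only as an outline rather than the post's actual code.

/* Rough outline only -- every name here (li_model, new_agreements, doc_id, text_var) is a placeholder. */
/* The score code exported from the Concepts node supplies the real model table and parameter values.   */
proc cas;
   textRuleScore.applyConcept /
      model={name="li_model"}              /* compiled concept (LITI) model */
      table={name="new_agreements"}        /* new documents to score        */
      docId="doc_id"
      text="text_var"
      casOut={name="concept_matches", replace=TRUE};
quit;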
Technology is changing the way society operates. How many times have you just clicked ‘accept’ and continued on to whatever app you were adding? Hopefully this post encourages you to be a little more curious about the ‘fine print’ you are actually agreeing to, to ‘opt out’ of agreements you find disagreeable, and to experiment with and gain insight into the powerful capabilities of concept rules. Who knows, it may limit some unnecessary hardships down the road. This technique can be applied to all kinds of documents.
Thanks for reading and notice that you didn’t have to accept anything to do so. 😊
Find more articles from SAS Global Enablement and Learning here.