Hi,

I am doing document categorization in SAS Text Miner 12.1. After training with the Text Rule Builder, I got the following rules (only part of them is shown) in the Content Categorization Code window and the Rules Obtained results window:

F_L2 =Social :: (OR ,
    (AND, (OR, "facebook" , "fb" ), (NOT, ("book" , "booking") )) ,
    (AND, (OR, "friend" , "friends" ), (OR, "" , "noticed" , "notice" ))

F_L2 =Ads :: (OR ,
    (AND, (OR, "cheapest" , "cheap" , "cheaper" )) ,
    (AND, (OR, "call" , "calls" )) ,
    (AND, (OR, "price" , "pricey" , "prices" ), (OR, "higher" , "high" , "high" )) ,
    "ripoff" ,
    (AND, (OR, "well price" , "best price" )) "Call"

-------------------------------------

Target Value   True Positive/Total   Remaining Positive/Total   Rule
Social         5/11                  60/3,244                   facebook & ~post
Social         2/4                   55/3,233                   Twitter
Social         4/12                  53/3,229                   friend & notice
Ads            11/12                 226/3,217                  cheap
Ads            151/229               215/3,205                  call
Ads            7/9                   64/2,976                   price & high
Ads            3/4                   57/2,967                   ripoff
Ads            3/5                   54/2,963                   price
Ads            6/9                   51/2,958                   call

The SAS CC rules at the top correspond to the rules in the table at the bottom; they are the same rules.

My questions are as follows:

1. Why are there two "call" rules in the Ads category? They are almost identical, so why did SAS not combine them into one rule? I guess it has something to do with the scoring order?

2. Say I now have to use these rules to score a new dataset by rewriting them in another language such as Java or Python. I guess the logic of the rules is something like:

   If document.text contains ("facebook" or "fb") and not contains ("book" or "booking") then document.target = "Social" else if…

   Does the original order of the rules matter in this case?

3. Can SAS CC rules be used outside of SAS?

Thanks,
Eric
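P.S. To make the scoring-order question concrete, here is a rough Python sketch of how I picture applying the rules outside SAS. Everything in it is my own assumption rather than documented SAS behavior: the `RULES` layout, the `tokenize` and `score` helpers, case-insensitive whole-word matching, and "first matching rule wins" in listed order. I left out the empty `""` term and the two-word phrases ("well price", "best price"), since a simple token set cannot represent them.

```python
import re

# Hypothetical ordered rule list, transcribed by hand from the CC code above.
# Each entry: (target, groups that must ALL have at least one hit, terms that must NOT appear).
RULES = [
    ("Social", [{"facebook", "fb"}], {"book", "booking"}),
    ("Social", [{"friend", "friends"}, {"noticed", "notice"}], set()),
    ("Ads", [{"cheapest", "cheap", "cheaper"}], set()),
    ("Ads", [{"call", "calls"}], set()),
    ("Ads", [{"price", "pricey", "prices"}, {"higher", "high"}], set()),
    ("Ads", [{"ripoff"}], set()),
]


def tokenize(text):
    """Lowercase the text and return its set of word tokens (assumed matching unit)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def score(text, rules=RULES):
    """Return the target of the first rule that matches, or None if no rule matches."""
    tokens = tokenize(text)
    for target, all_groups, none_of in rules:
        has_all = all(tokens & group for group in all_groups)
        has_excluded = bool(tokens & none_of)
        if has_all and not has_excluded:
            return target
    return None


if __name__ == "__main__":
    for doc in ["I saw it on Facebook", "Prices are too high", "nothing relevant"]:
        print(doc, "->", score(doc))
```

I used whole-word matching because with plain substring matching "facebook" itself contains "book", so the NOT clause would always fire. With this first-match design the order only matters when a document matches rules from different categories, which is exactly the part I am unsure about.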