Hi,

There are several ways to attack this problem with Enterprise Miner. I think they will all start with coercing your data into COO format, a transactional format much like the format in which your raw data is stored. See these two references for explanations and SAS code relating to COO format data:

- http://support.sas.com/resources/papers/proceedings14/SAS195-2014.pdf
- http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf

As the first paper describes, HP Text Miner is probably your best option, as it allows for advanced modeling using COO format data directly.

If you do not have access to HP Text Miner, I would suggest the strategy outlined in the "EXAMPLE 1: SUPERVISED LEARNING WITH THE KAGGLE EMC ISRAEL DATA SCIENCE CHALLENGE DATA" section of the second paper. In short:

- Convert your raw data into a COO set in SAS.
- Use the COO format set to find the N most dense features. In COO format, each line of the data set is a tuple representing {row, column, value}. Sorting a COO set by column allows you to count the number of non-zero values in each feature. It is very likely that the features with the highest numbers of non-zero values will be important predictors.
- Use the modeling algorithm of your choice on the appropriate number of selected features.

Both the first and second papers provide code that will be similar to, but certainly not exactly, what you need.
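To make the feature-selection step concrete, here is a minimal sketch of the counting logic. It is written in Python purely for illustration and is not the SAS code from either paper. The toy data, the column names, and the helpers `densest_features` and `keep_features` are all hypothetical; in SAS you would do the equivalent counting on your own COO data set.

```python
from collections import Counter

# Hypothetical toy COO data: each tuple is (row, column, value),
# and only non-zero cells are stored.
coo = [
    (1, "a", 1.0),
    (1, "c", 2.0),
    (2, "a", 3.0),
    (2, "b", 1.0),
    (3, "a", 1.0),
    (3, "c", 4.0),
]


def densest_features(coo_rows, n):
    """Return the n columns that have the most non-zero entries."""
    # Count non-zero entries per column (the "density" of each feature).
    counts = Counter(col for _row, col, value in coo_rows if value != 0)
    # Rank by descending count, then by column name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [col for col, _count in ranked[:n]]


def keep_features(coo_rows, columns):
    """Keep only the COO tuples that belong to the selected columns."""
    selected = set(columns)
    return [entry for entry in coo_rows if entry[1] in selected]


top = densest_features(coo, 2)
reduced = keep_features(coo, top)
print(top)
print(reduced)
```

The reduced set is what you would then pass to the modeling step. Note that density is only a heuristic for usefulness, so it is worth trying a few values of N.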