Hello, I am trying to load a sparse data set into SAS Enterprise Miner 12.3 so that I can analyze it and run models on it. The full data set has about 2.4 million observations and 3.2 million attributes (I downloaded it from the UCI Machine Learning Repository: URL Reputation Data Set). However, the data set is split into files of 20,000 observations each, with 3.2 million attributes per observation.

Each file is in svm-light format. For example, here are two possible rows/observations:

-1 2:0.9345 5:0.4234 10:0 ... 3231961:0
1 3:0.3332 5:0.5232 12:1 ... 3110232:0

The first value in each row is the class label, either +1 or -1. The remaining values have the form attribute_index:attribute_value, and attributes that are not listed in a row are absent from it. For example, the first row contains attributes 2 and 5, whereas the second row contains attributes 3 and 5 (but not 2).

When I load the data, I need each row of the table to cover all 3.2 million attributes, filled in either with the attribute's value or with zero.

I asked this question a while ago and someone kindly provided a solution:

However, my concern is twofold. First, is this the most effective method to use in Enterprise Miner 12.3? Second, is there a sparse representation in SAS? In C++, I can use std::map to represent this data.

Thank you in advance, and I apologize if I was not very clear.