About ggramajo

ggramajo · 04-04-2014

Hello,

I am trying to load a sparse data set into SAS Enterprise Miner 12.3 in order to analyze it and run models on it. The entire data set is about 2.4 million observations by 3.2 million attributes (I downloaded it from the UCI Machine Learning Repository: URL Reputation Data Set). However, the data set is broken up into files of 20,000 observations by 3.2 million attributes each. Each file is in svm-light format. For example, two possible rows/observations look like this:

-1 2:0.9345 5:0.4234 10:0 ... 3231961:0
1 3:0.3332 5:0.5232 12:1 ... 3110232:0

The first column is either +1 or -1. The remaining columns have the form attribute_index:attribute_value. Note that in the first row, attributes 2 and 5 are part of the observation, whereas in the second row, attributes 3 and 5 are included (and not 2). When I load the data, I need the table to represent all 3.2 million attributes for each row, either with the attribute values or with zeros.

I asked this question a while ago and someone kindly provided a solution. However, my concern is two-fold:

1. Is this the most effective method to use in Enterprise Miner 12.3?
2. Is there a sparse representation in SAS? In C++, I can use the Map class to represent this data.

Thank you in advance, and I apologize if I was not very clear.
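To make the row format above concrete, here is a minimal sketch in Python (not SAS, and not the solution referred to in the post). It parses one svm-light row into a label plus a dictionary keyed by attribute index, which plays the same role as the C++ map mentioned in the question. The function names `parse_svmlight_line` and `attribute_value` are hypothetical, and the sample rows reuse only the pairs shown explicitly in the post, leaving out the elided "..." part.

```python
def parse_svmlight_line(line):
    """Parse one svm-light row into (label, {attribute_index: value})."""
    tokens = line.split()
    label = int(tokens[0])  # first column: +1 or -1
    features = {}
    for token in tokens[1:]:
        # each remaining token has the form attribute_index:attribute_value
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def attribute_value(features, index):
    """Return the stored value, or 0.0 for an attribute absent from the row."""
    return features.get(index, 0.0)


# Sample rows from the post, with the elided middle portion left out.
rows = [
    "-1 2:0.9345 5:0.4234 10:0 3231961:0",
    "1 3:0.3332 5:0.5232 12:1 3110232:0",
]
parsed = [parse_svmlight_line(row) for row in rows]

for label, features in parsed:
    print(label, attribute_value(features, 2), attribute_value(features, 5))
```

Only the attributes present in a row are stored; every other index is treated as zero at lookup time, so no row ever needs 3.2 million physical columns.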

ggramajo · 12-02-2013

Thank you very much, Tom. I will give your solution a try very soon. To your point, I should have elaborated on the data more. This data set is a 20000 x 3231961 matrix that categorizes websites as either benign or malicious. Each row represents a website, and the 3+ million columns describe website features. The first column of +1/-1 (the response) is there for classification purposes: it indicates whether the website is malicious or benign. The remaining columns are a combination of categorical {0,1} and real-valued features.

Thank you again,
Gary
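As a rough back-of-the-envelope check on why a dense table of this shape is impractical, the following Python sketch computes the storage for a single 20000 x 3231961 file. The 8 bytes per value is an assumption (a double-precision number per cell), and `dense_bytes` is a hypothetical helper name.

```python
N_ROWS = 20_000
N_COLS = 3_231_961
BYTES_PER_VALUE = 8  # assumption: each cell stored as an 8-byte double


def dense_bytes(n_rows, n_cols, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to store every cell of the matrix, zeros included."""
    return n_rows * n_cols * bytes_per_value


cells = N_ROWS * N_COLS
total = dense_bytes(N_ROWS, N_COLS)
print(f"{cells:,} cells, about {total / 1e9:.0f} GB if stored densely")
```

Under that assumption one file alone comes to roughly 517 GB, which is why keeping only the attributes present in each row matters here.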

ggramajo · 12-02-2013

Dear SAS community,

My issue: I am trying to upload a data file of type svm-light. I have attached an example of such a file (which I downloaded from the UCI Machine Learning Repository). I would like to load such a file into SAS, but I am at a loss as to how to do this.

What I intend to do with the data set: I would like to analyze this data set using GAM. I am pretty sure the data is provided in svm-light form because the data matrix is extremely sparse.

I googled this topic in various ways and could not find a solution. I sincerely apologize if this has already been solved and I missed it.

Thank you in advance


Loading Sparse Data Into SAS Enterprise Miner 12.3

Re: Loading svm-light files in SAS

Loading svm-light files in SAS
