10-26-2016 03:04 AM
I am new to programming in SAS. As part of a school project, i was tasked to do data cleaning for the following files: https://www.kaggle.com/c/santander-customer-satisfaction/data?
The data has many variables that have no meaning (i.e. consists of only 0 values). I am wondering how can I write a code to remove these variables automatically? Do note I am completely new to SAS programming and I have tried to modify other solutions to fit my problem but to no avail. Thank you in advance.
10-26-2016 03:14 AM
I Can't open any of the data you are linking to, can you post some sample data?
The variables you would like to check for a 0 value are numeric right?
10-26-2016 03:16 AM
It has over 100 columns with no serialized naming. Yes they are numeric. I am just not sure how to loop through each variable in SAS!
Thanks for your help
10-26-2016 03:52 AM - edited 10-26-2016 03:53 AM
Try something like this
proc iml; use THENAMEOFYOURDATASET; read all var _ALL_ into X[colname=varNames]; close THENAMEOFYOURDATASET; cols = ncol(X); sums = j(1,cols,0); do j = 1 to cols; sums[j] = sum(X[ ,j]); end; idx = loc(sums > 0); newX = X[ ,idx]; create MyData from newX[colname=varNames]; append from newX; close MyData; QUIT;
Where of course you insert your dataset name instead of
10-26-2016 05:14 AM
Upon closer inspection, the data was actually modified and it removed variables who's sum is not zero.
What I want to achieve is to delete an ENTIRE variable IF all its values are zero. I am wondering if you could me with that?
10-26-2016 06:48 AM
data have; input x $ y z; cards; x 1 0 y 0 0 z 0 0 ; run; data _null_; set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE' and type='num')) end=last; if _n_=1 then call execute('proc sql;create table temp as select '); call execute(cat('sum(',name,' ne 0) as ',name)); if last then call execute('from have;quit;'); else call execute(','); run; proc transpose data=temp out=temp1; run; proc sql noprint; select _name_ into : list separated by ' ' from temp1 where col1=0; quit; data want; set have(drop=&list); run;
10-26-2016 06:59 AM
data have; input x $ y z zz; cards; x 1 0 0 y 0 0 0 z 0 0 0 ; run; proc iml; use have; read all var _num_ into x[c=vnames]; close; y=vnames[loc((x^=0)[+,]=0)]; submit y; data want; set have; drop &y; run; endsubmit; quit;
10-26-2016 07:15 AM
When you select which fields to remove, note that a sum of 0 is not a guarantee that all values are zero. You would be better advised to use PROC SUMMARY to get the min and the max, and make sure both the min and the max are zero.