Deleting variables when all values are zero

Reply
Occasional Contributor
Posts: 5

Deleting variables when all values are zero

Hi everyone,

 

I am new to programming in SAS. As part of a school project, i was tasked to do data cleaning for the following files: https://www.kaggle.com/c/santander-customer-satisfaction/data?

 

The data has many variables that have no meaning (i.e. consists of only 0 values). I am wondering how can I write a code to remove these variables automatically? Do note I am completely new to SAS programming and I have tried to modify other solutions to fit my problem but to no avail. Thank you in advance.

PROC Star
Posts: 551

Re: Deleting variables when all values are zero

Hi Smiley Happy

 

I Can't open any of the data you are linking to, can you post some sample data?

 

The variables you would like to check for a 0 value are numeric right?

 

Regards

Occasional Contributor
Posts: 5

Re: Deleting variables when all values are zero

Hi,

 

http://imgur.com/a/AtRzX

 

It has over 100 columns with no serialized naming. Yes they are numeric. I am just not sure how to loop through each variable in SAS!

 

Thanks for your help

Imgur: The most awesome images on the Internet.
PROC Star
Posts: 551

Re: Deleting variables when all values are zero

OK so they are all numeric? no character variables?

Occasional Contributor
Posts: 5

Re: Deleting variables when all values are zero

Yes that is right
PROC Star
Posts: 551

Re: Deleting variables when all values are zero

[ Edited ]

Try something like this

 

proc iml;
   use THENAMEOFYOURDATASET;
      read all var _ALL_ into X[colname=varNames]; 
   close THENAMEOFYOURDATASET;
   
   cols = ncol(X);

   sums = j(1,cols,0);

   do j = 1 to cols;
      sums[j] = sum(X[ ,j]);
   end;

   idx = loc(sums > 0); 

   newX = X[ ,idx];
   
   create MyData from newX[colname=varNames];
      append from newX;
   close MyData;

QUIT;

Where of course you insert your dataset name instead of  THENAMEOFYOURDATASET.. Smiley Happy

 

Occasional Contributor
Posts: 5

Re: Deleting variables when all values are zero

hi draycut,

 

thank you very much it worked!

 

Occasional Contributor
Posts: 5

Re: Deleting variables when all values are zero

Hi draycut,

 

Upon closer inspection, the data was actually modified and it removed variables who's sum is not zero.

What I want to achieve is to delete an ENTIRE variable IF all its values are zero. I am wondering if you could me with that?

 

 

Super User
Posts: 17,819

Re: Deleting variables when all values are zero

@lunarin Search the forum. There are several solutions on here that work. Are you interested in staying in IML?

Super User
Posts: 9,681

Re: Deleting variables when all values are zero

data have;
input x $ y z;
cards;
x 1 0
y 0 0
z 0 0
;
run;
data _null_;
 set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE' and type='num')) end=last;
 if _n_=1 then call execute('proc sql;create table temp as select ');
 call execute(cat('sum(',name,' ne 0) as ',name));
 if last then call execute('from have;quit;');
  else call execute(',');
run;
proc transpose data=temp out=temp1;
run;
proc sql noprint;
select _name_ into : list separated by ' ' from temp1
 where col1=0;
quit;
data want;
 set have(drop=&list);
run;
Super User
Posts: 9,681

Re: Deleting variables when all values are zero

data have;
input x $ y z zz;
cards;
x 1 0 0
y 0 0 0
z 0 0 0
;
run;
proc iml;
use have;
read all var _num_ into x[c=vnames];
close;

y=vnames[loc((x^=0)[+,]=0)];

submit y;
 data want;
  set have;
  drop &y;
 run;
endsubmit;
quit;
Super User
Posts: 5,082

Re: Deleting variables when all values are zero

When you select which fields to remove, note that a sum of 0 is not a guarantee that all values are zero.  You would be better advised to use PROC SUMMARY to get the min and the max, and make sure both the min and the max are zero.

Ask a Question
Discussion stats
  • 11 replies
  • 469 views
  • 0 likes
  • 5 in conversation