BookmarkSubscribeRSS Feed
lunarin
Calcite | Level 5

Hi everyone,

 

I am new to programming in SAS. As part of a school project, i was tasked to do data cleaning for the following files: https://www.kaggle.com/c/santander-customer-satisfaction/data?

 

The data has many variables that have no meaning (i.e. consists of only 0 values). I am wondering how can I write a code to remove these variables automatically? Do note I am completely new to SAS programming and I have tried to modify other solutions to fit my problem but to no avail. Thank you in advance.

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20

Hi 🙂

 

I Can't open any of the data you are linking to, can you post some sample data?

 

The variables you would like to check for a 0 value are numeric right?

 

Regards

lunarin
Calcite | Level 5

Hi,

 

http://imgur.com/a/AtRzX

 

It has over 100 columns with no serialized naming. Yes they are numeric. I am just not sure how to loop through each variable in SAS!

 

Thanks for your help

PeterClemmensen
Tourmaline | Level 20

OK so they are all numeric? no character variables?

lunarin
Calcite | Level 5
Yes that is right
PeterClemmensen
Tourmaline | Level 20

Try something like this

 

proc iml;
   use THENAMEOFYOURDATASET;
      read all var _ALL_ into X[colname=varNames]; 
   close THENAMEOFYOURDATASET;
   
   cols = ncol(X);

   sums = j(1,cols,0);

   do j = 1 to cols;
      sums[j] = sum(X[ ,j]);
   end;

   idx = loc(sums > 0); 

   newX = X[ ,idx];
   
   create MyData from newX[colname=varNames];
      append from newX;
   close MyData;

QUIT;

Where of course you insert your dataset name instead of  THENAMEOFYOURDATASET.. 🙂

 http://sasnrd.com/

lunarin
Calcite | Level 5

hi draycut,

 

thank you very much it worked!

 

lunarin
Calcite | Level 5

Hi draycut,

 

Upon closer inspection, the data was actually modified and it removed variables who's sum is not zero.

What I want to achieve is to delete an ENTIRE variable IF all its values are zero. I am wondering if you could me with that?

 

 

Reeza
Super User

@lunarin Search the forum. There are several solutions on here that work. Are you interested in staying in IML?

Ksharp
Super User
data have;
input x $ y z;
cards;
x 1 0
y 0 0
z 0 0
;
run;
data _null_;
 set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE' and type='num')) end=last;
 if _n_=1 then call execute('proc sql;create table temp as select ');
 call execute(cat('sum(',name,' ne 0) as ',name));
 if last then call execute('from have;quit;');
  else call execute(',');
run;
proc transpose data=temp out=temp1;
run;
proc sql noprint;
select _name_ into : list separated by ' ' from temp1
 where col1=0;
quit;
data want;
 set have(drop=&list);
run;
Ksharp
Super User
data have;
input x $ y z zz;
cards;
x 1 0 0
y 0 0 0
z 0 0 0
;
run;
proc iml;
use have;
read all var _num_ into x[c=vnames];
close;

y=vnames[loc((x^=0)[+,]=0)];

submit y;
 data want;
  set have;
  drop &y;
 run;
endsubmit;
quit;
Astounding
PROC Star

When you select which fields to remove, note that a sum of 0 is not a guarantee that all values are zero.  You would be better advised to use PROC SUMMARY to get the min and the max, and make sure both the min and the max are zero.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 4072 views
  • 0 likes
  • 5 in conversation