Hi everyone,
I am new to programming in SAS. As part of a school project, i was tasked to do data cleaning for the following files: https://www.kaggle.com/c/santander-customer-satisfaction/data?
The data has many variables that have no meaning (i.e. consists of only 0 values). I am wondering how can I write a code to remove these variables automatically? Do note I am completely new to SAS programming and I have tried to modify other solutions to fit my problem but to no avail. Thank you in advance.
Hi 🙂
I Can't open any of the data you are linking to, can you post some sample data?
The variables you would like to check for a 0 value are numeric right?
Regards
Hi,
It has over 100 columns with no serialized naming. Yes they are numeric. I am just not sure how to loop through each variable in SAS!
Thanks for your help
OK so they are all numeric? no character variables?
Try something like this
proc iml;
use THENAMEOFYOURDATASET;
read all var _ALL_ into X[colname=varNames];
close THENAMEOFYOURDATASET;
cols = ncol(X);
sums = j(1,cols,0);
do j = 1 to cols;
sums[j] = sum(X[ ,j]);
end;
idx = loc(sums > 0);
newX = X[ ,idx];
create MyData from newX[colname=varNames];
append from newX;
close MyData;
QUIT;
Where of course you insert your dataset name instead of THENAMEOFYOURDATASET.. 🙂
hi draycut,
thank you very much it worked!
Hi draycut,
Upon closer inspection, the data was actually modified and it removed variables who's sum is not zero.
What I want to achieve is to delete an ENTIRE variable IF all its values are zero. I am wondering if you could me with that?
@lunarin Search the forum. There are several solutions on here that work. Are you interested in staying in IML?
data have;
input x $ y z;
cards;
x 1 0
y 0 0
z 0 0
;
run;
data _null_;
set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE' and type='num')) end=last;
if _n_=1 then call execute('proc sql;create table temp as select ');
call execute(cat('sum(',name,' ne 0) as ',name));
if last then call execute('from have;quit;');
else call execute(',');
run;
proc transpose data=temp out=temp1;
run;
proc sql noprint;
select _name_ into : list separated by ' ' from temp1
where col1=0;
quit;
data want;
set have(drop=&list);
run;
data have;
input x $ y z zz;
cards;
x 1 0 0
y 0 0 0
z 0 0 0
;
run;
proc iml;
use have;
read all var _num_ into x[c=vnames];
close;
y=vnames[loc((x^=0)[+,]=0)];
submit y;
data want;
set have;
drop &y;
run;
endsubmit;
quit;
When you select which fields to remove, note that a sum of 0 is not a guarantee that all values are zero. You would be better advised to use PROC SUMMARY to get the min and the max, and make sure both the min and the max are zero.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.