I'm having trouble doing this, please help.
I have about 20 columns (in no particular sequence, eg. COL1=MGF3654, COL2=NTD8475, etc.) and there are about 30 number of rows..with numeric data in the cells. I want to check which columns total to 1 so I can delete thos clumns, they are of no need). Thanks
Have:
COL1 COL2 ... COL20
2001.01 5 0 23
2001.02 89 0 4
.
2006.12 12 0 8
if COL total to 1 then I want to delete it.
Want:
COL1 COL3..... COL20
2001.01 5 45 23
2001.02 89 2 4
.
2006.12 12 6 8
data foo;
input dte $ mgf3654 emtcol1 ntd8475 col3;
cards;
2001.01 5 0 23 45
2001.02 89 0 4 2
2006.12 12 0 8 6
;
run;
data _null_;
length names nsums $32000;
retain names nsums;
set sashelp.vcolumn(where=(libname='WORK' and memname='FOO' and type='num')) end=eof;
names=catx(' ',of names name);
nsum=strip(name) || '_sum';
nsums=catx(' ',of nsums nsum);
if eof then
do;
call symputx('names',names);
call symputx('nsums',nsums);
call symputx('vars',put(_n_,best.));
end;
run;
%put &vars &names &nsums;
data _null_;
set foo end=eof;
array v
array s
do i=1 to &vars;
s+v;
end;
length drops $32000;
retain drops;
if eof then
do;
do i=1 to &vars;
if s=0 then
do;
drop=scan("&names",i,' ');
drops=catx(' ',of drops drop);
end;
end;
call symputx('drops',drops);
end;
run;
proc sql noprint;
alter table foo
drop column &drops;
quit;
Thanks FriedEgg, to be honest it's a little confusing and it didn't quite work.. maybe it's something 'm doing. But basically looking to get rid of all columns that do't have any positive numbers in them.. thanks
You will have to be clearer regarding what you have and what you want. Don't abbreviate the have in your example. Rather, show an example with ONLY three variables and a WANT with those same variables. YOU must make it clear as to what you have and what you want.
I will explain so maybe you can work out the issue you have.
First I create some data to work with.
data foo;
input dte $ mgf3654 emtcol1 ntd8475 col3;
cards;
2001.01 5 0 23 45
2001.02 89 0 4 2
2006.12 12 0 8 6
;
run;
I have a character field (dte) and four numeric variables without a standard naming convention (mgf3654 emtcol1 ntd8475 col3).
The first thing I want to do is collect the metadata I want to work with.
data _null_;
length names nsums $32000;
retain names nsums;
set sashelp.vcolumn(where=(libname='WORK' and memname='FOO' and type='num')) end=eof;
names=catx(' ',of names name);
nsum=strip(name) || '_sum';
nsums=catx(' ',of nsums nsum);
if eof then
do;
call symputx('names',names);
call symputx('nsums',nsums);
call symputx('vars',put(_n_,best.));
end;
run;
I will collect, dynamically the names of my numeric variables.
I will create a second copy of these names with '_sum' added to them for a calculation step next.
I will collect the number of variables I'm working with into macro variable 'vars'
Now I will run a sum for all columns. If the columns sum is 0 then I will want to drop that variable.
data _null_;
set foo end=eof;
array v
array s
do i=1 to &vars;
s+v;
end;
length drops $32000;
retain drops;
if eof then
do;
do i=1 to &vars;
if s=0 then
do;
drop=scan("&names",i,' ');
drops=catx(' ',of drops drop);
end;
end;
call symputx('drops',drops);
end;
run;
Create arrays using the variable lists I collected in the previous step
The first loop will calculate the sums for each column
When I reach the last record in the file I will loop through each variable again. If I have calculated this columns sum to be 0 I will collect the name of the variable into a macro variable 'drops' to use later.
Now that I have found all the variables I want to drop I will perform the action.
proc sql noprint;
alter table foo
drop column &drops;
quit;
Why would you want to DELETE the variables? Perhaps you mean that you want to exclude those variables from later analysis?
What are you ultimately doing with the data?
To sum a "column" use PROC SUMMARY.
proc summary data=have;
var _numeric_;
output out=stats(drop=_type_ _freq_) sum=;
run;
Then use PROC TRANSPOSE to rotate so that each row represents a variable.
proc transpose data=stats out=vars (rename=(col1=sum)) ;
run;
Now you can use PROC SQL to create macro variables with either the list of variables to keep or those to drop.
proc sql noprint ;
%let keep=;
select _name_ into :keep separated by ' '
from vars
where sum ne 1
;
%let drop=;
select _name_ into :drop separated by ' '
from vars
where sum = 1
;
quit;
Now you can write code to use the original dataset and select which group of variables you want.
proc print data=have ;
var &keep ;
run;
data want ;
set have;
drop &drop;
run;
Brilliant.. Thanks guys.. I really appreciate it..
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.