I have about 50 datasets in my directory. This varies by study, so I need to apply the change I am asking to all the datasets in the directory.
I want to delete columns that contain a string with "DTC" in it. The number of columns that contain this string could also vary. It's not consistent. There could be text before and after it, but what I want is all columns that contain "DTC" to be deleted.
So, it is a two step process:
1. In each dataset, extract all "DTC" containing columns. (I know I can use index or prxmatch)
2. Delete it from all datasets in the directory (this holds on to the previous list of drop variables - so unable to achieve this).
How do I go about this? Please let me know the best way.
Thank you!
Here an approach using call execute().
/* create sample data */
data
ds1(drop=varDTC01 DTC _dTc_xx)
ds2(keep=name sex aaDtXc_06)
ds3
;
length varDTC01 DTC _dTc_xx aaDtXc_06 abc_dtc abcDtcX $1;
if 0 then call missing(of _all_);
set sashelp.class;
run;
/* define parameters */
%let libref=work;
%let col_pattern=DTC;
/*
use call execute to drop a list of columns dynamically populated into SAS variable DropList
*/
data _null_;
set sashelp.vcolumn(keep=libname memname name where=(libname=%upcase("&libref") and find(name,"&col_pattern",'i')>0));
by libname memname;
length DropList $32767;
retain DropList;
DropList=catx(',',DropList,name);
if last.memname then
do;
call execute(catx(' ', 'proc sql; alter table', cats(libname,'.',memname),'drop', DropList, ';quit;'));
call missing(DropList);
end;
run;
Fixed as per @FreelanceReinh's comment.
It is not clear what you are trying to do:
Are you asking to drop variables whose NAME contain the string 'DTC'?
Or set the value of variables to missing where the variable value contains string 'DTC'?
Or drop variables where at least one value contains the string 'DTC'?
SASHELP.VCOLUMN has the list of data sets in your library and all the variable names and types.
You can use that to pull the table names in automatically.
It's not clear if you're dropping columns or rows.
If rows, you likely want to use FIND() to find the substr
_CHARACTER_ will refer to all character variables.
If it's columns you can filter than via a SQL query and then use PROC DATASETS to drop the columns - that's the most efficient way IMO.
Get it working for one data set first and then you can generalize.
Thanks Reeza. I tried a similar approach. The problem happening is that when I use it in a macro loop to perform the operation for all datasets in the directory, the "&DROPME" list holds on to the previous dataset's variables that were executed.
proc sql noprint;
SELECT name INTO :dropMe SEPARATED BY ' '
FROM sashelp.vcolumn
WHERE libname = 'TEMP' AND
memname = 'TOX' AND
INDEX(NAME,'DTC')>0;
quit;
data test;
set temp.tox;
drop &dropme;
run;
Thanks Tom and Reeza.
This is the approach I am using in the macro.
How would I do the same with proc datasets.
%macro drop (&dn);
proc sql noprint;
SELECT name INTO :dropMe SEPARATED BY ' '
FROM sashelp.vcolumn
WHERE libname = 'TEMP' AND
memname = upcase('&dn') AND
INDEX(NAME,'DTC')>0;
%let numfiles = &sqlObs;
quit;
%mend drop;
proc contents data=temp._all_ noprint out=contents;
run;
data _null_;
set contents;
by memname ;
if first.memname;
call execute(cats('%drop(',memname,')'));
run;
Thanks,
Nothing in that code is dropping any columns from any datasets.
First thing to do is make sure you have a copy of the original datasets before you start dropping columns.
Second thing to do is make sure the first thing actually worked.
To solve this problem I would just take the output of PROC CONTENTS and use it to generate the PROC DATASETS code.
I like to generate code to a file so that I can open the file and look at it and test it before making the final version that actually runs it.
Much easier to debug that either macros or CALL EXECUTE steps.
So use something like this:
proc contents data=temp._all_ noprint out=contents;
run;
filename code temp;
data _null_;
set contents;
by memname ;
where index(upcase(name),'DTC');
file code;
if first.memname then put 'modify ' memname ';' ;
put 'drop ' name ';';
if last.memname then put 'run;';
run;
proc dataset nolist lib=temp;
%include code / source2;
quit;
I tried it the way you suggested, but I get this error
The DROP and KEEP statements are not supported in procedure steps in this release of the
SAS System. Therefore, these statements are ignored.
Is this the actual code, or does it just resemble the code? Issues that stand out:
%if &sqlobs > 0 %then %do;
DATA step that drops variables goes here
%end;
That was only an example of what I am trying to achieve and not the actual macro.
I tried this way, but it still holds on to the pre-resolved macro.
%macro drop (&dn);
proc sql noprint;
SELECT name INTO :dropMe SEPARATED BY ' '
FROM sashelp.vcolumn
WHERE libname = 'TEMP' AND
memname = upcase("&dn") AND
INDEX(upcase(NAME,'DTC'))>0;
quit;
%if &sqlobs > 0 %then %do;
data test;
set temp.&dn;
drop &dropme;
run;
%end;
%mend drop;
proc contents data=temp._all_ noprint out=contents;
run;
data _null_;
set contents;
by memname ;
if first.memname;
call execute(cats('%drop(',memname,')'));
run;
OK, let's make an earlier modification. Now, you are using:
proc contents data=temp._all_ noprint out=contents;
run;
data _null_;
set contents;
by memname ;
if first.memname;
call execute(cats('%drop(',memname,')'));
run;
Let's subset earlier, so the "drop" logic encounters data sets that must contain at least one DTC variable:
proc contents data=temp._all_ noprint out=contents;
run;
data _null_;
set contents;
by memname ;
if first.memname;
where index(upcase(name), 'DTC') > 0;
call execute(cats('%drop(',memname,')'));
run;
That can only help, even after the issues are diagnosed and resolved. And it can clarify what the issues are.
@Astounding It's still holding to the earlier datasets DTC variables. The DTC variable name changes with each dataset.
%macro drop (dn);
proc sql noprint;
SELECT name INTO :dropMe SEPARATED BY ' '
FROM sashelp.vcolumn
WHERE libname = 'TEMP' AND
memname = upcase("&dn") AND
INDEX(upcase(NAME),'DTC')>0;
quit;
%if &sqlobs > 0 %then %do;
data test;
set temp.&dn;
drop &dropme;
run;
%end;
%mend drop;
proc contents data=temp._all_ noprint out=contents;
run;
data _null_;
set contents;
by memname ;
if first.memname;
where index(upcase(name),'DTC') > 0;
call execute(cats('%drop(',memname,')'));
run;
OK, time to add some temporary diagnostics, then. Inside the macro definition:
%let dropme=;
This should be the first statement within the macro.
%put &sqlobs;
%put &dropme;
This goes right after the SQL code, before checking %if &sqlobs > 0.
Also, my recollection is that MEMNAME is all caps within sashelp.vcolumn. But if I'm wrong about that, the comparison should be:
upcase(memname) = upcase("&dn") AND
If this doesn't do it, sorry to say I'm done for the evening. I'm heading out the door. Hope this gives you enough to go on.
The macro works fine if I try it on individual datasets.
It gives me exactly what I need when I do a manual run on datasets
%drop (bma); >> works great
%drop (tox); >> works great
It's only when I try the data _null_ and apply at once to all datasets, it breaks. It says no drop variables found.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.