BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
saslove
Quartz | Level 8

I have about 50 datasets in my directory. This varies by study, so I need to apply the change I am asking to all the datasets in the directory.

 

I want to delete columns that contain a string with "DTC" in it. The number of columns that contain this string could also vary. It's not consistent. There could be text before and after it, but what I want is all columns that contain "DTC" to be deleted.

 

So, it is a two step process:

1. In each dataset, extract all "DTC" containing columns. (I know I can use index or prxmatch)

2. Delete it from all datasets in the directory (this holds on to the previous list of drop variables - so unable to achieve this).

 

How do I go about this? Please let me know the best way.

Thank you! 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@saslove 

Here an approach using call execute().

/* create sample data */
data 
  ds1(drop=varDTC01 DTC _dTc_xx)
  ds2(keep=name sex aaDtXc_06)
  ds3
  ;
  length varDTC01 DTC _dTc_xx aaDtXc_06 abc_dtc abcDtcX $1;
  if 0 then call missing(of _all_);
  set sashelp.class;
run;

/* define parameters */
%let libref=work;
%let col_pattern=DTC;

/* 
  use call execute to drop a list of columns dynamically populated into SAS variable DropList
*/
data _null_;
  set sashelp.vcolumn(keep=libname memname name where=(libname=%upcase("&libref") and find(name,"&col_pattern",'i')>0));
  by libname memname;
  length DropList $32767;
  retain DropList;
  DropList=catx(',',DropList,name);
  if last.memname then
    do;
      call execute(catx(' ', 'proc sql; alter table', cats(libname,'.',memname),'drop', DropList, ';quit;'));
      call missing(DropList);
    end;
run;

 

Fixed as per @FreelanceReinh's comment.

View solution in original post

21 REPLIES 21
Tom
Super User Tom
Super User

It is not clear what you are trying to do:

 

Are you asking to drop variables whose NAME contain the string 'DTC'?

Or set the value of variables to missing where the variable value contains string 'DTC'?

Or drop variables where at least one value contains the string  'DTC'?

Reeza
Super User

SASHELP.VCOLUMN has the list of data sets in your library and all the variable names and types. 

You can use that to pull the table names in automatically. 

 

It's not clear if you're dropping columns or rows. 

 

If rows, you likely want to use FIND() to find the substr

_CHARACTER_ will refer to all character variables. 

 

If it's columns you can filter than via a SQL query and then use PROC DATASETS to drop the columns - that's the most efficient way IMO.

 

Get it working for one data set first and then you can generalize.

saslove
Quartz | Level 8

Thanks Reeza. I tried a similar approach. The problem happening is that when I use it in a macro loop to perform the operation for all datasets in the directory, the "&DROPME" list holds on to the previous dataset's variables that were executed. 

 

proc sql noprint;
SELECT name INTO :dropMe SEPARATED BY ' '
FROM sashelp.vcolumn
WHERE libname = 'TEMP' AND
memname = 'TOX' AND
INDEX(NAME,'DTC')>0;
quit;

data test;
set temp.tox;
drop &dropme;
run;

Tom
Super User Tom
Super User
When the SQL query does not produce any observations the target of the INTO clause is not modified. Set if empty BEFORE the query. Or check the automatic macro variable SQLOBS to see if any were found.
Reeza
Super User
1. set Dropme to missing before theSQL query - if you don't reset it, it keeps historical values. Good practice anytime.

2. You may not have any variables with DTC - you may also need to account for case, right now a variable named random_dtc would not be found.

3. You can use %IF/%THEN to conditionally execute the drop. I would advise using PROC DATASETS its faster and you'll be doing this over several datasets so it's worth the time.
saslove
Quartz | Level 8

Thanks Tom and Reeza. 

 

This is the approach I am using in the macro.

 

How would I do the same with proc datasets.

 

%macro drop (&dn);
proc sql noprint;
SELECT name INTO :dropMe SEPARATED BY ' '
FROM sashelp.vcolumn
WHERE libname = 'TEMP' AND
memname = upcase('&dn') AND
INDEX(NAME,'DTC')>0;
%let numfiles = &sqlObs;
quit;

%mend drop;

proc contents data=temp._all_ noprint out=contents;
run;

 

data _null_;
set contents;
by memname ;
if first.memname;
call execute(cats('%drop(',memname,')'));
run;

 

Thanks,

Tom
Super User Tom
Super User

Nothing in that code is dropping any columns from any datasets.

 

First thing to do is make sure you have a copy of the original datasets before you start dropping columns.

Second thing to do is make sure the first thing actually worked.

 

To solve this problem I would just take the output of PROC CONTENTS and use it to generate the PROC DATASETS code.

 

I like to generate code to a file so that I can open the file and look at it and test it before making the final version that actually runs it.

Much easier to debug that either macros or CALL EXECUTE steps.

 

So use something like this:

 

proc contents data=temp._all_ noprint out=contents;
run;
 
filename code temp;
data _null_;
  set contents;
  by memname ;
  where index(upcase(name),'DTC');
  file code;
  if first.memname then put 'modify ' memname ';' ;
  put 'drop ' name ';';
  if last.memname then put 'run;';
run;

proc dataset nolist lib=temp;
%include code / source2;
quit;

 

 

saslove
Quartz | Level 8

I tried it the way you suggested, but I get this error

 

The DROP and KEEP statements are not supported in procedure steps in this release of the
SAS System. Therefore, these statements are ignored.

Astounding
PROC Star

Is this the actual code, or does it just resemble the code?  Issues that stand out:

 

  • You reference a macro variable in single quotes:  '&dn'   That can't work.  You need double quotes:  "&dn"
  • Variable names in sashelp.vcolumn are not necessarily in upper case.  Switch to:  index(upcase(name), 'DTC') > 0
  • Your final macro is incomplete.  Once the corrections are made, it would find a list of variable names, but wouldn't drop anything.  At the end of the macro (after the final QUIT statement, you need to add something like:

%if &sqlobs > 0 %then %do;

   DATA step that drops variables goes here

%end;

saslove
Quartz | Level 8

That was only an example of what I am trying to achieve and not the actual macro. 

 

I tried this way, but it still holds on to the pre-resolved macro.

 

 

%macro drop (&dn);
proc sql noprint;
SELECT name INTO :dropMe SEPARATED BY ' '
FROM sashelp.vcolumn
WHERE libname = 'TEMP' AND
memname = upcase("&dn") AND
INDEX(upcase(NAME,'DTC'))>0;
quit;

%if &sqlobs > 0 %then %do;
data test;
set temp.&dn;
drop &dropme;
run;
%end;

%mend drop;

proc contents data=temp._all_ noprint out=contents;
run;

 

data _null_;
set contents;
by memname ;
if first.memname;
call execute(cats('%drop(',memname,')'));
run;

Astounding
PROC Star

OK, let's make an earlier modification.  Now, you are using:

 

proc contents data=temp._all_ noprint out=contents;
run;

 

data _null_;
set contents;
by memname ;
if first.memname;
call execute(cats('%drop(',memname,')'));
run;

Let's subset earlier, so the "drop" logic encounters data sets that must contain at least one DTC variable:

 

proc contents data=temp._all_ noprint out=contents;
run;

 

data _null_;
set contents;
by memname ;
if first.memname;
where index(upcase(name), 'DTC') > 0;
call execute(cats('%drop(',memname,')'));
run;

That can only help, even after the issues are diagnosed and resolved.  And it can clarify what the issues are.

saslove
Quartz | Level 8

@Astounding  It's still holding to the earlier datasets DTC variables. The DTC variable name changes with each dataset. 

 


%macro drop (dn);
proc sql noprint;
SELECT name INTO :dropMe SEPARATED BY ' '
FROM sashelp.vcolumn
WHERE libname = 'TEMP' AND
memname = upcase("&dn") AND
INDEX(upcase(NAME),'DTC')>0;
quit;

%if &sqlobs > 0 %then %do;
data test;
set temp.&dn;
drop &dropme;
run;
%end;

%mend drop;

proc contents data=temp._all_ noprint out=contents;
run;

data _null_;
set contents;
by memname ;
if first.memname;
where index(upcase(name),'DTC') > 0;
call execute(cats('%drop(',memname,')'));
run;

Astounding
PROC Star

OK, time to add some temporary diagnostics, then.  Inside the macro definition:

 

%let dropme=;

This should be the first statement within the macro.

 

%put &sqlobs;
%put &dropme;

This goes right after the SQL code, before checking %if &sqlobs > 0.

 

Also, my recollection is that MEMNAME is all caps within sashelp.vcolumn.  But if I'm wrong about that, the comparison should be:

upcase(memname) = upcase("&dn") AND

If this doesn't do it, sorry to say I'm done for the evening.  I'm heading out the door.  Hope this gives you enough to go on.

saslove
Quartz | Level 8

The macro works fine if I try it on individual datasets.

It gives me exactly what I need when I do a manual run on datasets

%drop (bma); >> works great 

%drop (tox); >> works great

 

It's only when I try the data _null_ and apply at once to all datasets, it breaks. It says no drop variables found. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 2234 views
  • 3 likes
  • 6 in conversation