DATA Step, Macro, Functions and more

Process to remove variables from another dataset

Accepted Solution Solved
Reply
Contributor
Posts: 66
Accepted Solution

Process to remove variables from another dataset

Hi,

I hope someone could help me with this problem:

I have a data file a.data with accounts and their attributes across ~100 variables

I have a list of variables – as a separate file – a.vars,which contains a subset of 100 variables (let’s say 20 of the 100)

I need to code a process that would drop from a.data all variables that are not in a.vars file. It has to be SAS process (not manualwork through EXCEL), that just does it in a completely automated manner.

a.data:

id var1 var2 var3 var4 var5

1 0 1 1 0 1 0

..

a.vars:

var1

var2

var3

var6

var7

new.data

id var1 var2 var3

1 0 1 1

Thanks for your help.


Accepted Solutions
Solution
‎03-13-2012 06:04 PM
Respected Advisor
Posts: 4,646

Re: Process to remove variables from another dataset

Note : Do not use the period character in SAS dataset names. That's just asking for trouble.

data a_data;
input id var1 var2 var3 var4 var5 var6 var7;
datalines;
1 0 1 1 0 1 0 1 1
;

data a_vars;
input var $;
datalines;
var1
var2
var3
var6
var7
;

proc sql noprint;
select var into :varList separated by " " from a_vars;

data want(keep=&varList); set a_data; run;

proc print; run;

PG

PG

View solution in original post


All Replies
Super User
Posts: 17,828

Re: Process to remove variables from another dataset

Something like the following should work (untested).

You could convert it into a macro if required based on the libname, dataset name and output name easily.

proc sql;

*get columns in your dataset;

     create table vnames as

select name from dictionary.columns

where libname=your_libname and memname=yourdataset;

*Get overlap;

select name into :names separated by " " from vnames

where name in (select names from data name_list);

quit;

data new_data;

set <your_data>;

keep &names.;

run;

*Corrected as noted by ballardw Smiley Happy

Super User
Posts: 10,500

Process to remove variables from another dataset

Reeza forgot to include the input dataset.

Data new_data;

     set <yourdataset>;

     keep &names;

run;

Contributor
Posts: 66

Process to remove variables from another dataset

thank you for your replies.

proc sql;

*get columns in your dataset;

     create table vnames as

select name from dictionary.columns

where libname=your_libname and memname=yourdataset;

do i have to select the variable names in this statement?

Super User
Posts: 17,828

Process to remove variables from another dataset

No, it grabs the names of the variables in your dataset.

You only need to specify the library and dataset name in the where clause.

Then the dataset to pull the the names you want to keep in the next query.

Solution
‎03-13-2012 06:04 PM
Respected Advisor
Posts: 4,646

Re: Process to remove variables from another dataset

Note : Do not use the period character in SAS dataset names. That's just asking for trouble.

data a_data;
input id var1 var2 var3 var4 var5 var6 var7;
datalines;
1 0 1 1 0 1 0 1 1
;

data a_vars;
input var $;
datalines;
var1
var2
var3
var6
var7
;

proc sql noprint;
select var into :varList separated by " " from a_vars;

data want(keep=&varList); set a_data; run;

proc print; run;

PG

PG
Super User
Posts: 17,828

Re: Process to remove variables from another dataset

You'd get at least a warning message in your log with this method, as not all variables might be in the dataset.

Trusted Advisor
Posts: 1,300

Process to remove variables from another dataset

data foo;

array var[10];

do id=1 to 100;

  do _n_=1 to dim(var);

   var[_n_]=rand('tabled',.5);

  end;

  output;

end;

run;

data bar;

input var $;

cards;

var1

var3

var5

var7

var9

;

run;

proc sql noprint;

select b.var into :drops separated by ','

   from bar b,

        dictionary.columns a

  where b.var=a.name;

alter table foo drop column &drops;

quit;

Super User
Posts: 5,082

Process to remove variables from another dataset

If you are sure that every variable named in A.VARS actually appears in  A.DATA, here is the method I like.  This assumes that VARNAME is the name of the one and only variable in A.VARS:

data _null_;

   call execute('data new.data; set a.data (keep=');

   do until (done);

         set a.vars end=done;

         call execute(varname);

   end;

   call execute('); run;');

   stop;

run;

The log will show you the statements that get generated.  Also, note that ID will be dropped if it is not named in A.VARS.

Good luck.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 214 views
  • 3 likes
  • 6 in conversation