DATA Step, Macro, Functions and more

How to handle many variables in IF and KEEP/DROP statements

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 11
Accepted Solution

How to handle many variables in IF and KEEP/DROP statements

Hi everyone!

I have a SAS dataset with more than 1000 variables distributed like this:

ID      Auto      Auto_2     Auto_3     City     City_2     Sport     Sport_2     Sport_3     Sport_4 ... and so on (many different variables repeated different times).

I need to:

1) keep only those rows that have missing values in all the variables ending with "_number", (or, equally, drop those rows that have values in any of the variables ending with "_number").

For example:?

   ID     Auto   Auto_2    Auto_3    City         City_2    

    01     Fiat    Ford                      Milan                                ---> Drop row

    02     Fiat                                 Milan        Rome               ---> Drop row

    03     Fiat    Ford          Ford      Milan        Rome               ---> Drop row

    04     Fiat                                 Milan                                ---> KEEP ROW

If it can help, if Variable_n is missing, then Variable_(n+1) is missing too.

2) keep only the columns without the "_number".

In the previous example only  ID Auto City must be kept.

The problem in both cases is the same: I can't write all the variables in a drop/keep statement or in a IF condition because they're too many.

Is there a smart way to manage variables like these? Maybe something like lists...? I don't know.

Thanks a lot

Irene


Accepted Solutions
Solution
‎07-03-2015 11:30 PM
Super User
Posts: 9,691

Re: How to handle many variables in IF and KEEP/DROP statements

Code: Program

data have;
input (ID   Auto   Auto_2   Auto_3   City   City_2  ) ($); 
cards;
  01 Fiat Ford . Milan . 
  02 Fiat . . Milan Rome 
  03 Fiat Ford Ford Milan Rome 
  04 Fiat . . Milan . 
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;  

proc sql;
select _name_ into : list separated by ',' from temp where prxmatch('/_\d+$/',strip(_name_));
select count(*) into : n from temp where prxmatch('/_\d+$/',strip(_name_));
quit;

data want;
set have;
if cmiss(&list)=&n;
run;

View solution in original post


All Replies
Super User
Posts: 17,963

Re: How to handle many variables in IF and KEEP/DROP statements

You can only use short cut references for variable lists when you have prefixes Smiley Sad

Otherwise you can create a macro variable with the list of variables by querying the SASHELP.VCOLUMN table.

Occasional Contributor
Posts: 11

Re: How to handle many variables in IF and KEEP/DROP statements

I've never used the SASHELP.VCOLUMN table, I'll have a look!

Saying shortcut references do you mean, for example "Var:" to use all variables starting with "Var"? I've aIready tried this method in a DROP statement, but I think I can't use it in a IF statement.. Am I wrong?

Any ideas for the IF statement?

Thanks

Irene

Super User
Super User
Posts: 7,430

Re: How to handle many variables in IF and KEEP/DROP statements

Well, you could put the _numbers in an array, then loop over the array:

data want;

     set have;

     array elem{*} auto-city2;

     do i=2 to dim(elem{*});

          if elem{i}="" and elem{i-1}="" then flag="Y";

     end;

run;

Then drop based on flag (note untested code!).

Personally, in these instances you may be better off normalising your data, i.e. have on row per data element:

ID          SECTION          NUMBER          VALUE

1            AUTO                1                          Fiat

1            AUTO                2                          Ford

...

Super User
Posts: 5,099

Re: How to handle many variables in IF and KEEP/DROP statements

Note that you have a special condition in your data:  If Variable_N is missing, then Variable_N+1 is also missing.  This means that you don't have to process all your variables.  You could process only the "_2" variables and ignore the rest.

Here's a beginning to a solution.  See if you need more than that.

proc contents data=have noprint out=_contents_ (keep=name);

run;

*** Just so you know what you are working with here ...;

proc print data=_contents_ (obs=10);

run;

data keeplist _2_list;

  set _contents_;

  last_part = scan(name, -1, '_');

  if last_part='2' then output _2_list;

  else if input(last_part, ??5.) < 2 then output keeplist;

run;

*** Again, just so you know what you are working with here ...;

proc print data=keeplist (obs=10);

proc print data=_2_list (obs=10);

run;

proc sql;

  select trim(name) into : keeplist separated by ' ' from keeplist;

  select trim(name) into : _2_list separated by ' ' from _2_list;

quit;

%put &keeplist;

%put &_2_list;

The rest will be (by comparison) easier, but see if this much makes sense.

Good luck.

Solution
‎07-03-2015 11:30 PM
Super User
Posts: 9,691

Re: How to handle many variables in IF and KEEP/DROP statements

Code: Program

data have;
input (ID   Auto   Auto_2   Auto_3   City   City_2  ) ($); 
cards;
  01 Fiat Ford . Milan . 
  02 Fiat . . Milan Rome 
  03 Fiat Ford Ford Milan Rome 
  04 Fiat . . Milan . 
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;  

proc sql;
select _name_ into : list separated by ',' from temp where prxmatch('/_\d+$/',strip(_name_));
select count(*) into : n from temp where prxmatch('/_\d+$/',strip(_name_));
quit;

data want;
set have;
if cmiss(&list)=&n;
run;
Super User
Posts: 9,691

Re: How to handle many variables in IF and KEEP/DROP statements

Dictionary.columns would cost some time to query these information.

If you just want get variable's name ,proc transpose is good and fast.

For your second Q.

Code: Program

data have;
input (ID   Auto   Auto_2   Auto_3   City   City_2  ) ($); 
cards;
  01 Fiat Ford . Milan . 
  02 Fiat . . Milan Rome 
  03 Fiat Ford Ford Milan Rome 
  04 Fiat . . Milan . 
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;  

proc sql;
select _name_ into : list separated by ' ' from temp where not prxmatch('/_\d+$/',strip(_name_));
quit;

data want;
set have;
keep &list;
run;
Occasional Contributor
Posts: 11

Re: How to handle many variables in IF and KEEP/DROP statements

Thank you so much!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 395 views
  • 8 likes
  • 5 in conversation