Desktop productivity for business analysts and programmers

Removing variables in a table with a missing value, but only for certain observations

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Removing variables in a table with a missing value, but only for certain observations

Hello,

 

I have a table with 387 variables and 25 rows. I want to remove all variables that have missing values, but only if the missing values is between the 6th and the 25th rows (some variables have lags and so the first 5 rows may have missing values with no problem). I am guessing this is really easy with proc sql, but I can't figure out the code for it to work. For example, I want to remove "C_CNBA11M_LAG1" to "C_CNBA11M_LAG4" in this example.

 

Thank you very much!


example.jpg

Accepted Solutions
Solution
‎07-27-2017 03:30 PM
Super User
Posts: 5,509

Re: Removing variables in a table with a missing value, but only for certain observations

Unfortunately, since you may or may not need to run a step to remove variables, this requires macro language.  Fortunately, if you know macro language it's only medium difficult.  For example (untested code):

 

%macro remove;

   proc summary data=have (firstobs=6);

     var _numeric_;

     output out=missing_counts (drop=_type_ _freq_ ) nmiss=;

   run;

   proc transpose data=missing_counts out=transposed_counts;

     var _numeric_;

   run;

   %local remove_list;

   proc sql;

       select distinct  _name_  into : remove_list from transposed_counts where col1 > 0;

   quit;

   %if %length(&remove_list) %then %do;

       data want;

           set have (drop=&remove_list);

       run;

   %end;

   %else %put No variables had missing values from observation 6 onward.;

%mend remove;

 

%remove

View solution in original post


All Replies
Super User
Posts: 19,815

Re: Removing variables in a table with a missing value, but only for certain observations

If you want row level operations, a data step is a better approach. Especially if you need to identify rows. SQL does not necessarily honor order of the rows, especially if you don't have another variable that specifies the order of the rows.

Solution
‎07-27-2017 03:30 PM
Super User
Posts: 5,509

Re: Removing variables in a table with a missing value, but only for certain observations

Unfortunately, since you may or may not need to run a step to remove variables, this requires macro language.  Fortunately, if you know macro language it's only medium difficult.  For example (untested code):

 

%macro remove;

   proc summary data=have (firstobs=6);

     var _numeric_;

     output out=missing_counts (drop=_type_ _freq_ ) nmiss=;

   run;

   proc transpose data=missing_counts out=transposed_counts;

     var _numeric_;

   run;

   %local remove_list;

   proc sql;

       select distinct  _name_  into : remove_list from transposed_counts where col1 > 0;

   quit;

   %if %length(&remove_list) %then %do;

       data want;

           set have (drop=&remove_list);

       run;

   %end;

   %else %put No variables had missing values from observation 6 onward.;

%mend remove;

 

%remove

Occasional Contributor
Posts: 15

Re: Removing variables in a table with a missing value, but only for certain observations

Posted in reply to Astounding

Thank you, this worked for me:

 

proc summary data=HAVE(firstobs=6);

var _numeric_;

output out=MISSING_COUNTS (drop=_type__freq_)nmiss=;

run;

proc transpose data=MISSING_COUNTS out=TRANSPOSED_COUNTS;

var _numeric_;

run;

proc sql;

select distinct _name_ into : REMOVE_LIST separated by " "

from TRANSPOSED_COUNTS where col1 > 0

and _NAME_ not = "_FREQ_";

quit;

data WANT(drop=&REMOVE_LIST);

set HAVE ;

run;

 

 

However, I do have to do this for 3 tables, do you know of a way to each time "update" the macro variable REMOVE_LIST as to include variables with missing values for all 3 tables?

Super User
Posts: 5,509

Re: Removing variables in a table with a missing value, but only for certain observations

This is a normal function of using macros.  The most common approach is to define the macro with parameters:

 

%macro remove (indata=, outdata=);

 

Then anywhere you are referring to either the input data or output data, make a reference to the parameter.  For example:

 

data &outdata;

set &indata;

 

Call the macro by supplying values for the parameters:

 

%remove (indata=have1, outdata=want1)

Occasional Contributor
Posts: 15

Re: Removing variables in a table with a missing value, but only for certain observations

Posted in reply to Astounding

Thanks for your help,

 

I do know how to use macro language, but I don't know how to update a macro variable without erasing the previous list of variables to remove from the first table. Ideally, I would want my "REMOVE_LIST" to include all variables to drop from all 3 tables separated by a " " without repeating them, because I then have to remove these variables from all my tables with a simple data step. Thank you for your time!

Super User
Posts: 5,509

Re: Removing variables in a table with a missing value, but only for certain observations

If I understand correctly, you want to remove a variable from one of your data sets if contains missing values in another of the data sets.

 

For that, the simplest way is to save three versions of &REMOVE_LIST:

 

%global remove_list1 remove_list2 remove_list3;

 

Then change the INTO destination, using one of those three variables instead of REMOVE_LIST.

 

Finally, just put them all together:

 

(drop=&remove_list1 &remove_list2 &remove_list3)

 

There may be some duplication in the lists, but you don't need to remove the duplication for SAS to figure out what to drop.

Occasional Contributor
Posts: 15

Re: Removing variables in a table with a missing value, but only for certain observations

Posted in reply to Astounding

Thank you for your help, I managed to write my code. Here it is for future reference or anybody that it could help

 

 

%let SCENARIOS = S0 S1 S2;

%let REMOVE_LIST = REMOVE0 REMOVE1 REMOVE2;

 

%MACRO REMOVE_VARS_W_MISSING_VALUES ( FIRST_ROW = ,

TABLES_INPUT= ,

REMOVE_LIST=

);

 

%do i=1 %to %sysfunc(countw(&TABLES_INPUT.));

proc summary data=USERLIB.%scan(&TABLES_INPUT.,&i.)(firstobs=&FIRST_ROW.);

var _NUMERIC_;

output out=MISSING_COUNTS (drop=_TYPE__FREQ_)nmiss=;

run;

proc transpose data=MISSING_COUNTS out=TRANSPOSED_COUNTS;

var _NUMERIC_;

run;

proc sql;

select distinct _NAME_ into : %scan(&REMOVE_LIST.,&i.) separated by " "

from TRANSPOSED_COUNTS where col1 > 0

and _NAME_ not = "_FREQ_";

quit;

%end;

 

%do i=1 %to %sysfunc(countw(&TABLES_INPUT.));

data USERLIB.%scan(&TABLES_INPUT.,&i.)_MOD;

set USERLIB.%scan(&TABLES_INPUT.,&i.)(drop=&REMOVE0. &REMOVE1. &REMOVE2.);

run;

%end;

%MEND;

%REMOVE_VARS_W_MISSING_VALUES ( FIRST_ROW = 6,

TABLES_INPUT= &SCENARIOS.,

REMOVE_LIST= &REMOVE_LIST.

);

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 316 views
  • 0 likes
  • 3 in conversation