Automate code

Satori · Posted 03-31-2023 04:13 PM

my code:

proc means noprint data=bv31 n;
  output out=counts(drop=_type_ _freq_) n=;
run;

data high;
  set counts;
  array vars(*) _numeric_;
  max = max(of vars(*));
  do i = 1 to dim(vars);
    if vars(i) = max then do;
      varname = vname(vars(i));
      countval = vars(i);
      output;
    end;
  end;
  keep varname countval;
run;

data test;
  set high;
  call symputx('varname', varname);
run;

data dyna;
  set bvd.bv31;
  if missing(&varname) then delete;
  drop &varname;
run;

How can I automate this code to do the following:

1) run the proc means procedure again (now with one variable less)

2) select the variable with the highest number of non-missing observations, and add it to the dataset high (add the var name and its value)

3) remove all missing observations for last variable added to the high dataset, and drop the variable

4) then continue doing this until there are no variables left.

sample data:

data have;
input obs var1 var2 var3 var4 var5 var6 $1.;
cards;
1  5 0 5 0 3 7
2  1 0 1 0 . 1
3  7 1 7 0 3 1
4  0 . . . 2 0
5  4 0 0 4 1 6
6  4 0 7 3 7 2
7  5 0 8 5 1 0
8  1 7 5 4 9 3
9  2 0 0 2 1 0
10 0 . . . 3 0
11 4 0 0 4 1 0
12 1 1 3 2 . 6
13 3 1 1 6 1 1
14 0 . . . 9 0
15 2 1 2 4 1 0
16 4 1 4 4 1 0
17 . 1 1 4 1 0
18 2 0 0 2 1 0
19 1 1 1 1 2 1
20 0 . . . 7 0
21 0 3 1 . 1 0
22 2 0 2 0 2 0
23 0 . . . 2 0
24 0 . . . 5 0
25 9 0 8 1 1 0
26 0 . . . 1 0
27 0 . 2 . 1 0
28 2 0 2 0 6 2
29 2 0 2 0 4 1
30 2 0 0 2 2 0
;

Reeza · Posted 03-31-2023 04:29 PM

Please provide some fake data for testing.

Satori · Posted 03-31-2023 05:28 PM

Fake data added.

Reeza · Posted 03-31-2023 05:42 PM

You're aware that your code description and requirements/task list don't align? Which one is correct, the code or your requirements?

Satori · Posted 03-31-2023 06:17 PM

Requirements

Reeza · Posted 03-31-2023 05:59 PM

Based on the code you posted, this is the requested solution.

You need to loop n-1 times; otherwise you drop the last variable and end up with an empty data set, unless that's what you want.

data have;
input obs var1 var2 var3 var4 var5 var6 $1.;
cards;
1  5 0 5 0 3 7
2  1 0 1 0 . 1
3  7 1 7 0 3 1
4  0 . . . 2 0
5  4 0 0 4 1 6
6  4 0 7 3 7 2
7  5 0 8 5 1 0
8  1 7 5 4 9 3
9  2 0 0 2 1 0
10 0 . . . 3 0
11 4 0 0 4 1 0
12 1 1 3 2 . 6
13 3 1 1 6 1 1
14 0 . . . 9 0
15 2 1 2 4 1 0
16 4 1 4 4 1 0
17 . 1 1 4 1 0
18 2 0 0 2 1 0
19 1 1 1 1 2 1
20 0 . . . 7 0
21 0 3 1 . 1 0
22 2 0 2 0 2 0
23 0 . . . 2 0
24 0 . . . 5 0
25 9 0 8 1 1 0
26 0 . . . 1 0
27 0 . 2 . 1 0
28 2 0 2 0 6 2
29 2 0 2 0 4 1
30 2 0 0 2 2 0
;

option mprint symbolgen;

%let nvars = 6;

%macro iterate(dsnin=);
ods select none;
proc means data=&dsnin. n stackods; 
ods output summary = counts;
run;
ods select all;

proc sort data=counts;
by descending n;
run;

data highest;
set counts(obs=1);
call symputx('varname', variable);
run;


data &dsnin.; 
set &dsnin.; 
if missing (&varname) then delete; 
drop &varname;
run;

%mend;

%macro loop(n=);
%do i=1 %to &n;
%iterate(dsnin=have);
%end;
%mend;

%loop(n=&nvars.);

Satori · Posted 04-01-2023 05:22 AM

Using this code, how can I store the variable names and its values, before they are deleted, so that in the end of the process I have a table with two columns, the first column being the variable name and the second column being its value before deletion?

Tom · Posted 04-01-2023 01:20 PM

@Satori wrote:
Using this code, how can I store the variable names and its values, before they are deleted, so that in the end of the process I have a table with two columns, the first column being the variable name and the second column being its value before deletion?

That does not make any sense. A variable has MANY values. Each observation in a dataset has a value for every variable in the dataset.

Reeza · Posted 04-03-2023 11:11 AM

What "values"?

Reeza · Posted 04-03-2023 11:54 AM

data have;
input obs var1 var2 var3 var4 var5 var6 $1.;
cards;
1  5 0 5 0 3 7
2  1 0 1 0 . 1
3  7 1 7 0 3 1
4  0 . . . 2 0
5  4 0 0 4 1 6
6  4 0 7 3 7 2
7  5 0 8 5 1 0
8  1 7 5 4 9 3
9  2 0 0 2 1 0
10 0 . . . 3 0
11 4 0 0 4 1 0
12 1 1 3 2 . 6
13 3 1 1 6 1 1
14 0 . . . 9 0
15 2 1 2 4 1 0
16 4 1 4 4 1 0
17 . 1 1 4 1 0
18 2 0 0 2 1 0
19 1 1 1 1 2 1
20 0 . . . 7 0
21 0 3 1 . 1 0
22 2 0 2 0 2 0
23 0 . . . 2 0
24 0 . . . 5 0
25 9 0 8 1 1 0
26 0 . . . 1 0
27 0 . 2 . 1 0
28 2 0 2 0 6 2
29 2 0 2 0 4 1
30 2 0 0 2 2 0
;

option mprint symbolgen;

%let nvars = 6;

*drop table of variable list before run;
proc sql;
drop table drop_list;
quit;

%macro iterate(dsnin=);
ods select none;
proc means data=&dsnin. n stackods; 
ods output summary = counts;
run;
ods select all;

proc sort data=counts;
by descending n;
run;

data highest;
set counts(obs=1);
call symputx('varname', variable);
run;

proc append base = drop_list data=highest force;
run;

data &dsnin.; 
set &dsnin.; 
if missing (&varname) then delete; 
drop &varname;
run;

%mend;

%macro loop(n=);
%do i=1 %to &n;
%iterate(dsnin=have);
%end;
%mend;

%loop(n=&nvars.);

Assuming that by 'values' you mean the number of non-missing observations, see the code above.

You really aren't tracking much in this process, so I'm assuming it's skeleton code that you're going to expand; otherwise, it doesn't seem to be providing much information at all.

PaigeMiller · Posted 03-31-2023 05:07 PM

Weeks have gone by since you started discussing this problem, and you are still trying to eliminate variables that have lots of missings, except now you have rephrased the problem to keep variables that have lots of non-missings.

You have avoided (apparently) other suggestions that might work here, such as @Reeza's suggestion to use the missing pattern report from PROC MI, and my suggestion about fitting models when there is lots of missing data. https://communities.sas.com/t5/SAS-Programming/write-macro-or-automate-process/m-p/863200#M340998

People give you advice here because it appears you are working very hard to come up with a not very good solution. We want to show you a smarter path. We know that smart people have worked on the problem of missing values, and come up with some smart solutions. You should probably use those smart solutions instead of inventing your own.

--
Paige Miller

Satori · Posted 03-31-2023 05:41 PM

Honestly, I don't understand why you keep replying if you don't have any intention of helping. This issue has nothing to do with smart solutions. I have explained before that it has to be done this way, if nothing else, because that is what my boss wants me to do. Also, you don't have the full picture (and neither do I by the way), but I know at least that the final goal has nothing to do with eliminating variables with lots of missings. The idea for now is just to see how the number of non-missing values decreases as variables are added.

So please, if you don't want to help me do this, I would appreciate it if you would refrain from posting patronizing speeches, because that does not help me at all.

ballardw · Posted 03-31-2023 06:52 PM

Suggestion:

ACTUALLY walk through, showing the result of each manual step, NOT just the "highest count added" to a set. Show the actual removal of observations, the new set, and then the follow-up, for all the variables in your example data.

I am not sure exactly which observations, variables or values are deleted or dropped.

I also wonder about rules for tie breaking. What if 3 variables have the same number of non-missing (or missing ) values (especially if on different observations) at the first pass. What rule is to be used for selecting the variable? Or is more than one variable to be selected?

This may be moderately critical to how the variable is determined and how observations are removed: if the tied variables often have their missing values on the same observations, it is quite possible that, of 3 variables tied for most missing in the first pass, none has the most missing on the second pass.

I interpret "3) remove all missing observations for last variable added to the high dataset, and drop the variable" as "remove the observations with missing values for the selected variable", because you cannot have any "missing observations" in a SAS data set.
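One way to check for ties before picking a variable is to list every variable whose N equals the maximum. This is only a sketch: it reuses the STACKODS output layout (columns Variable and N) from the code posted earlier in the thread, and it relies on PROC SQL remerging the MAX() summary back onto each row.

```sas
/* Count non-missing values per numeric variable, one row per variable */
ods select none;
proc means data=have n stackods;
  ods output summary=counts;
run;
ods select all;

/* List every variable tied for the highest N.        */
/* More than one row in the result means a tie exists. */
proc sql;
  select variable, n
    from counts
    having n = max(n);
quit;
```

If more than one row comes back, a tie-breaking rule is needed before the next step can be automated.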

Tom · Posted 04-01-2023 12:30 PM

So for NUMERIC variables you can use PROC MEANS to calculate the N statistic to count the number of non-missing values.

proc summary data=have;
  output out=statistics(drop=_type_ _freq_) N(_numeric_)= ;
run;

Now you can use the MAX() function to find the maximum value of those N statistics, and then the WHICHN() function to determine which of the variables has that value (this takes the first one with that count). Once you have that index number you can use it with the VNAME() function to find the NAME of the variable. You can use the NLITERAL() function to convert the name to a string you could use in SAS code to refer to the variable. You can use CALL SYMPUTX() to write the name into a macro variable. While we are at it, let's see how many variables we have left after we drop the selected variable.

data _null_;
  set statistics;
  array __n _numeric_ ;
  call symputx("next",nliteral(vname(__n[whichn(MAX(of __n[*]),of __n[*])])));
  call symputx('nvars',dim(__n)-1);
run;
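As a small illustration of the tie rule (WHICHN() returns the position of the first match), here is a sketch with made-up counts. The variables a, b, c, idx and name are hypothetical and not part of the thread's data.

```sas
data _null_;
  /* hypothetical N statistics: a and c tie for the maximum */
  a = 30; b = 29; c = 30;
  array __n a b c;
  /* position of the first array element equal to the maximum */
  idx  = whichn(max(of __n[*]), of __n[*]);
  /* name of the variable at that position */
  name = vname(__n[idx]);
  put idx= name=;
run;
```

This should write idx=1 name=a to the log, so with tied counts the variable that comes first in the dataset is the one selected.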

Now that we know the name of the variable to drop, we can use it to make a new copy of the dataset without the observations that have missing values of that variable, and also remove that variable. While we are at it, let's also see how many observations are now left. There is no sense continuing if there are no more observations.

data want(drop=&next);
  if eof then call symputx('nobs',_n_-1);
  set have end=eof;
  where not missing(&next);
run;

We can then put it together in a macro definition so that we can loop until we run out of observations to delete or variables to drop.

But you have asked this question multiple times using different selection criteria, so let's add more parameters to the macro to control that also. We could use the NMISS statistic instead of the N statistic with PROC SUMMARY (or really any statistic that PROC MEANS can calculate). And we could select the variable with the MIN() value instead of the MAX() value. So let's add the STAT parameter and the CRITERIA parameter to the macro definition.

Here is the result:

%macro drop_vars
(inds /* input dataset name */
,outds /* output dataset name */
,mvar  /* macro variable to store the dropped variable */
,stat=N /* What statistic to calculate */
,criteria=max /* What criteria to use for picking variable */
);
%local next nvars nobs;
%if not %symexist(&mvar) %then %global &mvar;
%let &mvar=;
data &outds;
 set &inds;
run;

%do %until(0=&nvars or 0=&nobs);

proc summary data=&outds ;
  output out=statistics(drop=_type_ _freq_) &stat(_numeric_)= ;
run;
data _null_;
  set statistics;
  array __n _numeric_ ;
  call symputx("next",nliteral(vname(__n[whichn(&criteria(of __n[*]),of __n[*])])));
  call symputx('nvars',dim(__n)-1);
run;
%let &mvar=&&&mvar &next ;
data &outds(drop=&next);
  if eof then call symputx('nobs',_n_-1);
  set &outds end=eof;
  where not missing(&next);
run;
%end;
%mend drop_vars;

If we run it for your test dataset:

options mprint nosymbolgen;
%drop_vars(inds=have,outds=want,mvar=varlist)
%put &=varlist;

we get this log:

4103  options mprint nosymbolgen;
4104  %drop_vars(inds=have,outds=want,mvar=varlist)
MPRINT(DROP_VARS):   data want;
MPRINT(DROP_VARS):   set have;
MPRINT(DROP_VARS):   run;

NOTE: The data set WORK.WANT has 30 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   proc summary data=want ;
MPRINT(DROP_VARS):   output out=statistics(drop=_type_ _freq_) N(_numeric_)= ;
MPRINT(DROP_VARS):   run;

NOTE: The data set WORK.STATISTICS has 1 observations and 6 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data _null_;
MPRINT(DROP_VARS):   set statistics;
MPRINT(DROP_VARS):   array __n _numeric_ ;
MPRINT(DROP_VARS):   call symputx("next",nliteral(vname(__n[whichn(max(of __n[*]),of __n[*])])));
MPRINT(DROP_VARS):   call symputx('nvars',dim(__n)-1);
MPRINT(DROP_VARS):   run;

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data want(drop=obs);
MPRINT(DROP_VARS):   if eof then call symputx('nobs',_n_-1);
MPRINT(DROP_VARS):   set want end=eof;
MPRINT(DROP_VARS):   where not missing(obs);
MPRINT(DROP_VARS):   run;

NOTE: The data set WORK.WANT has 30 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   proc summary data=want ;
MPRINT(DROP_VARS):   output out=statistics(drop=_type_ _freq_) N(_numeric_)= ;
MPRINT(DROP_VARS):   run;

NOTE: The data set WORK.STATISTICS has 1 observations and 5 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data _null_;
MPRINT(DROP_VARS):   set statistics;
MPRINT(DROP_VARS):   array __n _numeric_ ;
MPRINT(DROP_VARS):   call symputx("next",nliteral(vname(__n[whichn(max(of __n[*]),of __n[*])])));
MPRINT(DROP_VARS):   call symputx('nvars',dim(__n)-1);
MPRINT(DROP_VARS):   run;

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data want(drop=var1);
MPRINT(DROP_VARS):   if eof then call symputx('nobs',_n_-1);
MPRINT(DROP_VARS):   set want end=eof;
MPRINT(DROP_VARS):   where not missing(var1);
MPRINT(DROP_VARS):   run;

NOTE: The data set WORK.WANT has 29 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   proc summary data=want ;
MPRINT(DROP_VARS):   output out=statistics(drop=_type_ _freq_) N(_numeric_)= ;
MPRINT(DROP_VARS):   run;

NOTE: The data set WORK.STATISTICS has 1 observations and 4 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data _null_;
MPRINT(DROP_VARS):   set statistics;
MPRINT(DROP_VARS):   array __n _numeric_ ;
MPRINT(DROP_VARS):   call symputx("next",nliteral(vname(__n[whichn(max(of __n[*]),of __n[*])])));
MPRINT(DROP_VARS):   call symputx('nvars',dim(__n)-1);
MPRINT(DROP_VARS):   run;

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data want(drop=var5);
MPRINT(DROP_VARS):   if eof then call symputx('nobs',_n_-1);
MPRINT(DROP_VARS):   set want end=eof;
MPRINT(DROP_VARS):   where not missing(var5);
MPRINT(DROP_VARS):   run;

NOTE: The data set WORK.WANT has 27 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   proc summary data=want ;
MPRINT(DROP_VARS):   output out=statistics(drop=_type_ _freq_) N(_numeric_)= ;
MPRINT(DROP_VARS):   run;

NOTE: The data set WORK.STATISTICS has 1 observations and 3 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data _null_;
MPRINT(DROP_VARS):   set statistics;
MPRINT(DROP_VARS):   array __n _numeric_ ;
MPRINT(DROP_VARS):   call symputx("next",nliteral(vname(__n[whichn(max(of __n[*]),of __n[*])])));
MPRINT(DROP_VARS):   call symputx('nvars',dim(__n)-1);
MPRINT(DROP_VARS):   run;

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


MPRINT(DROP_VARS):   data want(drop=var3);
MPRINT(DROP_VARS):   if eof then call symputx('nobs',_n_-1);
MPRINT(DROP_VARS):   set want end=eof;
MPRINT(DROP_VARS):   where not missing(var3);
MPRINT(DROP_VARS):   run;

NOTE: The data set WORK.WANT has 20 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   proc summary data=want ;
MPRINT(DROP_VARS):   output out=statistics(drop=_type_ _freq_) N(_numeric_)= ;
MPRINT(DROP_VARS):   run;

NOTE: The data set WORK.STATISTICS has 1 observations and 2 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data _null_;
MPRINT(DROP_VARS):   set statistics;
MPRINT(DROP_VARS):   array __n _numeric_ ;
MPRINT(DROP_VARS):   call symputx("next",nliteral(vname(__n[whichn(max(of __n[*]),of __n[*])])));
MPRINT(DROP_VARS):   call symputx('nvars',dim(__n)-1);
MPRINT(DROP_VARS):   run;

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data want(drop=var2);
MPRINT(DROP_VARS):   if eof then call symputx('nobs',_n_-1);
MPRINT(DROP_VARS):   set want end=eof;
MPRINT(DROP_VARS):   where not missing(var2);
MPRINT(DROP_VARS):   run;

NOTE: Compression was disabled for data set WORK.WANT because compression overhead would increase the size of the data set.
NOTE: The data set WORK.WANT has 19 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


MPRINT(DROP_VARS):   proc summary data=want ;
MPRINT(DROP_VARS):   output out=statistics(drop=_type_ _freq_) N(_numeric_)= ;
MPRINT(DROP_VARS):   run;

NOTE: Compression was disabled for data set WORK.STATISTICS because compression overhead would increase the size of the data set.
NOTE: The data set WORK.STATISTICS has 1 observations and 1 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data _null_;
MPRINT(DROP_VARS):   set statistics;
MPRINT(DROP_VARS):   array __n _numeric_ ;
MPRINT(DROP_VARS):   call symputx("next",nliteral(vname(__n[whichn(max(of __n[*]),of __n[*])])));
MPRINT(DROP_VARS):   call symputx('nvars',dim(__n)-1);
MPRINT(DROP_VARS):   run;

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


MPRINT(DROP_VARS):   data want(drop=var4);
MPRINT(DROP_VARS):   if eof then call symputx('nobs',_n_-1);
MPRINT(DROP_VARS):   set want end=eof;
MPRINT(DROP_VARS):   where not missing(var4);
MPRINT(DROP_VARS):   run;

NOTE: Compression was disabled for data set WORK.WANT because compression overhead would increase the size of the data set.
NOTE: The data set WORK.WANT has 18 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


4105  %put &=varlist;
VARLIST=obs var1 var5 var3 var2 var4

So you can see that the variables were dropped in this order:

VARLIST=obs var1 var5 var3 var2 var4

If instead we run it to drop the variable with most missing values:

%drop_vars(inds=have,outds=want,mvar=varlist,stat=nmiss)

We get the variables dropped in this order:

VARLIST=var4 var5 var1 obs var2 var3
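For the earlier request in this thread (a table holding each variable name and its count before deletion), the same loop could append one row per pass. The following is a minimal sketch under stated assumptions, not a tested implementation: the macro name track_counts and the dataset and variable names _stats, _row, varname, countval and idx are hypothetical, and the input is assumed to have no variables with those names and at least one character variable so the final copy is never left without variables.

```sas
%macro track_counts(inds, outds, logds);
  %local next nvars nobs;

  /* start with an empty log table and a working copy of the input */
  %if %sysfunc(exist(&logds)) %then %do;
    proc delete data=&logds; run;
  %end;
  data &outds;
    set &inds;
  run;

  %do %until(0=&nvars or 0=&nobs);
    /* N statistic for every numeric variable still present */
    proc summary data=&outds;
      output out=_stats(drop=_type_ _freq_) n(_numeric_)=;
    run;

    /* pick the first variable with the highest N; keep its name and count */
    data _row(keep=varname countval);
      set _stats;
      array __n _numeric_;
      length varname $32;
      idx      = whichn(max(of __n[*]), of __n[*]);
      varname  = vname(__n[idx]);
      countval = __n[idx];
      call symputx('next', nliteral(vname(__n[idx])));
      call symputx('nvars', dim(__n) - 1);
    run;

    /* record the selection before the variable is dropped */
    proc append base=&logds data=_row;
    run;

    /* remove observations missing that variable, then drop it */
    data &outds(drop=&next);
      if eof then call symputx('nobs', _n_ - 1);
      set &outds end=eof;
      where not missing(&next);
    run;
  %end;
%mend track_counts;

%track_counts(have, want, high)
```

If this behaves like the log above, HIGH would end up with one row per dropped variable, with counts that match the observation totals reported after each step (30, 29, 27, 20, 19, 18 for obs, var1, var5, var3, var2, var4).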
