I am trying to take stock of the extent of missingness over multiple years for multiple subsets of variables. So, I've been working on a macro to automate this. There is a problem in my code (in blue) where I try to generate a variable that is a count of missings for each row. The counts in the resulting variable are incorrect.
%LET STEST2 = VAR1 VAR2 VAR3... VAR35 /* &SVARS */
/* LISTN (LISTNAME) = TEST */
%MACRO PERCENTCOMPLETE(YEARS,SVARS,LISTN);
%LET COUNT = %SYSFUNC(COUNTW(&YEARS));
%DO I = 1 %TO &COUNT;
%LET YEAR = %SCAN(&YEARS, &I);
%PUT YEAR = &YEAR LISTN = &LISTN SVARS = &SVARS;
DATA WORK.SVARS&YEAR&LISTN;
SET S2&YEAR;
KEEP &SVARS ;
RUN ;
DATA WORK.MODELVARS&YEAR&LISTN;
SET SVARS&YEAR&LISTN;
ARRAY VARS {*} _NUMERIC_ ;
NUM_VAR&YEAR&LISTN = DIM(VARS);
RUN ;
DATA PERCOM&YEAR&LISTN ;
SET MODELVARS&YEAR&LISTN ;
MISSCOUNT&YEAR&LISTN = CMISS(OF _ALL_);
MISSPER&YEAR&LISTN = MISSCOUNT&YEAR&LISTN/NUM_VAR&YEAR&LISTN;
LENGTH COMOBS&YEAR&LISTN 3. ;
COMOBS&YEAR&LISTN = 0 ;
IF MISSCOUNT&YEAR&LISTN = 0 THEN COMOBS&YEAR&LISTN = 1 ;
LENGTH EMPOBS&YEAR&LISTN 3.;
EMPOBS&YEAR&LISTN = 0 ;
IF MISSPER&YEAR&LISTN = 1 THEN EMPOBS&YEAR&LISTN = 1 ;
RUN ;
%END;
%MEND ;
%PERCENTCOMPLETE(2017 2018 2019 2020 2021 2022 2023, &STEST2, TEST);
I know that I can use:
MISSCOUNT&YEAR&LISTN = CMISS(OF VAR1 -- VAR35);
However, I want to be able to run this for different subsets that have different starting and ending variables. At this point, I could have just copied and pasted the code 800 times with that small edit, but I am being stubborn. All suggestions very appreciated!
Aren't you making this more complicated than it needs to be?
You seem to already have the list of variable names:
%LET STEST2 = VAR1-VAR3 VAR35 ;
So use the OF keyword so you can use that macro variable.
data want;
set have;
nmiss = cmiss(of &stest2);
run;
You need to initialise all the variables you create before you use CMISS()
, like:
MISSCOUNT&YEAR&LISTN = 0;
...
MISSCOUNT&YEAR&LISTN = CMISS(OF _ALL_);
You could just use a retain statement
retain MISSCOUNT&YEAR&LISTN
MISSPER&YEAR&LISTN
EMPOBS&YEAR&LISTN <more variables> 0;
Note that
LENGTH EMPOBS&YEAR&LISTN 3.;
is incorrect. Do not use a format in a LENGTH
statement. Write:
length EMPOBS&YEAR&LISTN 3;
I am going to be using factor scores based on survey scores as dependent variables. In the past, when I’ve used them as covariates, I have done some imputation to retain observations with missing values. Now I am looking to revisit my measurement framework, and without imputing, I need to balance the inclusion of more variables with the loss of observations. So, I am running these percentages on various combinations of variables, and I have seven years of data.
You might want to consider that SAS does have methods to handle this issue of different numbers of missings in different observations. And then all these complications of writing macros and counting the number of missings is unnecessary.
In particular, PROC PLS has a very sophisticated approach. Even if you don't want a PLS model, you want some other sort of model, this method will fill in the missings according to the EM algorithm, and you can work with this new data set that has no missings in whatever model you want.
From the PROC PLS documentation:
However, the MISSING= option in the PROC PLS statement provides more sophisticated ways of modeling in the presence of missing values. If you specify MISSING=AVG or MISSING=EM, then all observations in the input data set contribute to both the analysis and the OUTPUT OUT= data set. With MISSING=AVG, the fit is computed by filling in missing values with the average of the nonmissing values for the corresponding variable. With MISSING=EM, the procedure first computes the model with MISSING=AVG, then fills in missing values with their predicted values based on that model and computes the model again. Alternatively, you can specify MISSING=EM(MAXITER=n) with a large value of n in order to perform this imputation/fit loop until convergence.
@CBF wrote:
I am trying to take stock of the extent of missingness over multiple years for multiple subsets of variables. So, I've been working on a macro to automate this. There is a problem in my code (in blue) where I try to generate a variable that is a count of missings for each row. The counts in the resulting variable are incorrect.
,,,,However, I want to be able to run this for different subsets that have different starting and ending variables. At this point, I could have just copied and pasted the code 800 times with that small edit, but I am being stubborn. All suggestions very appreciated!
AS @ChrisNZ has pointed out, you have to be sure not to mistakenly include the newly created variables in the CMISS function (which would happen using NMISS=CMISS(of _ALL_).
One way to deal with this is to generate two SENTINEL variables, one immediately to the left of your var list, and one immediately to the right. Then use then as list boundaries in the expression _left_sentinel -- _right_sentinel .
Something like:
data want (drop=_:);
retain _left_sentinel .;
set mydata;
retain _right_sentinel .;
if _n_=1 then do;
call missing(of _all_);
nvars=cmiss(of _left_sentinel -- _right_sentinel)-2;
set mydata; /*Repopulate the variables */
_left_sentinel=1;
_right_sentinel=1;
end;
retain nvars;
nmiss= cmiss(of _left_sentinel -- _right_sentinel);
pct_miss=nmiss/nvars;
run;
The point is, don't identify any new variables until after _right_sentinel has been named.
Aren't you making this more complicated than it needs to be?
You seem to already have the list of variable names:
%LET STEST2 = VAR1-VAR3 VAR35 ;
So use the OF keyword so you can use that macro variable.
data want;
set have;
nmiss = cmiss(of &stest2);
run;
Thank you. I tried this approach but omitted "OF." Appreciate it!
Are you just having a hard time counting how many variables are in a variable list like FIRSTVAR -- LASTVAR ?
Here is one trick you could use.
proc transpose data=have(obs=0) out=names;
var &svars;
run;
proc sql noprint;
select count(*) format=32. into :numvars trimmed from names;
quit;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.