BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
CBF
Fluorite | Level 6 CBF
Fluorite | Level 6

I am trying to take stock of the extent of missingness over multiple years for multiple subsets of variables. So, I've been working on a macro to automate this. There is a problem in my code (in blue) where I try to generate a variable that is a count of missings for each row. The counts in the resulting variable are incorrect. 

%LET STEST2 = VAR1 VAR2 VAR3... VAR35 /* &SVARS */
/* LISTN (LISTNAME) = TEST */ 

%MACRO PERCENTCOMPLETE(YEARS,SVARS,LISTN);

    %LET COUNT = %SYSFUNC(COUNTW(&YEARS));
    %DO I = 1 %TO &COUNT;
        %LET YEAR = %SCAN(&YEARS, &I);

	%PUT YEAR = &YEAR LISTN = &LISTN SVARS = &SVARS;

	DATA WORK.SVARS&YEAR&LISTN;
	    SET S2&YEAR;
		KEEP &SVARS ;
	RUN ;

	DATA WORK.MODELVARS&YEAR&LISTN;
		SET	SVARS&YEAR&LISTN;
		ARRAY VARS {*} _NUMERIC_ ; 
		NUM_VAR&YEAR&LISTN = DIM(VARS); 
	RUN ;

	DATA PERCOM&YEAR&LISTN ;
		SET MODELVARS&YEAR&LISTN ;
		
		MISSCOUNT&YEAR&LISTN = CMISS(OF _ALL_);			
MISSPER&YEAR&LISTN = MISSCOUNT&YEAR&LISTN/NUM_VAR&YEAR&LISTN; LENGTH COMOBS&YEAR&LISTN 3. ; COMOBS&YEAR&LISTN = 0 ; IF MISSCOUNT&YEAR&LISTN = 0 THEN COMOBS&YEAR&LISTN = 1 ; LENGTH EMPOBS&YEAR&LISTN 3.; EMPOBS&YEAR&LISTN = 0 ; IF MISSPER&YEAR&LISTN = 1 THEN EMPOBS&YEAR&LISTN = 1 ; RUN ; %END; %MEND ; %PERCENTCOMPLETE(2017 2018 2019 2020 2021 2022 2023, &STEST2, TEST);

I know that I can use:

MISSCOUNT&YEAR&LISTN = CMISS(OF VAR1 -- VAR35);

However, I want to be able to run this for different subsets that have different starting and ending variables. At this point, I could have just copied and pasted the code 800 times with that small edit, but I am being stubborn. All suggestions very appreciated! 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Aren't you making this more complicated than it needs to be?

You seem to already have the list of variable names:

%LET STEST2 = VAR1-VAR3 VAR35 ;

So use the OF keyword so you can use that macro variable.

data want;
  set have;
  nmiss = cmiss(of &stest2);
run;

View solution in original post

9 REPLIES 9
ChrisNZ
Tourmaline | Level 20

You need to initialise all the variables you create before you use CMISS(), like:

MISSCOUNT&YEAR&LISTN = 0;
... MISSCOUNT&YEAR&LISTN = CMISS(OF _ALL_);

You could just use a retain statement

retain MISSCOUNT&YEAR&LISTN 
MISSPER&YEAR&LISTN
EMPOBS&YEAR&LISTN <more variables> 0;

Note that

LENGTH EMPOBS&YEAR&LISTN 3.; 

is incorrect. Do not use a format in a LENGTH statement. Write:

length EMPOBS&YEAR&LISTN 3; 

 

quickbluefish
Barite | Level 11
Are you just trying to generate the percent missing by variable / year? (Given a specific set of variables and years)
CBF
Fluorite | Level 6 CBF
Fluorite | Level 6
No. I have a different macro for that. I am generating the percentage of observations that have complete data (no missings) for a given set of variables.

I am going to be using factor scores based on survey scores as dependent variables. In the past, when I’ve used them as covariates, I have done some imputation to retain observations with missing values. Now I am looking to revisit my measurement framework, and without imputing, I need to balance the inclusion of more variables with the loss of observations. So, I am running these percentages on various combinations of variables, and I have seven years of data.
PaigeMiller
Diamond | Level 26

I am going to be using factor scores based on survey scores as dependent variables. In the past, when I’ve used them as covariates, I have done some imputation to retain observations with missing values. Now I am looking to revisit my measurement framework, and without imputing, I need to balance the inclusion of more variables with the loss of observations. So, I am running these percentages on various combinations of variables, and I have seven years of data.

You might want to consider that SAS does have methods to handle this issue of different numbers of missings in different observations. And then all these complications of writing macros and counting the number of missings is unnecessary.

 

In particular, PROC PLS has a very sophisticated approach. Even if you don't want a PLS model, you want some other sort of model, this method will fill in the missings according to the EM algorithm, and you can work with this new data set that has no missings in whatever model you want.

 

From the PROC PLS documentation:

 

However, the MISSING= option in the PROC PLS statement provides more sophisticated ways of modeling in the presence of missing values. If you specify MISSING=AVG or MISSING=EM, then all observations in the input data set contribute to both the analysis and the OUTPUT OUT= data set. With MISSING=AVG, the fit is computed by filling in missing values with the average of the nonmissing values for the corresponding variable. With MISSING=EM, the procedure first computes the model with MISSING=AVG, then fills in missing values with their predicted values based on that model and computes the model again. Alternatively, you can specify MISSING=EM(MAXITER=n) with a large value of n in order to perform this imputation/fit loop until convergence.

--
Paige Miller
mkeintz
PROC Star

@CBF wrote:

I am trying to take stock of the extent of missingness over multiple years for multiple subsets of variables. So, I've been working on a macro to automate this. There is a problem in my code (in blue) where I try to generate a variable that is a count of missings for each row. The counts in the resulting variable are incorrect. 
   ,,,,

However, I want to be able to run this for different subsets that have different starting and ending variables. At this point, I could have just copied and pasted the code 800 times with that small edit, but I am being stubborn. All suggestions very appreciated! 


AS @ChrisNZ has pointed out, you have to be sure not to mistakenly include the newly created variables in the CMISS function (which would happen using NMISS=CMISS(of _ALL_).

 

One way to deal with this is to generate two SENTINEL variables, one immediately to the left of your var list, and one immediately to the right.  Then use then as list boundaries in the expression  _left_sentinel -- _right_sentinel .

 

Something like:

 

data want (drop=_:);
  retain _left_sentinel .;
  set mydata;
  retain _right_sentinel .;

  if _n_=1 then do;
    call missing(of _all_);
    nvars=cmiss(of _left_sentinel -- _right_sentinel)-2;
    set mydata;  /*Repopulate the variables */
    _left_sentinel=1;
    _right_sentinel=1;
  end;
  retain nvars;
  nmiss= cmiss(of _left_sentinel -- _right_sentinel);
  pct_miss=nmiss/nvars;

run;

The point is, don't identify any new variables until after _right_sentinel has been named.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
That would be better to clarify your question if you could post some data and desired output.
Tom
Super User Tom
Super User

Aren't you making this more complicated than it needs to be?

You seem to already have the list of variable names:

%LET STEST2 = VAR1-VAR3 VAR35 ;

So use the OF keyword so you can use that macro variable.

data want;
  set have;
  nmiss = cmiss(of &stest2);
run;
CBF
Fluorite | Level 6 CBF
Fluorite | Level 6

Thank you. I tried this approach but omitted "OF." Appreciate it!

Tom
Super User Tom
Super User

Are you just having a hard time counting how many variables are in a variable list like FIRSTVAR -- LASTVAR ?

 

Here is one trick you could use.

proc transpose data=have(obs=0) out=names;
  var &svars;
run;
proc sql noprint;
   select count(*) format=32. into :numvars trimmed from names;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1581 views
  • 2 likes
  • 7 in conversation