BookmarkSubscribeRSS Feed
ARLewis
Calcite | Level 5
Hello, I have a survey with missing data that I would like to impute, however included in this file are -8 values which indicate valid skips, which I do not want to include in the imputation or other descriptive statistics.

I can't seem to find a parameter to the imputation code (proc mi or proc mcmc) or function that would do the imputation and not take into account the -8 values. The file I wish to analyze is pretty big (50 x 15,000) so I can't just go in and edit the file.

I also looked at the proc Surveymeans, Surveylogic, Surveyreg, and Surveyfreq and I don't seen any function or parameter that would deal with valid skip variables. It's hard to believe this hasn't already been done. Do I have to read in each variable, strip out the -8, impute - find the means and other descriptive info and then put the -8's back in?? How have others dealt with this situation. Any help would be so much appreciated.

Thanks for any thoughts, code, or advice! -A
1 REPLY 1
rsanchez87
Obsidian | Level 7

Hi there, 

 

I had this same issue with a public data set. The instructions to the data set labeled the values of -996 as a V, for a valid skip. When you try to filter for -996 or V, it doesn't work. 

 

You have to harness the VVALUEX formatting of the column. Below is my solution. I am changing the format for 'valid skips', missing values, 'errors', 'refused', and 'don't know'. 

 

1. I first create a table that has all of the column variable names, their order and their data type. 

2. I determine the max size of the table.

3. I throw that table reference table into a macro and sequence them to the maximum record value. 

4. The macro cycles through each column name and data type associated with the reference table Ith value

5. Logic is applied. 

6. Lastly, a proc freq of the new table outputs the distribution of new codes. I recommend taking a before and after snapchat to validate the code is correct. 

 

*** OUTPUT ALL TABLE CONTENTS TO A NEW TABLE ***; 
PROC CONTENTS DATA= OUTFILE.AGES5_17 OUT= COLUMN_NAMES NOPRINT;
RUN;

*** FORMAT THE COLUMN NAMES TO BE INCLUDED IN THE MACRO FOR VARIABLE AND DATA TYPE COLUMNS ***; 
PROC SQL; 
CREATE TABLE COLUMN_NAMES2 AS
SELECT VARNUM AS POSITION, NAME
		 ,CATS("'",NAME,"'") AS VARIABLE
		 ,TYPE
		 ,CASE WHEN TYPE EQ 1 THEN "'NUMERIC'" ELSE "'CHARACTER'" END AS VARIABLE_TYPE FORMAT = $15.
FROM COLUMN_NAMES
ORDER BY VARNUM
;QUIT;

*** DETERMINE THE MAXIMUM NUMBER OF RECORDS IN THE TABLE ***; 
PROC SQL NOPRINT; 
SELECT MAX INTO: MAX
FROM (SELECT MAX(POSITION) AS MAX FROM COLUMN_NAMES2)
;QUIT;

%PUT MAX NUMBER OF COLUMNS === &MAX.; 

*** TEST FILE ***; 
DATA TEST ; 
	SET OUTFILE.AGES5_17 (OBS = 1000); 
RUN;

*** THESE OPTIONS MUTES THE SAS LOG ***; 
OPTIONS NOPRINTMSGLIST NOMPRINT NONOTES NOSOURCE NOSOURCE2 ERROR = 0; 

*** THIS MACRO ASSIGNS THE TARGET FILE, ITERATES THROUGH EACH REFERENCE VARIABLE AND DATA TYPE UNTIL THE MAX RECORD IS REACH AND APPLIES THE LOGIC TO THE VVALUEX FORMAT OF THE COLUMN ***; 
%MACRO VARS(FILE); 
	%DO I = 1 %TO &MAX.;
		PROC SQL NOPRINT; 
		SELECT NAME INTO: VARLIST_&I.
		FROM COLUMN_NAMES2
		WHERE POSITION EQ &I.
	    ;QUIT;
	    
	    PROC SQL NOPRINT; 
		SELECT VARIABLE_TYPE INTO: TYPELIST_&I.
		FROM COLUMN_NAMES2
		WHERE POSITION EQ &I.
	    ;QUIT;
		
		%PUT VARIABLE NAME === &&&VARLIST_&I. ||| VARIABLE TYPE === &&&TYPELIST_&I.; 

		DATA TEST /*(KEEP = &&&VARLIST_&I.)*/; 
		SET TEST;

		IF COMPRESS(VVALUE(&&&VARLIST_&I.)) = 'V' THEN 
			DO; 
				&&&VARLIST_&I. = 996; 
				OUTPUT; 
			END; 
		ELSE IF COMPRESS(VVALUE(&&&VARLIST_&I.)) = 'R' OR COMPRESS(VVALUE(&&&VARLIST_&I.)) = 'D' 
			OR COMPRESS(VVALUE(&&&VARLIST_&I.)) = .		
			THEN 
				DO; 
					&&&VARLIST_&I. = 999; 
					OUTPUT; 
				END; 
		ELSE IF COMPRESS(VVALUE(&&&VARLIST_&I.)) = 'E' THEN 
			DO; 
				&&&VARLIST_&I. = 994; 
				OUTPUT; 
			END; 
		ELSE 
			DO; 
				&&&VARLIST_&I. = &&&VARLIST_&I.; 
				OUTPUT; 
			END;
	%END; 
%MEND VARS; 
%VARS(FILE = OUTFILE.AGES5_17); 

PROC FREQ DATA = TEST; 
	TABLES MARITAL BARCODE_FAHITEMS / MISSING; 
RUN;

 

HOPE THIS HELPS!!!

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 1779 views
  • 0 likes
  • 2 in conversation