Creating a Macro for identifying missing values conditioned on other variables

Reply
Occasional Contributor
Posts: 13

Creating a Macro for identifying missing values conditioned on other variables

[ Edited ]

Hi SAS Community,

 

I am cleaning a longitudinal study database with a large number of variables and subjects. I want to create a code that can identify missing values and flag them as data queries to send to the data capturers. But I do not want to pull all missing values for each variable, as some questions should be blank based on the answer to the previous question. The response to the previous question that indicates that a value should not be missing changes from variable to variable. I've been writing individualized do loops to flag these values but it is too long and tedious. I'm only mildly comfortable with macro coding, but I feel like this is a situation that writing a macro program would be helpful. Do you know how to create a macro that will allow me to do this? I've attached an example section of the code I've been using to create a data set to send for queries below.

 

studyid=subject id number

visitdate = visit date subject came in to fill out form

visit = visit type

bonevisit = another visit type

form = form with the missing data

sq = section and question missing data on the form

db = value on database

tl = value on tracking log

dataset = dataset the query was generated from

issue = problem with this variable

 

 

data checkvar14;
keep studyid visitdate visit bonevisit form sq variable issue db tl dataset;
retain studyid visitdate visit bonevisit form sq variable issue db tl dataset;
format visitdate date9.;
set crf14x4;
format tl $10. variable $20. db $25. issue $50.;

dataset="checkvar14";
visitdate=visitdate14;
visit=newvisit;
bonevisit=newbonevisit;
form="CRF14";



if missing(childcare) then do;
sq="I,6";
variable="childcare";
issue="Missing value for childcare";
db="childcare.";
tl="";
output;
end;

if childcare in (1,2) and missing(lengthchildcare) then do;
sq="I,7";
variable="childcare, lengthchildcare";
issue="Missing value for lengthchildcare";
sp=", ";
db=catx(sp, childcare, lengthchildcare);
tl="";
drop sp;
output;
end;

 

 

 

Super User
Posts: 12,148

Re: Creating a Macro for identifying missing values conditioned on other variables

[ Edited ]

I have written code to check skip patterns is surveys, which is similar to what you are doing.

 

"The response to the previous question that indicates that a value should not be missing changes from variable to variable" is the core of the headache with this type of coding.

You can write something that uses macro parameters to identify 1) the independent variable (controls whether to report), dependent variable, values of the independent variable that require the dependent variable to exist, and possibly "issue" text.

But I doubt if you could get anything generic enough to do your second example.

 

If I had many skips with simple dependencies I might be tempted to do something like:

%macro simpledep (indepvar=q1, depvar=);
if &indepvar in (indepvals) and missing &depvar then do;
   variable="&depvar";
   issue = "Missing value for &depvar";
   output;
end;
%mend;

/* in a data step, do NOT put the macro definition inside a data step, won't work*/

%simpledep(indep= Q1, depvar=Q2, indepvals= 1 2);
%simpledep(indep= Q1, depvar=Q3, indepvals= 1 2);
%simpledep(indep= Q1, depvar=Q4, indepvals= 1 2);
%simpledep(indep= Q1, depvar=Q5, indepvals= 1 2);
%simpledep(indep= Q1, depvar=Q6, indepvals= 1 2);

which would test that Q2 through Q6 have values when Q1 is either 1 or 2.

You didn't indicate the purpose of SQ or TL. they could be additional macro parameters but any parameter with commas as part of the value need to use something like SQ =%str(I,6).

 

If I had several repeated patterns that are relatively simple I might put the information above, which could then be a bit messier as Macros can have issues passing certain characters, into variables into a data set and use a data step to write the do loops to a text file.

You could then review the text file and %include into a data step when ready.

Which sort of thing I have done when I had questionnaires with 25 or so blocks of skips.

 

You would likely still get to manually write all the ones the look like: If q1 in (4 5) and Q6 < 10 and not(missing(Q25)) ..

Highlighted
Occasional Contributor
Posts: 13

Re: Creating a Macro for identifying missing values conditioned on other variables

[ Edited ]

Hi,

 

Thank you for this. I've tried it out on a more simple dependency structure where it is all the same. "sq" stands for the section and question number on the form where the variable in question is located. "tl" is unimportant in this code, just a place holder for other data sets it will be stacked with. I've attempted to modify your code, but I'm having issues with the macros. I made a few different ones (4) for the different dependency structure:

 

%macro checkvar18b_1 (crf18bcom=,sq=);
	%if &crf18bcom=. %then %do;
	sq="&sq";
	variable="&crf18bcom";
   	issue = "Missing value for &crf18bcom";
   	output;
	%end;
%mend;

%macro checkvar18b_2 (crf18bcom=,crf18bres=,sq=);
	%if &crf18bcom=1 and &crf18bres=. %then %do;
	sq="&sq";
	variable="&crf18bres";
   	issue = "Missing value for &crf18bres";
   	output;
	%end;
%mend;

%macro checkvar18b_3 (crf18bcom=,crf18bdate=,sq=);
	%if &crf18bcom=1 and &crf18bdate=. %then %do;
	sq="&sq";
	variable="&crf18bdate";
   	issue = "Missing value for &crf18bdate";
   	output;
	%end;
%mend;

%macro checkvar18b_4 (crf18bcom=,crf18bno=,sq=);
	%if &crf18bcom=2 and &crf18bno="" %then %do;
	sq="&sq";
	variable="&crf18bno";
   	issue = "Missing value for &crf18bno";
   	output;
	%end;
%mend;

data checkvar18b;
	keep studyid visitdate visit bonevisit form sq variable issue db tl dataset;
	retain studyid visitdate visit bonevisit form sq variable issue db tl dataset;
	format visitdate date9.;
	set crf18bx4;
	format tl $10. variable $30. db $40. issue $50.;

	dataset="checkvar18b";
	visitdate=visitdate18b;
	visit=newvisit1;
	bonevisit=newbonevisit;
	form="CRF18b";
	tl="";

	%checkvar18b_1(crf18bcom=surea,sq="1");
	%checkvar18b_1(crf18bcom=screat,sq="2");
	%checkvar18b_1(crf18bcom=salb,sq="3");
	%checkvar18b_1(crf18bcom=spho,sq="4");
	%checkvar18b_1(crf18bcom=upro,sq="5");
	%checkvar18b_1(crf18bcom=umic,sq="6");
	%checkvar18b_1(crf18bcom=ucrea,sq="7");
	%checkvar18b_1(crf18bcom=ualb,sq="8");
	%checkvar18b_1(crf18bcom=uegfr,sq="9");
	%checkvar18b_1(crf18bcom=ub2micro,sq="10");
	%checkvar18b_1(crf18bcom=ub2ratio,sq="11");
	%checkvar18b_1(crf18bcom=completedby18b,sq="");
	%checkvar18b_1(crf18bcom=completeddate18b,sq="");

	%checkvar18b_2(crf18bcom=surea, crf18bres=sureares,sq="1");
	%checkvar18b_2(crf18bcom=screat, crf18bres=screatres,sq="2");
	%checkvar18b_2(crf18bcom=salb, crf18bres=salbres,sq="3");
	%checkvar18b_2(crf18bcom=sphos, crf18bres=sphores,sq="4");
	%checkvar18b_2(crf18bcom=upro, crf18bres=suprores,sq="5");
	%checkvar18b_2(crf18bcom=umic, crf18bres=sumicres,sq="6");
	%checkvar18b_2(crf18bcom=ucre, crf18bres=sucreres,sq="7");
	%checkvar18b_2(crf18bcom=ualb, crf18bres=sualbres,sq="8");
	%checkvar18b_2(crf18bcom=uegfr, crf18bres=suegfrres,sq="9");
	%checkvar18b_2(crf18bcom=ub2micro, crf18bres=ub2microrres,sq="10");
	%checkvar18b_2(crf18bcom=ub2ratio, crf18bres=sub2ratiores,sq="11");

	%checkvar18b_3(crf18bcom=surea, crf18bdate=sureadate,sq="1");
	%checkvar18b_3(crf18bcom=screat, crf18bdate=screatdate,sq="2");
	%checkvar18b_3(crf18bcom=salb, crf18bdate=salbdate,sq="3");
	%checkvar18b_3(crf18bcom=sphos, crf18bdate=sphodate,sq="4");
	%checkvar18b_3(crf18bcom=upro, crf18bdate=suprodate,sq="5");
	%checkvar18b_3(crf18bcom=umic, crf18bdate=sumicdate,sq="6");
	%checkvar18b_3(crf18bcom=ucre, crf18bdate=sucredate,sq="7");
	%checkvar18b_3(crf18bcom=ualb, crf18bdate=sualbdate,sq="8");
	%checkvar18b_3(crf18bcom=uegfr, crf18bdate=suegfrdate,sq="9");
	%checkvar18b_3(crf18bcom=ub2micro, crf18bdate=ub2microdate,sq="10");
	%checkvar18b_3(crf18bcom=ub2ratio, crf18bdate=ub2ratiodate,sq="11");

	
	%checkvar18b_4(crf18bcom=surea, crf18bno=nosurea,sq="1");
	%checkvar18b_4(crf18bcom=screat, crf18bno=noscreat,sq="2");
	%checkvar18b_4(crf18bcom=salb, crf18bno=nosal,sq="3");
	%checkvar18b_4(crf18bcom=sphos, crf18bno=nospho,sq="4");
	%checkvar18b_4(crf18bcom=upro, crf18bno=noupro,sq="5");
	%checkvar18b_4(crf18bcom=umic, crf18bno=noumic,sq="6");
	%checkvar18b_4(crf18bcom=ucre, crf18bno=noucre,sq="7");
	%checkvar18b_4(crf18bcom=ualb, crf18bno=noualb,sq="8");
	%checkvar18b_4(crf18bcom=uegfr, crf18bno=uegfrnores,sq="9");
	%checkvar18b_4(crf18bcom=ub2micro, crf18bno=ub2micronores,sq="10");
	%checkvar18b_4(crf18bcom=ub2ratio, crf18bno=ub2rationores,sq="11");

	if missing(serum) and not missing(notserum) then do;
		sq="";
		variable="serum";
		issue="Missing value for serum";
		db="notserum=.,serum=.";
		output;
		end;

	if missing(urine) and not missing(noturine) then do;
		sq="";
		variable="urine";
		issue="Missing value for urine";
		db="noturine=.,urine=.";
		output;
		end;
run;

 

 

After running, I don't get any errors, but nothing was output to my dataset (contains 0 obs). I know there is missing data, but unsure as to why this macro isn't outputting it.

 

 

Any suggestions?

Ask a Question
Discussion stats
  • 2 replies
  • 133 views
  • 0 likes
  • 2 in conversation