BookmarkSubscribeRSS Feed
inquistive
Quartz | Level 8

Hi experts,

Here is my code-

data Review;
		length notes $50;
	set details; 
		if missing (student_id) then notes ="Student ID Missing";
		else if  missing(session_startdt) then notes="Session Start Date Missing";
		else if  missing(session_enddt) then notes ="Session End Date Missing";

		else if  missing(academic_year_startdt) then notes ="Academic  Year Start Date Missing";
		else if missing(academic_year_enddt) then notes="Academic Year  End Date Missing";
		else if missing(academic_year_startdt--academic_year_enddt) then notes= 'Both Dates Missing';

run;

Here is the output-

inquistive_0-1676322096990.png

 

The problem here is the logic is outputting the notes value after evaluating the first missing condition( academic_year_startdt) is found. But it's not  capturing the next missing value( academic_year_enddt).

 

And also, it's not evaluating the condition-else if missing(academic_year_startdt---academic_year_enddt) then notes= 'Both Dates Missing', to output its defined value.

 

Could you tell me what I am doing wrong ?What I am supposed to do?

 

Thanks for your help in advance!

 

8 REPLIES 8
PaigeMiller
Diamond | Level 26
data Review;
		length notes $50;
	set details; 
		if missing (student_id) then notes ="Student ID Missing";
		else if  missing(session_startdt) then notes="Session Start Date Missing";
		else if  missing(session_enddt) then notes ="Session End Date Missing";

		else if missing(academic_year_startdt) and missing(academic_year_enddt) then notes= 'Both Dates Missing';
		else if  missing(academic_year_startdt) then notes ="Academic  Year Start Date Missing";
		else if missing(academic_year_enddt) then notes="Academic Year  End Date Missing";


run;
--
Paige Miller
inquistive
Quartz | Level 8

@PaigeMiller ,

Thanks for the response.

You modified the code and changed the order of the statements and put this first-

 

...else if missing (academic_year_startdt)and missing(academic_year_enddt) then notes= catx( "Both Dates Missing")...

 

and the program evaluated the first true condition and returned results. But if you see the program, the following two conditions are also true: 

...

else if missing(academic_year_startdt) then notes ="Academic Year Start Date Missing";
else if missing(academic_year_enddt) then notes="Academic Year End Date Missing ";

...

Why is the program not evaluating those conditions(which are also true)? I want all the true conditions captured in the derived column as per the end users' preference .We can define the length of the newly derived column(Notes) from $50 to  $500, if needed to make space for the column value.

 

Thanks,

PaigeMiller
Diamond | Level 26

Why is the program not evaluating those conditions(which are also true)?

Because IF-THEN-ELSE logic stops if the first IF statement is true.

 

I want all the true conditions captured in the derived column as per the end users' preference

So when both dates are missing, you want three text strings for that student? What would that look like? Why are three necessary when 'Both Dates Missing' is a complete description of the situation, other text is redundant.

--
Paige Miller
inquistive
Quartz | Level 8

@PaigeMiller ,

Agreed ('IF-THEN-ELSE logic stops if the first IF statement is true') . What about the following condition where the values/strings are not redundant?

inquistive_0-1676927787965.png

data modify;
		length notes $200;
	set review; 
		if missing (student_id) then notes ="Student ID Missing";
		else if  missing(session_startdt) then notes="Session Start Date Missing";
		else if  missing(session_enddt) then notes ="Session End Date Missing";
		else if missing(academic_year_startdt) and missing (academic_year_enddt) then notes= 'Both Dates Missing';
		else if  missing(academic_year_enddt) then notes ="Academic  Year Start Date Missing";
		else if missing(academic_year_enddt) then notes="Academic Year  End Date Missing";
run;

Here is the result.

inquistive_1-1676928015235.png

It's still processing only the first true condition.

 

Thanks, 

PaigeMiller
Diamond | Level 26

Okay, I think it is time for you to be very very specific here. What is the desired value of NOTES in each situation? You can write it out in plain English if necessary.

--
Paige Miller
Quentin
Super User

Untested, because you didn't post test data.  If you want the NOTES variable to have, say a pipe-delimited list of all the missing value notes that apply for a record, you could play with stuff like:

if missing(student_id)      then notes=catx("|",notes,"Student ID Missing") ;
if missing(session_startdt) then notes=catx("|",notes,"Session Start Date Missing") ;

 

Or if you want more "human readable" it could be an AND-delimited list:

if missing(student_id)      then notes=catx(" and ",notes,"Student ID Missing");
if missing(session_startdt) then notes=catx(" and ",notes,"Session Start Date Missing");

If you've got a lot of these, you could use an array, and if the variables have labels you could use the label to generate the text for the note, rather than hard-coding it.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
ballardw
Super User

Personally if this were my data instead of trying to stuff multiple values into a single variable, considering that you may have cases where Session end date is missing as well as one of your Academic Year variables (which case you have not covered at all), have a bunch of flag variables, one for each variable/condition that you want reported.

Something like this:

data Review;
	set details; 
	n1 = missing (student_id);
	n2=  missing(session_startdt);
	n3= missing(session_enddt); 
	n4=  missing(academic_year_startdt); 
	n5= missing(academic_year_enddt);
   label 
      n1 ="Student ID Missing"
      n2 ="Session Start Date Missing"
      n3 ="Session End Date Missing"
      n4 ="Academic Year Start Date Missing"
      n5 ="Academic Year End Date Missing"
   ;
run;

proc format;
  value yn
  1='Yes'
  0='No'
run;

proc print data=review label;
   var n1-n5;
   format n1-n5 Yn.;
run;

If you sum (of N1 - N5) you get number of problem variables per row. If you Mean(of N1-N5) you get decimal percentage of problems per observation. And a few other things similar are possible.

 

The reason I say "if this were my data" is that I have multiple projects that this a basic part of the data quality assurance reporting as well as a step to report result of interest for multiple variables.

inquistive
Quartz | Level 8

@ballardw,

 

Thanks for the good insight.

 

However, I am looking for the code that outputs value to the new column based on the all/any true conditions. In simple terms, that is what the end users are looking for.

 

Thanks,

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1092 views
  • 3 likes
  • 4 in conversation