BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JC411911
Obsidian | Level 7

Hello, 

I am practicing with these data steps and if/else scenarios. Both data steps run fine but for the missing values being output to work.misschol I get different results. I can't understand why the highlited statement below isn't working when using = . but missing(cholesterol) is working. The cholesterol variable is numeric.

 

data work.lowchol work.highchol work.misschol;
	set sashelp.heart;

	if cholesterol lt 200 then
		output work.lowchol;
	else if cholesterol ge 200 then
		output work.highchol;
	else if cholesterol = . then
		output work.misschol;
run;

data work.lowchol work.highchol work.misschol;
	set sashelp.heart;

	if missing(cholesterol) then
		output work.misschol;
	else if cholesterol lt 200 then
		output work.lowchol;
	else if cholesterol ge 200 then
		output work.highchol;
run;

Thank you for your time and help in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

In SAS, missing values for numerics are taken to be "less than" all valid values, including all valid negative values.  So missing values meet your first test condition in 

	if cholesterol lt 200 then
		output work.lowchol;
	else if cholesterol ge 200 then
		output work.highchol;
	else if cholesterol = . then
		output work.misschol;


As a result, the 3rd condition ("else if cholesterol=.") is never assessed.

 

If you change the order of conditions, you should get what you expect:

	if cholesterol = . then
		output work.misschol;
    else if cholesterol lt 200 then
		output work.lowchol;
	else if cholesterol ge 200 then
		output work.highchol;

or better yet (to include missing values from .A to .Z  and ._):

	if missing(cholesterol) then
		output work.misschol;
    else if cholesterol lt 200 then
		output work.lowchol;
	else if cholesterol ge 200 then
		output work.highchol;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
mkeintz
PROC Star

In SAS, missing values for numerics are taken to be "less than" all valid values, including all valid negative values.  So missing values meet your first test condition in 

	if cholesterol lt 200 then
		output work.lowchol;
	else if cholesterol ge 200 then
		output work.highchol;
	else if cholesterol = . then
		output work.misschol;


As a result, the 3rd condition ("else if cholesterol=.") is never assessed.

 

If you change the order of conditions, you should get what you expect:

	if cholesterol = . then
		output work.misschol;
    else if cholesterol lt 200 then
		output work.lowchol;
	else if cholesterol ge 200 then
		output work.highchol;

or better yet (to include missing values from .A to .Z  and ._):

	if missing(cholesterol) then
		output work.misschol;
    else if cholesterol lt 200 then
		output work.lowchol;
	else if cholesterol ge 200 then
		output work.highchol;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
JC411911
Obsidian | Level 7

Hi @mkeintz,

Thank you for your help and explanation. I wasn't aware that the order of my statements would cause this to occur. Glad to have learned that moving the =. to the first if statement would work. I did test and got the correct results. That being said I will continue to use missing() for this scenario to consider other values. Thanks again!

jimbarbour
Meteorite | Level 14

Well, it could be a couple of things.

  1. There are more missing numeric value representations than just "."  For example, ".A" through ".Z" are also valid representations of missing values.  The MISSING() function is generally better than IF var = . because MISSING will identify all valid representations of missing whereas an IF statement will only identify the missing representation that is specified.
  2. Your IF statements are in a different order.  If you want the same results from two different forms of an IF statement, the conditions in the IF statements should be in the same order.
  3. Lastly, the IF statement could be improved by having a default ELSE.  In other words, the final ELSE statement shouldn't have an IF under it.  

Regarding #3, try this:

Change this:

	if cholesterol lt 200 then
		output work.lowchol;
	else if cholesterol ge 200 then
		output work.highchol;
	else if cholesterol = . then
		output work.misschol;

to the below and re-run.  I'd be very interested to know your results.

	if cholesterol = . then
		output work.misschol;
	else if cholesterol lt 200 then
		output work.lowchol;
	else
		output work.highchol;

Hope that helps,

 

Jim

 

JC411911
Obsidian | Level 7

Hello @jimbarbour , Thanks for your time.I did try the below code and it did work perfect. It does seem to simply the code as all left over results are sent to highcol. The misschol also had the correct results, but to your point I think from here on out I will utilize missing() for this. Dumb question but can I use the missing function for character values as well or is this a numeric function only?

if cholesterol = . then
		output work.misschol;
	else if cholesterol lt 200 then
		output work.lowchol;
	else
		output work.highchol;
Tom
Super User Tom
Super User

MISSING() works on character values also.  Strings with only spaces are considered missing.

JC411911
Obsidian | Level 7
Thank you Tom!
jimbarbour
Meteorite | Level 14

@JC411911 wrote:

Hello @jimbarbour , Thanks for your time.I did try the below code and it did work perfect. It does seem to simply the code as all left over results are sent to highcol. The misschol also had the correct results, but to your point I think from here on out I will utilize missing() for this. Dumb question but can I use the missing function for character values as well or is this a numeric function only?

if cholesterol = . then
		output work.misschol;
	else if cholesterol lt 200 then
		output work.lowchol;
	else
		output work.highchol;

As mentioned by @Tom, the MISSING function works on both Character or Numeric data.  It's a great function.  🙂

 

So several I think good points here:

  • Missing values are considered lower than even large negative numbers.  In IF statements, Check for Missing first.
  • MISSING() works for both character or numeric
  • MISSING() for numeric is superior because it will check for all valid representations of Missing, not just a period.
  • The last ELSE in a series of IF THEN ELSE statements should generally not have another condition.  In other words, the last ELSE should be a default which will catch anything not covered in the conditions of the IF's.
  • When comparing the results of two different conditionals, (e.g. IF vs IF, IF vs. WHERE), check the conditions in the same order.  "Apples to apples" makes for a valid comparison.  "Apples to oranges" does not.

Jim

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1180 views
  • 10 likes
  • 4 in conversation