Help using Base SAS procedures

I want to drop observations but only if both variables are missing

Accepted Solution Solved
Reply
Contributor
Posts: 51
Accepted Solution

I want to drop observations but only if both variables are missing

Hi,

 

I merged two databases based on date of death to give myself a more complete mortality database (I was worried that some deaths were only recorded in one database but not the other). Now I want to subset this database so that if BOTH variables are missing for the same observation, that observation will be dropped. This is the same as saying that I don't have a recorded data of death in either mortality database, so I will assume that the person has not yet died and does not belong in my mortality database (should be dropped).

 

I tried the following code, but I am not sure it translates to "drop the observation if both death_dt and dod are missing, otherwise keep":

 

DATA CLEAN.DEATH_278SUBJECTS; *n=674*;

SET CLEAN.DEATH_BOTH;

WHERE DEATH_DT ne . or dod ne .;

run;

 

I also did some exploring to see if I could figure out how the coding was working in terms of reducing the sample size:

 

DATA CLEAN.DEATH_278SUBJECTS; *n=674*;

SET CLEAN.DEATH_BOTH;

WHERE DEATH_DT ne .;

run;

DATA CLEAN.DEATH_278SUBJECTS; *n=587*;

SET CLEAN.DEATH_BOTH;

WHERE dod ne .;

run;

 

Please help! Thank you.


Accepted Solutions
Solution
‎11-02-2016 03:03 PM
Contributor
Posts: 51

Re: I want to drop observations but only if both variables are missing

I think the "where death_dt ne . AND dod ne .;" statement ends up excluding all observations with EITHER a missing death_dt or missing dod, which is not what I am looking to do. But this made me realize I was overthinking it. There may be a shorter way, but this code seems to work well by creating a new variable to work with.

 

DATA CLEAN.DEATH_BOTH;

SET CLEAN.DEATH_BOTH;

if death_dt = . and dod = . then death_missing = 1;

else death_missing = 0;

run;

 

DATA CLEAN.DEATH_NOTMISSING;

SET CLEAN.DEATH_BOTH;

where death_missing = 0 ;

run;

View solution in original post


All Replies
Super Contributor
Posts: 279

Re: I want to drop observations but only if both variables are missing

change WHERE DEATH_DT ne . or dod ne .;

to WHERE DEATH_DT ne .  AND dod ne .;

Solution
‎11-02-2016 03:03 PM
Contributor
Posts: 51

Re: I want to drop observations but only if both variables are missing

I think the "where death_dt ne . AND dod ne .;" statement ends up excluding all observations with EITHER a missing death_dt or missing dod, which is not what I am looking to do. But this made me realize I was overthinking it. There may be a shorter way, but this code seems to work well by creating a new variable to work with.

 

DATA CLEAN.DEATH_BOTH;

SET CLEAN.DEATH_BOTH;

if death_dt = . and dod = . then death_missing = 1;

else death_missing = 0;

run;

 

DATA CLEAN.DEATH_NOTMISSING;

SET CLEAN.DEATH_BOTH;

where death_missing = 0 ;

run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 245 views
  • 0 likes
  • 2 in conversation