Hi all,
It has been a little while. I come back to you with what I think should be a pretty simple request.
I have a dataset which i trimmed down to 5 variables. It looks like this:
data WORK.NEO;
infile datalines dsd truncover;
input PUBID:BEST. R1HDRK_Dummy:BEST. R2HDRK_Dummy:BEST. R3HDRK_Dummy:BEST. R4HDRK_Dummy:BEST. R5HDRK_Dummy:BEST.;
format PUBID BEST. R1HDRK_Dummy BEST. R2HDRK_Dummy BEST. R3HDRK_Dummy BEST. R4HDRK_Dummy BEST. R5HDRK_Dummy BEST.;
label PUBID="PUBID" R1HDRK_Dummy="R1HDRK_Dummy" R2HDRK_Dummy="R2HDRK_Dummy" R3HDRK_Dummy="R3HDRK_Dummy" R4HDRK_Dummy="R4HDRK_Dummy" R5HDRK_Dummy="R5HDRK_Dummy";
datalines;
2 0 0 3 0 0
7 0 0 0 0 5
10 0 0 0 0 0
21 1 0 0 0 0
22 1 0 0 0 0
23 0 0 . . 5
29 0 0 0 0 .
31 0 0 0 0 0
35 0 0 3 0 .
41 0 0 0 0 0
;;;;
I am trying to work on a survival model. As such, I ran a missing data report to find that out of the 1858 observations, 1494 have complete data. However, I would like to include some of the incomplete data as these variables are simple, dichotomous dummy-coded variables. Each variable is a different year (1-5). I would ideally like to keep any data that has an occurrence and then is missing. For instance, I would like to keep line 9 (PID 35) because they reported binge drinking in their 3rd year, but then had missing data. Eventually, I will replace the missing values after the first occurrence with 0's but right now I am focused on the deletion.
I do not want any occurrence that I cannot tell if it was their first year binge drinking. For example, line 6 (PID 23) reported a case of binge drinking in year 5, but I do not know if they also did in year 3 or 4.
I wanted to see if anyone had any opinion on if this is best practice with deleting missing values for a survival model as well as some suggestions on code.
I tried using the following code, but the number remaining, 1576 seems rather low. I also know that this code is not streamlined, but it was the best I could do.
data Neo_Clean;
set neo;
if missing(R1HDRK_Dummy) and missing(R2HDRK_Dummy) and missing(R3HDRK_Dummy) and missing(R4HDRK_Dummy) and missing(R5HDRK_Dummy) then delete;
if missing(R1HDRK_Dummy) then delete;
if R5HDRK_Dummy > 0 and missing(R1HDRK_Dummy) or missing(R2HDRK_Dummy) or missing(R3HDRK_Dummy) or missing(R4HDRK_Dummy) then delete;
if R4HDRK_Dummy > 0 and missing(R1HDRK_Dummy) or missing(R2HDRK_Dummy) or missing(R3HDRK_Dummy) then delete;
if R3HDRK_Dummy > 0 and missing(R1HDRK_Dummy) or missing(R2HDRK_Dummy) then delete;
if R2HDRK_Dummy > 0 and missing(R1HDRK_Dummy) then delete;
run;
Any help would be appreciated.
Hi @joebacon
Not sure whether my understanding is accurate. But is this what you are after by any chance?
data WORK.NEO;
infile datalines truncover;
input PUBID:BEST. R1HDRK_Dummy:BEST. R2HDRK_Dummy:BEST. R3HDRK_Dummy:BEST. R4HDRK_Dummy:BEST. R5HDRK_Dummy:BEST.;
format PUBID BEST. R1HDRK_Dummy BEST. R2HDRK_Dummy BEST. R3HDRK_Dummy BEST. R4HDRK_Dummy BEST. R5HDRK_Dummy BEST.;
label PUBID="PUBID" R1HDRK_Dummy="R1HDRK_Dummy" R2HDRK_Dummy="R2HDRK_Dummy" R3HDRK_Dummy="R3HDRK_Dummy" R4HDRK_Dummy="R4HDRK_Dummy" R5HDRK_Dummy="R5HDRK_Dummy";
datalines;
2 0 0 3 0 0
7 0 0 0 0 5
10 0 0 0 0 0
21 1 0 0 0 0
22 1 0 0 0 0
23 0 0 . . 5
29 0 0 0 0 .
31 0 0 0 0 0
35 0 0 3 0 .
41 0 0 0 0 0
;;;;
data want;
set neo;
k=compress(cats(of R1HDRK_Dummy--R5HDRK_Dummy),'0');
if anydigit(k)=1 and index(k,'.') ;
drop k;
run;
Hi @joebacon
Not sure whether my understanding is accurate. But is this what you are after by any chance?
data WORK.NEO;
infile datalines truncover;
input PUBID:BEST. R1HDRK_Dummy:BEST. R2HDRK_Dummy:BEST. R3HDRK_Dummy:BEST. R4HDRK_Dummy:BEST. R5HDRK_Dummy:BEST.;
format PUBID BEST. R1HDRK_Dummy BEST. R2HDRK_Dummy BEST. R3HDRK_Dummy BEST. R4HDRK_Dummy BEST. R5HDRK_Dummy BEST.;
label PUBID="PUBID" R1HDRK_Dummy="R1HDRK_Dummy" R2HDRK_Dummy="R2HDRK_Dummy" R3HDRK_Dummy="R3HDRK_Dummy" R4HDRK_Dummy="R4HDRK_Dummy" R5HDRK_Dummy="R5HDRK_Dummy";
datalines;
2 0 0 3 0 0
7 0 0 0 0 5
10 0 0 0 0 0
21 1 0 0 0 0
22 1 0 0 0 0
23 0 0 . . 5
29 0 0 0 0 .
31 0 0 0 0 0
35 0 0 3 0 .
41 0 0 0 0 0
;;;;
data want;
set neo;
k=compress(cats(of R1HDRK_Dummy--R5HDRK_Dummy),'0');
if anydigit(k)=1 and index(k,'.') ;
drop k;
run;
" I do want to keep these individuals in addition to the other 1494 with complete data."
I am not understanding. Do you mean you just want to Flag?
Ok cool
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.