Hi all,
It has been a little while. I come back to you with what I think should be a pretty simple request.
I have a dataset which i trimmed down to 5 variables. It looks like this:
data WORK.NEO;
infile datalines dsd truncover;
input PUBID:BEST. R1HDRK_Dummy:BEST. R2HDRK_Dummy:BEST. R3HDRK_Dummy:BEST. R4HDRK_Dummy:BEST. R5HDRK_Dummy:BEST.;
format PUBID BEST. R1HDRK_Dummy BEST. R2HDRK_Dummy BEST. R3HDRK_Dummy BEST. R4HDRK_Dummy BEST. R5HDRK_Dummy BEST.;
label PUBID="PUBID" R1HDRK_Dummy="R1HDRK_Dummy" R2HDRK_Dummy="R2HDRK_Dummy" R3HDRK_Dummy="R3HDRK_Dummy" R4HDRK_Dummy="R4HDRK_Dummy" R5HDRK_Dummy="R5HDRK_Dummy";
datalines;
2 0 0 3 0 0
7 0 0 0 0 5
10 0 0 0 0 0
21 1 0 0 0 0
22 1 0 0 0 0
23 0 0 . . 5
29 0 0 0 0 .
31 0 0 0 0 0
35 0 0 3 0 .
41 0 0 0 0 0
;;;;
I am trying to work on a survival model. As such, I ran a missing data report to find that out of the 1858 observations, 1494 have complete data. However, I would like to include some of the incomplete data as these variables are simple, dichotomous dummy-coded variables. Each variable is a different year (1-5). I would ideally like to keep any data that has an occurrence and then is missing. For instance, I would like to keep line 9 (PID 35) because they reported binge drinking in their 3rd year, but then had missing data. Eventually, I will replace the missing values after the first occurrence with 0's but right now I am focused on the deletion.
I do not want any occurrence that I cannot tell if it was their first year binge drinking. For example, line 6 (PID 23) reported a case of binge drinking in year 5, but I do not know if they also did in year 3 or 4.
I wanted to see if anyone had any opinion on if this is best practice with deleting missing values for a survival model as well as some suggestions on code.
I tried using the following code, but the number remaining, 1576 seems rather low. I also know that this code is not streamlined, but it was the best I could do.
data Neo_Clean;
set neo;
if missing(R1HDRK_Dummy) and missing(R2HDRK_Dummy) and missing(R3HDRK_Dummy) and missing(R4HDRK_Dummy) and missing(R5HDRK_Dummy) then delete;
if missing(R1HDRK_Dummy) then delete;
if R5HDRK_Dummy > 0 and missing(R1HDRK_Dummy) or missing(R2HDRK_Dummy) or missing(R3HDRK_Dummy) or missing(R4HDRK_Dummy) then delete;
if R4HDRK_Dummy > 0 and missing(R1HDRK_Dummy) or missing(R2HDRK_Dummy) or missing(R3HDRK_Dummy) then delete;
if R3HDRK_Dummy > 0 and missing(R1HDRK_Dummy) or missing(R2HDRK_Dummy) then delete;
if R2HDRK_Dummy > 0 and missing(R1HDRK_Dummy) then delete;
run;
Any help would be appreciated.
Hi @joebacon
Not sure whether my understanding is accurate. But is this what you are after by any chance?
data WORK.NEO;
infile datalines truncover;
input PUBID:BEST. R1HDRK_Dummy:BEST. R2HDRK_Dummy:BEST. R3HDRK_Dummy:BEST. R4HDRK_Dummy:BEST. R5HDRK_Dummy:BEST.;
format PUBID BEST. R1HDRK_Dummy BEST. R2HDRK_Dummy BEST. R3HDRK_Dummy BEST. R4HDRK_Dummy BEST. R5HDRK_Dummy BEST.;
label PUBID="PUBID" R1HDRK_Dummy="R1HDRK_Dummy" R2HDRK_Dummy="R2HDRK_Dummy" R3HDRK_Dummy="R3HDRK_Dummy" R4HDRK_Dummy="R4HDRK_Dummy" R5HDRK_Dummy="R5HDRK_Dummy";
datalines;
2 0 0 3 0 0
7 0 0 0 0 5
10 0 0 0 0 0
21 1 0 0 0 0
22 1 0 0 0 0
23 0 0 . . 5
29 0 0 0 0 .
31 0 0 0 0 0
35 0 0 3 0 .
41 0 0 0 0 0
;;;;
data want;
set neo;
k=compress(cats(of R1HDRK_Dummy--R5HDRK_Dummy),'0');
if anydigit(k)=1 and index(k,'.') ;
drop k;
run;
Hi @joebacon
Not sure whether my understanding is accurate. But is this what you are after by any chance?
data WORK.NEO;
infile datalines truncover;
input PUBID:BEST. R1HDRK_Dummy:BEST. R2HDRK_Dummy:BEST. R3HDRK_Dummy:BEST. R4HDRK_Dummy:BEST. R5HDRK_Dummy:BEST.;
format PUBID BEST. R1HDRK_Dummy BEST. R2HDRK_Dummy BEST. R3HDRK_Dummy BEST. R4HDRK_Dummy BEST. R5HDRK_Dummy BEST.;
label PUBID="PUBID" R1HDRK_Dummy="R1HDRK_Dummy" R2HDRK_Dummy="R2HDRK_Dummy" R3HDRK_Dummy="R3HDRK_Dummy" R4HDRK_Dummy="R4HDRK_Dummy" R5HDRK_Dummy="R5HDRK_Dummy";
datalines;
2 0 0 3 0 0
7 0 0 0 0 5
10 0 0 0 0 0
21 1 0 0 0 0
22 1 0 0 0 0
23 0 0 . . 5
29 0 0 0 0 .
31 0 0 0 0 0
35 0 0 3 0 .
41 0 0 0 0 0
;;;;
data want;
set neo;
k=compress(cats(of R1HDRK_Dummy--R5HDRK_Dummy),'0');
if anydigit(k)=1 and index(k,'.') ;
drop k;
run;
" I do want to keep these individuals in addition to the other 1494 with complete data."
I am not understanding. Do you mean you just want to Flag?
Ok cool
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.