Dear All, I have one very simple question:
Q1: How could below dataset be input successfully?
Here is the data I have, which is created by below code:
data have;
input subjid avisit dat $20.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5 1999-04-05
1003 6 1999-05-01
1003 7
1003 8
;
run;
data have_;
set have;
adt=input(dat, yymmdd10.);
attrib adt format=yymmdd10.;
run;
Q2: If you look at the last variable adt generated by below dataset WANT (which, unfortunately I can't produce by below code successfully, please assist me as well), you will see actually, I want to derive the last non-missing dat which should prior the consecutive dat, but I have no idea how to derive the last column, base on the first 3 variable correctly, hope you can help me.
data want;
infile datalines delimiter=",";
input subjid avisit dat $20. adt $20.;
datalines;
1001,0,1997-01-01,1997-05-01
1001,1,1997-02-01,1997-05-01
1001,2, ,1997-05-01
1001,3,1997-05-01,1997-05-01
1002,0,1998-01-01,1998-02-01
1002,1,1998-02-01,1998-02-01
1002,2, ,1998-02-01
1002,3, ,1998-02-01
1002,4, ,1998-02-01
1002,5, ,1998-02-01
1002,6,1998-12-01,1998-02-01
1003,0,1999-01-01,1999-05-01
1003,1,1999-02-01,1999-05-01
1003,2, ,1999-05-01
1003,3,1999-03-08,1999-05-01
1003,4, ,1999-05-01
1003,5,1999-04-05,1999-05-01
1003,6,1999-05-01,1999-05-01
1003,7, ,1999-05-01
1003,8, ,1999-05-01
;
run;
data have; infile datalines truncover; input subjid avisit dat : yymmdd10.; format dat yymmdd10.; datalines; 1001 0 1997-01-01 1001 1 1997-02-01 1001 2 1001 3 1997-05-01 1002 0 1998-01-01 1002 1 1998-02-01 1002 2 1002 3 1002 4 1002 5 1002 6 1998-12-01 1003 0 1999-01-01 1003 1 1999-02-01 1003 2 1003 3 1999-03-08 1003 4 1003 5 1003 6 1999-05-01 1003 7 1003 8 ; data have1; set have; by subjid; retain found 0; if first.subjid then do;n=0;found=0;end; if missing(dat) then n+1; else n=0; if n=2 then found=1; if not found; drop n found; run; data temp; set have1(where=(dat is not missing)); by subjid; if last.subjid; run; data want; merge have temp(rename=(dat=new_dat) drop=avisit); by subjid; run; data want2; set want; if missing(dat) then call missing(new_dat); run;
data have; infile datalines truncover; input subjid avisit dat : yymmdd10.; format dat yymmdd10.; datalines; 1001 0 1997-01-01 1001 1 1997-02-01 1001 2 1001 3 1997-05-01 1002 0 1998-01-01 1002 1 1998-02-01 1002 2 1002 3 1002 4 1002 5 1002 6 1998-12-01 1003 0 1999-01-01 1003 1 1999-02-01 1003 2 1003 3 1999-03-08 1003 4 1003 5 1999-04-05 1003 6 1999-05-01 1003 7 1003 8 ; data temp; set have(where=(dat is not missing)); by subjid; if last.subjid; run; data want; merge have temp(rename=(dat=new_dat) drop=avisit); by subjid; run; data want2; set want; if missing(dat) then call missing(new_dat); run;
Thank you, actually, in your algorithm to derive the last variable, there is bit problem in below case, I just modify the 1003 subject, and the last variable value turned to the one I did not want.
data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5
1003 6 1999-05-01
1003 7
1003 8
;
data temp;
set have(where=(dat is not missing));
by subjid;
if last.subjid;
run;
data want;
merge have temp(rename=(dat=new_dat) drop=avisit);
by subjid;
run;
data want2;
set want;
if missing(dat) then call missing(new_dat);
run;
The right value I am looking for is 1999-03-08 for 1003, how should I change the code? Thanks
"The right value I am looking for is 1999-03-08 for 1003"
What's the rule?
Hi, The rule is that if there is consecutive missing (missing consecutively for two more more times for each subjid) then the last non-missing DAT will be chosen as the value of ADT.
This implements the rule: take the first dat value that follows a missing dat value :
data want;
do until(last.subjid);
set have; by subjid;
if missing(dat) then gap = 1;
else if gap and missing(val) then val = dat;
end;
do until(last.subjid);
set have; by subjid;
if not missing(dat)
then adt = val;
else call missing(adt);
output;
end;
drop gap val;
format adt yymmdd10.;
run;
Higly appreciated for your help. However, I found maybe you missed the 'consecutive' missing, for 1002, the adt should be 1998-02-01 (of avisit=1) as for the avisit=2, 3, 4, and 5 DAT are missing consecutively for this subject. Now 1003 the value is what I want to have.
Could you furtherly help me. Million thanks.
Ok then. The rule seems to be: take the first dat value that immediately precedes a gap of size 2 or more; assume there is such a gap at the end of the series. Try this:
data want;
do until(last.subjid);
set have; by subjid;
if gap < 2 then
if missing(dat) then gap = sum(1, gap);
else val = dat;
end;
do until(last.subjid);
set have; by subjid;
if not missing(dat)
then adt = val;
else call missing(adt);
output;
end;
drop gap val;
format adt yymmdd10.;
run;
data have; infile datalines truncover; input subjid avisit dat : yymmdd10.; format dat yymmdd10.; datalines; 1001 0 1997-01-01 1001 1 1997-02-01 1001 2 1001 3 1997-05-01 1002 0 1998-01-01 1002 1 1998-02-01 1002 2 1002 3 1002 4 1002 5 1002 6 1998-12-01 1003 0 1999-01-01 1003 1 1999-02-01 1003 2 1003 3 1999-03-08 1003 4 1003 5 1003 6 1999-05-01 1003 7 1003 8 ; data have1; set have; by subjid; retain found 0; if first.subjid then do;n=0;found=0;end; if missing(dat) then n+1; else n=0; if n=2 then found=1; if not found; drop n found; run; data temp; set have1(where=(dat is not missing)); by subjid; if last.subjid; run; data want; merge have temp(rename=(dat=new_dat) drop=avisit); by subjid; run; data want2; set want; if missing(dat) then call missing(new_dat); run;
Thank you very much, Sharp. Resolved perfectly.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.