Dear All, I have one very simple question:
Q1: How could below dataset be input successfully?
Here is the data I have, which is created by below code:
data have;
input subjid avisit dat $20.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5 1999-04-05
1003 6 1999-05-01
1003 7
1003 8
;
run;
data have_;
set have;
adt=input(dat, yymmdd10.);
attrib adt format=yymmdd10.;
run;
Q2: If you look at the last variable adt generated by below dataset WANT (which, unfortunately I can't produce by below code successfully, please assist me as well), you will see actually, I want to derive the last non-missing dat which should prior the consecutive dat, but I have no idea how to derive the last column, base on the first 3 variable correctly, hope you can help me.
data want;
infile datalines delimiter=",";
input subjid avisit dat $20. adt $20.;
datalines;
1001,0,1997-01-01,1997-05-01
1001,1,1997-02-01,1997-05-01
1001,2, ,1997-05-01
1001,3,1997-05-01,1997-05-01
1002,0,1998-01-01,1998-02-01
1002,1,1998-02-01,1998-02-01
1002,2, ,1998-02-01
1002,3, ,1998-02-01
1002,4, ,1998-02-01
1002,5, ,1998-02-01
1002,6,1998-12-01,1998-02-01
1003,0,1999-01-01,1999-05-01
1003,1,1999-02-01,1999-05-01
1003,2, ,1999-05-01
1003,3,1999-03-08,1999-05-01
1003,4, ,1999-05-01
1003,5,1999-04-05,1999-05-01
1003,6,1999-05-01,1999-05-01
1003,7, ,1999-05-01
1003,8, ,1999-05-01
;
run;
data have; infile datalines truncover; input subjid avisit dat : yymmdd10.; format dat yymmdd10.; datalines; 1001 0 1997-01-01 1001 1 1997-02-01 1001 2 1001 3 1997-05-01 1002 0 1998-01-01 1002 1 1998-02-01 1002 2 1002 3 1002 4 1002 5 1002 6 1998-12-01 1003 0 1999-01-01 1003 1 1999-02-01 1003 2 1003 3 1999-03-08 1003 4 1003 5 1003 6 1999-05-01 1003 7 1003 8 ; data have1; set have; by subjid; retain found 0; if first.subjid then do;n=0;found=0;end; if missing(dat) then n+1; else n=0; if n=2 then found=1; if not found; drop n found; run; data temp; set have1(where=(dat is not missing)); by subjid; if last.subjid; run; data want; merge have temp(rename=(dat=new_dat) drop=avisit); by subjid; run; data want2; set want; if missing(dat) then call missing(new_dat); run;
data have; infile datalines truncover; input subjid avisit dat : yymmdd10.; format dat yymmdd10.; datalines; 1001 0 1997-01-01 1001 1 1997-02-01 1001 2 1001 3 1997-05-01 1002 0 1998-01-01 1002 1 1998-02-01 1002 2 1002 3 1002 4 1002 5 1002 6 1998-12-01 1003 0 1999-01-01 1003 1 1999-02-01 1003 2 1003 3 1999-03-08 1003 4 1003 5 1999-04-05 1003 6 1999-05-01 1003 7 1003 8 ; data temp; set have(where=(dat is not missing)); by subjid; if last.subjid; run; data want; merge have temp(rename=(dat=new_dat) drop=avisit); by subjid; run; data want2; set want; if missing(dat) then call missing(new_dat); run;
Thank you, actually, in your algorithm to derive the last variable, there is bit problem in below case, I just modify the 1003 subject, and the last variable value turned to the one I did not want.
data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5
1003 6 1999-05-01
1003 7
1003 8
;
data temp;
set have(where=(dat is not missing));
by subjid;
if last.subjid;
run;
data want;
merge have temp(rename=(dat=new_dat) drop=avisit);
by subjid;
run;
data want2;
set want;
if missing(dat) then call missing(new_dat);
run;
The right value I am looking for is 1999-03-08 for 1003, how should I change the code? Thanks
"The right value I am looking for is 1999-03-08 for 1003"
What's the rule?
Hi, The rule is that if there is consecutive missing (missing consecutively for two more more times for each subjid) then the last non-missing DAT will be chosen as the value of ADT.
This implements the rule: take the first dat value that follows a missing dat value :
data want;
do until(last.subjid);
set have; by subjid;
if missing(dat) then gap = 1;
else if gap and missing(val) then val = dat;
end;
do until(last.subjid);
set have; by subjid;
if not missing(dat)
then adt = val;
else call missing(adt);
output;
end;
drop gap val;
format adt yymmdd10.;
run;
Higly appreciated for your help. However, I found maybe you missed the 'consecutive' missing, for 1002, the adt should be 1998-02-01 (of avisit=1) as for the avisit=2, 3, 4, and 5 DAT are missing consecutively for this subject. Now 1003 the value is what I want to have.
Could you furtherly help me. Million thanks.
Ok then. The rule seems to be: take the first dat value that immediately precedes a gap of size 2 or more; assume there is such a gap at the end of the series. Try this:
data want;
do until(last.subjid);
set have; by subjid;
if gap < 2 then
if missing(dat) then gap = sum(1, gap);
else val = dat;
end;
do until(last.subjid);
set have; by subjid;
if not missing(dat)
then adt = val;
else call missing(adt);
output;
end;
drop gap val;
format adt yymmdd10.;
run;
data have; infile datalines truncover; input subjid avisit dat : yymmdd10.; format dat yymmdd10.; datalines; 1001 0 1997-01-01 1001 1 1997-02-01 1001 2 1001 3 1997-05-01 1002 0 1998-01-01 1002 1 1998-02-01 1002 2 1002 3 1002 4 1002 5 1002 6 1998-12-01 1003 0 1999-01-01 1003 1 1999-02-01 1003 2 1003 3 1999-03-08 1003 4 1003 5 1003 6 1999-05-01 1003 7 1003 8 ; data have1; set have; by subjid; retain found 0; if first.subjid then do;n=0;found=0;end; if missing(dat) then n+1; else n=0; if n=2 then found=1; if not found; drop n found; run; data temp; set have1(where=(dat is not missing)); by subjid; if last.subjid; run; data want; merge have temp(rename=(dat=new_dat) drop=avisit); by subjid; run; data want2; set want; if missing(dat) then call missing(new_dat); run;
Thank you very much, Sharp. Resolved perfectly.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.