BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jack2012
Obsidian | Level 7

Dear All, I have one very simple question:

 

Q1: How could below dataset be input successfully? 

Here is the data I have, which is created by below code:


data have;
input subjid avisit dat $20.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5 1999-04-05
1003 6 1999-05-01
1003 7
1003 8
;
run;

data have_;
set have;
adt=input(dat, yymmdd10.);
attrib adt format=yymmdd10.;

run;

Jack2012_0-1599652991246.png

 

 

Q2: If you look at the last variable adt generated by below dataset WANT (which, unfortunately I can't produce by below code successfully, please assist me as well), you will see actually, I want to derive the last non-missing dat which should prior the consecutive dat, but I have no idea how to derive the last column, base on the first 3 variable correctly, hope you can help me. 


data want;
infile datalines delimiter=",";
input subjid avisit dat $20. adt $20.;
datalines;
1001,0,1997-01-01,1997-05-01
1001,1,1997-02-01,1997-05-01
1001,2, ,1997-05-01
1001,3,1997-05-01,1997-05-01
1002,0,1998-01-01,1998-02-01
1002,1,1998-02-01,1998-02-01
1002,2, ,1998-02-01
1002,3, ,1998-02-01
1002,4, ,1998-02-01
1002,5, ,1998-02-01
1002,6,1998-12-01,1998-02-01
1003,0,1999-01-01,1999-05-01
1003,1,1999-02-01,1999-05-01
1003,2, ,1999-05-01
1003,3,1999-03-08,1999-05-01
1003,4, ,1999-05-01
1003,5,1999-04-05,1999-05-01
1003,6,1999-05-01,1999-05-01
1003,7, ,1999-05-01
1003,8, ,1999-05-01
;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5
1003 6 1999-05-01
1003 7
1003 8
;

data have1;
 set have;
 by subjid;
 retain found 0;
 if first.subjid then do;n=0;found=0;end;
 if missing(dat) then n+1;
  else n=0;
 if n=2 then found=1;
 if not found;
 drop n found;
run;

data temp;
set have1(where=(dat is not missing));
by subjid;
if last.subjid;
run;
data want;
merge have temp(rename=(dat=new_dat) drop=avisit);
by subjid;
run;
data want2;
set want;
if missing(dat) then call missing(new_dat);
run;

View solution in original post

10 REPLIES 10
Ksharp
Super User
data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5 1999-04-05
1003 6 1999-05-01
1003 7
1003 8
;
data temp;
 set have(where=(dat is not missing));
 by subjid;
 if last.subjid;
run;
data want;
 merge have temp(rename=(dat=new_dat) drop=avisit);
 by subjid;
run;
data want2;
 set want;
 if missing(dat) then call missing(new_dat);
run;
Jack2012
Obsidian | Level 7

Thank you, actually, in your algorithm to derive the last variable, there is bit problem in below case, I just modify the 1003 subject, and the last variable value turned to the one I did not want. 

data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5
1003 6 1999-05-01
1003 7
1003 8
;
data temp;
set have(where=(dat is not missing));
by subjid;
if last.subjid;
run;
data want;
merge have temp(rename=(dat=new_dat) drop=avisit);
by subjid;
run;
data want2;
set want;
if missing(dat) then call missing(new_dat);
run;

 

Jack2012_0-1599657081703.png

 

 

The right value I am looking for is 1999-03-08 for 1003, how should I change the code? Thanks

PGStats
Opal | Level 21

"The right value I am looking for is 1999-03-08 for 1003"

 

What's the rule?

PG
Jack2012
Obsidian | Level 7

Hi, The rule is that if there is consecutive missing (missing consecutively for two more more times for each subjid) then the last non-missing DAT will be chosen as the value of ADT. 

Jack2012
Obsidian | Level 7
I mean the last non-missing DAT will be chosen as the value of ADT for this SUBJID.
PGStats
Opal | Level 21

This implements the rule: take the first dat value that follows a missing dat value :

 

data want;
do until(last.subjid);
    set have; by subjid;
    if missing(dat) then gap = 1;
    else if gap and missing(val) then val = dat;
    end;
do until(last.subjid);
    set have; by subjid;
    if not missing(dat) 
        then adt = val;
        else call missing(adt);
    output;
    end;
drop gap val;
format adt yymmdd10.;
run;
PG
Jack2012
Obsidian | Level 7

Higly appreciated for your help. However, I found maybe you missed the 'consecutive' missing, for 1002, the adt should be 1998-02-01 (of avisit=1) as for the avisit=2, 3, 4, and 5 DAT are missing consecutively for this subject. Now 1003 the value is what I want to have. 

Could you furtherly help me. Million thanks.

 

Jack2012_0-1599683138465.png

 

PGStats
Opal | Level 21

Ok then. The rule seems to be: take the first dat value that immediately precedes a gap of size 2 or more; assume there is such a gap at the end of the series. Try this:

 

data want;
do until(last.subjid);
    set have; by subjid;
    if gap < 2 then
        if missing(dat) then gap = sum(1, gap);
        else val = dat;
    end;
do until(last.subjid);
    set have; by subjid;
    if not missing(dat) 
        then adt = val;
        else call missing(adt);
    output;
    end;
drop gap val;
format adt yymmdd10.;
run;

 

PG
Ksharp
Super User
data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5
1003 6 1999-05-01
1003 7
1003 8
;

data have1;
 set have;
 by subjid;
 retain found 0;
 if first.subjid then do;n=0;found=0;end;
 if missing(dat) then n+1;
  else n=0;
 if n=2 then found=1;
 if not found;
 drop n found;
run;

data temp;
set have1(where=(dat is not missing));
by subjid;
if last.subjid;
run;
data want;
merge have temp(rename=(dat=new_dat) drop=avisit);
by subjid;
run;
data want2;
set want;
if missing(dat) then call missing(new_dat);
run;
Jack2012
Obsidian | Level 7

Thank you very much, Sharp. Resolved perfectly. 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2510 views
  • 0 likes
  • 3 in conversation