BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jack2012
Obsidian | Level 7

Dear All, I have one very simple question:

 

Q1: How could below dataset be input successfully? 

Here is the data I have, which is created by below code:


data have;
input subjid avisit dat $20.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5 1999-04-05
1003 6 1999-05-01
1003 7
1003 8
;
run;

data have_;
set have;
adt=input(dat, yymmdd10.);
attrib adt format=yymmdd10.;

run;

Jack2012_0-1599652991246.png

 

 

Q2: If you look at the last variable adt generated by below dataset WANT (which, unfortunately I can't produce by below code successfully, please assist me as well), you will see actually, I want to derive the last non-missing dat which should prior the consecutive dat, but I have no idea how to derive the last column, base on the first 3 variable correctly, hope you can help me. 


data want;
infile datalines delimiter=",";
input subjid avisit dat $20. adt $20.;
datalines;
1001,0,1997-01-01,1997-05-01
1001,1,1997-02-01,1997-05-01
1001,2, ,1997-05-01
1001,3,1997-05-01,1997-05-01
1002,0,1998-01-01,1998-02-01
1002,1,1998-02-01,1998-02-01
1002,2, ,1998-02-01
1002,3, ,1998-02-01
1002,4, ,1998-02-01
1002,5, ,1998-02-01
1002,6,1998-12-01,1998-02-01
1003,0,1999-01-01,1999-05-01
1003,1,1999-02-01,1999-05-01
1003,2, ,1999-05-01
1003,3,1999-03-08,1999-05-01
1003,4, ,1999-05-01
1003,5,1999-04-05,1999-05-01
1003,6,1999-05-01,1999-05-01
1003,7, ,1999-05-01
1003,8, ,1999-05-01
;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5
1003 6 1999-05-01
1003 7
1003 8
;

data have1;
 set have;
 by subjid;
 retain found 0;
 if first.subjid then do;n=0;found=0;end;
 if missing(dat) then n+1;
  else n=0;
 if n=2 then found=1;
 if not found;
 drop n found;
run;

data temp;
set have1(where=(dat is not missing));
by subjid;
if last.subjid;
run;
data want;
merge have temp(rename=(dat=new_dat) drop=avisit);
by subjid;
run;
data want2;
set want;
if missing(dat) then call missing(new_dat);
run;

View solution in original post

10 REPLIES 10
Ksharp
Super User
data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5 1999-04-05
1003 6 1999-05-01
1003 7
1003 8
;
data temp;
 set have(where=(dat is not missing));
 by subjid;
 if last.subjid;
run;
data want;
 merge have temp(rename=(dat=new_dat) drop=avisit);
 by subjid;
run;
data want2;
 set want;
 if missing(dat) then call missing(new_dat);
run;
Jack2012
Obsidian | Level 7

Thank you, actually, in your algorithm to derive the last variable, there is bit problem in below case, I just modify the 1003 subject, and the last variable value turned to the one I did not want. 

data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5
1003 6 1999-05-01
1003 7
1003 8
;
data temp;
set have(where=(dat is not missing));
by subjid;
if last.subjid;
run;
data want;
merge have temp(rename=(dat=new_dat) drop=avisit);
by subjid;
run;
data want2;
set want;
if missing(dat) then call missing(new_dat);
run;

 

Jack2012_0-1599657081703.png

 

 

The right value I am looking for is 1999-03-08 for 1003, how should I change the code? Thanks

PGStats
Opal | Level 21

"The right value I am looking for is 1999-03-08 for 1003"

 

What's the rule?

PG
Jack2012
Obsidian | Level 7

Hi, The rule is that if there is consecutive missing (missing consecutively for two more more times for each subjid) then the last non-missing DAT will be chosen as the value of ADT. 

Jack2012
Obsidian | Level 7
I mean the last non-missing DAT will be chosen as the value of ADT for this SUBJID.
PGStats
Opal | Level 21

This implements the rule: take the first dat value that follows a missing dat value :

 

data want;
do until(last.subjid);
    set have; by subjid;
    if missing(dat) then gap = 1;
    else if gap and missing(val) then val = dat;
    end;
do until(last.subjid);
    set have; by subjid;
    if not missing(dat) 
        then adt = val;
        else call missing(adt);
    output;
    end;
drop gap val;
format adt yymmdd10.;
run;
PG
Jack2012
Obsidian | Level 7

Higly appreciated for your help. However, I found maybe you missed the 'consecutive' missing, for 1002, the adt should be 1998-02-01 (of avisit=1) as for the avisit=2, 3, 4, and 5 DAT are missing consecutively for this subject. Now 1003 the value is what I want to have. 

Could you furtherly help me. Million thanks.

 

Jack2012_0-1599683138465.png

 

PGStats
Opal | Level 21

Ok then. The rule seems to be: take the first dat value that immediately precedes a gap of size 2 or more; assume there is such a gap at the end of the series. Try this:

 

data want;
do until(last.subjid);
    set have; by subjid;
    if gap < 2 then
        if missing(dat) then gap = sum(1, gap);
        else val = dat;
    end;
do until(last.subjid);
    set have; by subjid;
    if not missing(dat) 
        then adt = val;
        else call missing(adt);
    output;
    end;
drop gap val;
format adt yymmdd10.;
run;

 

PG
Ksharp
Super User
data have;
infile datalines truncover;
input subjid avisit dat : yymmdd10.;
format dat yymmdd10.;
datalines;
1001 0 1997-01-01
1001 1 1997-02-01
1001 2
1001 3 1997-05-01
1002 0 1998-01-01
1002 1 1998-02-01
1002 2
1002 3
1002 4
1002 5
1002 6 1998-12-01
1003 0 1999-01-01
1003 1 1999-02-01
1003 2
1003 3 1999-03-08
1003 4
1003 5
1003 6 1999-05-01
1003 7
1003 8
;

data have1;
 set have;
 by subjid;
 retain found 0;
 if first.subjid then do;n=0;found=0;end;
 if missing(dat) then n+1;
  else n=0;
 if n=2 then found=1;
 if not found;
 drop n found;
run;

data temp;
set have1(where=(dat is not missing));
by subjid;
if last.subjid;
run;
data want;
merge have temp(rename=(dat=new_dat) drop=avisit);
by subjid;
run;
data want2;
set want;
if missing(dat) then call missing(new_dat);
run;
Jack2012
Obsidian | Level 7

Thank you very much, Sharp. Resolved perfectly. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2442 views
  • 0 likes
  • 3 in conversation