Estimate datdiff from a single variable

I need to compute the number of days between successive interview dates (interviewdt) for each patient, delete interviews completed <300 days after the previous one, and renumber the questionnaires to reflect the reduced dataset. The data are longitudinal.

Variables:

1. study_id = unique id per patient

2. qnum = questionnaire number

3. interviewdt = date interview was conducted

data datediff ;
input study_id qnum interviewdt :date9. ; /* read the date as a SAS date value */
format interviewdt date9. ;
datalines ;
1 1 01Jan2007
1 2 04Jan2007
1 3 07Jul2008
2 1 15Feb2009
2 2 03Mar2009
2 3 30Mar2010
3 1 20Dec2012
3 2 15Feb2013
;
run ;

Data should look like:

Obs   study_id   qnum   interviewDt
  1          1      1   01Jan2007
  2          1      2   04Jan2007
  3          1      3   07Jul2008
  4          2      1   15Feb2009
  5          2      2   03Mar2009
  6          2      3   30Mar2010
  7          3      1   20Dec2012
  8          3      2   15Feb2013

I need to understand how to build a program that would: (1) compute the difference between interview dates within an individual (i.e. from qnum=1 to qnum=2, from qnum=2 to qnum=3, etc.); (2) delete any observation whose interview was completed <300 days after the previous one (a patient should complete only one questionnaire approximately every 12 months); (3) renumber QNUM. In the above data I would delete Obs 2, Obs 5, and Obs 8, so I want to create a new variable reflecting the new interview number: for example, Obs 2 (qnum=2) would be deleted and Obs 3 (qnum=3) would become NewQnum=2. I am not familiar with PROC SQL and would prefer to avoid it (I know you all love SQL on the discussion boards!).

** ====================================================== ;

This is what I tried but it didn't work :

data dateDiff ;

set old ;

by study_id ;

diff_IntDt = dif(interviewDt) ; /*Compute difference between interviewDt*/

if not first.study_id then output ;

run ;

proc print data=dateDiff ;

var interviewYr qnum ;

run ;

PROBLEM: I ended up deleting ALL my first questionnaires (since I asked for <if not first.study_id then output>). If I don't include this condition, however, I don't get a dataset computing the difference between sequential <interviewdt>.

The next step I would imagine is:

data new ;

set dateDiff ;

by study_id qnum ;

if diff_intDt <300 then delete ; /*Delete any observation whose interview date occurs <300 days after the previous date*/

qCount = 0 ; /*Want to re-label QNUM to reflect the deleted observations*/

if qnum=1 then qCount=qnum+1 ;

label qCount = 'Number of Completed Questionnaires' ;

run ; quit ;

proc print data = new ;

var interviewYr qCount ;

run ;

Re: Estimate datdiff from a single variable

I think you are confusing us by saying "the data should look like" and then showing every observation, including the ones you want deleted.

<assuming you have sorted the data by study_id, interviewDt and interviewDt is a SAS date variable>

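data dateDiff ;
set old ;
by study_id ;
diff_IntDt = dif(interviewDt) ; /*Days since the previous row's interview*/
if first.study_id then diff_IntDt = . ; /*No previous interview for this patient*/
/*Keep each patient's first interview plus any interview at least 300 days after the previous one*/
if first.study_id or diff_IntDt >= 300 then output ;
run ;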
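
The step above only filters; it does not renumber the questionnaires. A minimal sketch of the renumbering, assuming dateDiff is the filtered dataset from the step above and is still sorted by study_id (the dataset name renumbered is made up for illustration, and NewQnum is the name used in the question):

```sas
data renumbered ;        /* hypothetical output dataset name */
  set dateDiff ;
  by study_id ;
  /* restart the counter for each patient */
  if first.study_id then NewQnum = 0 ;
  /* sum statement: NewQnum is retained across rows automatically */
  NewQnum + 1 ;
run ;
```

Because the deleted rows are already gone, counting the remaining rows within each study_id gives the new questionnaire number directly.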

Re: Estimate datdiff from a single variable

Hi,

This will give you what you want.  I used the datdif() function, which is what I think you meant to use, but I had to use another variable to remember the date I'm comparing the interview date to.  I made a comment where I did that below.

data datediff ;

input study_id
qnum
interviewdt date9.;

datalines;
1 1 01Jan2007
1 2 04Jan2007
1 3 07Jul2008
2 1 15Feb2009
2 2 03Mar2009
2 3 30Mar2010
3 1 20Dec2012
3 2 15Feb2013
4 1 15Feb2009
4 3 30Mar2010
4 3 05Apr2010

;

run ;

data datediff2;
set datediff;

* date_previous stores the start_date argument for datdif() and NewQnum2 the new questionnaire number;
retain date_previous
NewQnum2;

by study_id;

if first.study_id then do;
date_previous = interviewdt;
NewQnum2=1;
end;

* Here date_previous is the start_date and the current row interviewDt is the end_date;
diff_IntDt = datdif(date_previous,interviewDt,'ACT/ACT');

* Keep the first interview per patient and any interview at least 300 days after the previous one;
if first.study_id or diff_IntDt >= 300 then do;
output;
NewQnum2 + 1;
end;

* Store the current interview date so the next row is compared against it;
date_previous = interviewdt;

run;

proc print data=datediff2; run;
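
One caveat: the code in this thread compares each interview with the row immediately before it, even when that row is itself deleted. With interviews at day 0, day 200, and day 400, both later interviews are 200 days after their predecessor, so both would be dropped. If the gap should instead be measured from the last interview that was kept, something like the following sketch may be closer. It assumes the datediff dataset above, sorted by study_id and interviewdt; last_kept, NewQnum, and datediff_kept are names made up for illustration.

```sas
data datediff_kept ;                 /* hypothetical output dataset name */
  set datediff ;
  by study_id ;
  retain last_kept NewQnum ;         /* hypothetical helper variables */
  format interviewdt last_kept date9. ;

  if first.study_id then do ;
    /* always keep a patient's first interview */
    last_kept = interviewdt ;
    NewQnum = 1 ;
    output ;
  end ;
  else if interviewdt - last_kept >= 300 then do ;
    /* at least 300 days since the last interview that was kept */
    last_kept = interviewdt ;
    NewQnum + 1 ;
    output ;
  end ;
run ;
```

SAS dates are stored as day counts, so plain subtraction gives the gap in days here. In the day 0, 200, 400 example this version keeps day 0 and day 400.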

