I need to estimate the difference between interview dates (<interviewdt>), delete those interviews completed <300 days apart, and re-code questionnaire numbers to reflect the new dataset. This is longitudinal data.
Variables:
1. study_id = unique id per patient
2. qnum = questionnaire number
3. interviewdt = date interview was conducted
data datediff ;
input study_id qnum interviewdt : date9. ; /* read interviewdt as a SAS date value */
format interviewdt date9. ;
datalines ;
1 1 01Jan2007
1 2 04Jan2007
1 3 07Jul2008
2 1 15Feb2009
2 2 03Mar2009
2 3 30Mar2010
3 1 20Dec2012
3 2 15Feb2013
;
run ;
Data should look like:
Obs study_id qnum interviewDt
1 1 1 01Jan2007
2 1 2 04Jan2007
3 1 3 07July2008
4 2 1 15Feb2009
5 2 2 03Mar2009
6 2 3 30Mar2010
7 3 1 20Dec2012
8 3 2 15Feb2013
I need to understand how to build a program that would:
(1) Compute the difference between dates within an individual (i.e. from qnum=1 to qnum=2, from qnum=2 to qnum=3, etc.).
(2) Delete any observation where the interview was completed <300 days after the previous one (i.e. a patient should only complete one questionnaire approximately every 12 months).
(3) Renumber QNUM. E.g. in the above data I would delete Obs 2 and Obs 8, so I want to create a new variable for Qnum reflecting the new interview number: Obs 2 (Qnum=2) would be deleted and Obs 3 (QNum=3) would become NewQnum=2.
I am not familiar with <proc SQL> and would prefer to avoid it (I know you all love SQL on the discussion boards!)
** ====================================================== ;
This is what I tried but it didn't work :
data dateDiff ;
set old ;
by study_id ;
diff_IntDt = dif(interviewDt) ; /*Compute difference between interviewDt*/
if not first.study_id then output ;
run ;
proc print data=dateDiff ;
var interviewYr qnum ;
run ;
PROBLEM: I ended up deleting ALL my first questionnaires (since I asked for <if not first.study_id then output>). If I don't include this condition, however, I don't get a dataset computing the difference between sequential <interviewdt>.
The next step I would imagine is:
data new ;
set dateDiff ;
by study_id qnum ;
if diff_intDt <300 then delete ; /*Delete any observation whose interview date occurs <300 days from the previous date*/
qCount = 0 ; /*Want to re-label QNUM to reflect the deleted observations*/
if qnum=1 then qCount=qnum+1 ;
label qCount = 'Number of Completed Questionnaires' ;
run ; quit ;
proc print data = dateDiff ;
var interviewYr qCount ;
run ;
I think you are confusing us by saying "the data should look like" without removing any of the examples.
<assuming you have sorted the data by study_id, interviewDt and interviewDt is a SAS date variable>
data dateDiff ;
set old ;
by study_id ;
diff_IntDt = dif(interviewDt) ; /*Compute difference between interviewDt*/
if first.study_id OR diff_IntDt >= 300 then output ; /*Keep the first interview and any interview at least 300 days after the previous one*/
run ;
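The step above covers parts (1) and (2) of the question but does not renumber the questionnaires. Here is a minimal sketch of one way to add part (3) in the same step. It assumes the input data set is called old, is sorted by study_id and interviewDt, and that interviewDt is a numeric SAS date. newQnum is just a name picked for illustration.

```sas
data dateDiff ;
  set old ;
  by study_id ;
  /* Call dif() on every row so its queue always holds the previous row's date */
  diff_IntDt = dif(interviewDt) ;
  if first.study_id then do ;
    diff_IntDt = . ;   /* no previous interview within this patient */
    newQnum = 0 ;      /* restart the counter for each patient */
  end ;
  /* Subsetting IF: keep the first interview and any interview
     at least 300 days after the previous row */
  if first.study_id or diff_IntDt >= 300 ;
  newQnum + 1 ;        /* sum statement, so newQnum is retained across rows */
  format interviewDt date9. ;
run ;
```

Because dif() runs before any row is dropped, the first questionnaires are kept and every difference is still taken against the row immediately before it.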
Hi,
This will give you what you want. I used the datdif() function, which is what I think you meant to use, but I had to use another variable to remember the date I'm comparing the interview date to. I made a comment where I did that below.
data datediff ;
input study_id
qnum
interviewdt date9.;
datalines;
1 1 01Jan2007
1 2 04Jan2007
1 3 07Jul2008
2 1 15Feb2009
2 2 03Mar2009
2 3 30Mar2010
3 1 20Dec2012
3 2 15Feb2013
4 1 15Feb2009
4 3 30Mar2010
4 3 05Apr2010
;
run ;
data datediff2;
set datediff;
* I use a new variable to store the start_date argument for datdif();
retain date_previous
NewQnum2;
by study_id;
if first.study_id then do;
date_previous = interviewdt;
NewQnum2=1;
end;
* Here I use the date_previous variable I created earlier as start_date and the current row interviewDt as end_date;
diff_IntDt = datdif(date_previous,interviewDt,'ACT/ACT');
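if first.study_id or diff_IntDt >= 300 then do; /* keep the first interview and any interview at least 300 days after the previous row */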
output;
NewQnum2 + 1;
end;
* Whether or not the row was output, I store the current interview date so the next row is compared with it;
date_previous = interviewdt;
run;
proc print data=datediff2; run;
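Note that the step above always compares an interview with the row right before it, even if that row was not kept. If the rule is meant to be "at least 300 days since the last interview that was kept", something like the sketch below might be closer. That reading of the requirement is an assumption on my part, and datediff3, last_kept, days_since_kept and NewQnum3 are names invented for this example.

```sas
data datediff3;
  set datediff;
  by study_id;
  retain last_kept NewQnum3;
  if first.study_id then NewQnum3 = 0;   /* restart numbering for each patient */
  /* Days since the last kept interview; stays missing on the first row of a patient */
  if not first.study_id then days_since_kept = interviewdt - last_kept;
  if first.study_id or days_since_kept >= 300 then do;
    NewQnum3 = NewQnum3 + 1;
    last_kept = interviewdt;             /* only kept rows move the reference date */
    output;
  end;
  format interviewdt date9.;
  drop last_kept;
run;
```

The two versions only give different results when a dropped interview falls between two kept ones that are themselves fewer than 300 days apart, so it is worth deciding which rule you actually want.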