toneill
Calcite | Level 5

I need to compute the number of days between consecutive interview dates (<interviewdt>) for each patient, delete interviews completed <300 days after the previous one, and renumber the questionnaires to reflect the new dataset. This is longitudinal data.

Variables:

1. study_id = unique id per patient

2. qnum = questionnaire number

3. interviewdt = date interview was conducted

data datediff ;

  input study_id qnum interviewdt : date9. ; /*Read interviewdt as a SAS date*/

  format interviewdt date9. ;

  datalines ;
1 1 01Jan2007
1 2 04Jan2007
1 3 07Jul2008
2 1 15Feb2009
2 2 03Mar2009
2 3 30Mar2010
3 1 20Dec2012
3 2 15Feb2013
;

run ;

Data should look like:

Obs   study_id   qnum   interviewDt
1     1          1      01Jan2007
2     1          2      04Jan2007
3     1          3      07Jul2008
4     2          1      15Feb2009
5     2          2      03Mar2009
6     2          3      30Mar2010
7     3          1      20Dec2012
8     3          2      15Feb2013

I need to understand how to build a program that would:

(1) Compute the difference between dates within an individual (i.e. from qnum=1 to qnum=2, from qnum=2 to qnum=3, etc.).

(2) Delete any observation whose interview was completed <300 days after the previous one (patients should complete only one questionnaire approximately every 12 months).

(3) Renumber QNUM. In the data above I would delete Obs 2 and Obs 8 (Obs 5 also falls within 300 days of the previous interview), so I want a new variable reflecting the new interview number: Obs 2 (Qnum=2) would be deleted, and Obs 3 (Qnum=3) would become NewQnum=2.

I am not familiar with <proc SQL> and would prefer to avoid it (I know you all love SQL on the discussion boards!).

** ====================================================== ;

This is what I tried but it didn't work :

data dateDiff ;

  set old ;

  by study_id ;

     diff_IntDt = dif(interviewDt) ; /*Compute difference between interviewDt*/

     if not first.study_id then output ;

run ;

proc print data=dateDiff ;

  var interviewYr qnum ;

run ;

PROBLEM: I ended up deleting ALL my first questionnaires (since I asked for <if not first.study_id then output>). If I don't include this condition, however, I don't get a dataset computing the difference between sequential <interviewdt>.

The next step I would imagine is:

data new ;

  set dateDiff ;

  by study_id qnum ;

     if diff_intDt <300 then delete ; /*Delete any observation whose interview date occurs <300 days from the previous date*/

  qCount = 0 ; /*Want to re-label QNUM to reflect the deleted observations*/

     if qnum=1 then qCount=qnum+1 ;

     label qCount = 'Number of Completed Questionnaires' ;

run ; quit ;

proc print data = dateDiff ;

  var interviewYr qCount ;

run ; 

ballardw
Super User

I think you are confusing us by saying "the data should look like" without removing any of the examples.

<assuming you have sorted the data by study_id, interviewDt and interviewDt is a SAS date variable>

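data dateDiff ;

  set old ;

  by study_id ;

     diff_IntDt = dif(interviewDt) ; /*Compute difference between interviewDt and the previous row*/

     if first.study_id then diff_IntDt = . ; /*The previous row belongs to another patient*/

     if first.study_id OR diff_IntDt >= 300 then output ; /*Keep first interviews and those 300+ days apart*/

run ;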
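
Building on the same dif() idea, steps (2) and (3) of the question can be handled in one DATA step. This is only a minimal sketch under assumptions: the input dataset is called old (as above), interviewDt is a numeric SAS date, and newQnum is just an illustrative variable name I made up.

```sas
/* Sort so that each patient's interviews are in date order */
proc sort data=old;
  by study_id interviewDt;
run;

data new;
  set old;
  by study_id;

  /* Call dif() on every row so its internal queue stays in step with the data */
  diff_IntDt = dif(interviewDt);

  if first.study_id then do;
    diff_IntDt = .;   /* no previous interview for this patient */
    newQnum = 0;      /* restart the numbering for each patient */
  end;

  /* Subsetting IF: keep the first interview and any interview
     300 or more days after the previous one */
  if first.study_id or diff_IntDt >= 300;

  /* Sum statement: newQnum is retained and only incremented for kept rows */
  newQnum + 1;

  format interviewDt date9.;
run;
```

The key point is that dif() runs unconditionally, before any row is dropped. If it sat inside an IF/THEN branch, it would compare against the last row for which that branch executed rather than the previous row in the data.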

seeLowGreen
Calcite | Level 5

Hi,

This will give you what you want.  I used the datdif() function, which is what I think you meant to use, but I had to use another variable to remember the date I'm comparing the interview date to.  I made a comment where I did that below.

data datediff ;

  input study_id
  qnum
  interviewdt : date9.;

  format interviewdt date9.;

datalines;
1 1 01Jan2007
1 2 04Jan2007
1 3 07Jul2008
2 1 15Feb2009
2 2 03Mar2009
2 3 30Mar2010
3 1 20Dec2012
3 2 15Feb2013
4 1 15Feb2009
4 3 30Mar2010
4 3 05Apr2010
;

run ;


data datediff2;
  set datediff;

  * I use a new variable to store the start_date argument for datdif();
  retain date_previous
   NewQnum2;

  by study_id;

  * Start of a new patient: there is no earlier interview, so restart the numbering;
  if first.study_id then do;
    date_previous = interviewdt;
    NewQnum2 = 1;
  end;

  * Here I use the date_previous variable I created earlier as start_date and the current row interviewDt as end_date;
  diff_IntDt = datdif(date_previous, interviewDt, 'ACT/ACT');

  * Keep the first interview of each patient and any interview 300 or more days after the previous one;
  if first.study_id or diff_IntDt >= 300 then do;
    output;
    NewQnum2 + 1;
  end;

  * Whether or not the row was output, I store its date so it can be compared with the next row;
  date_previous = interviewdt;

run;

proc print data=datediff2; run;
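
One thing to check against your study rules: the step above compares each interview with the row immediately before it, even when that earlier row is itself dropped. If you would rather count the 300 days from the last interview that was kept, something like the following might work. It is a sketch only. It assumes the datediff dataset created above is sorted by study_id and date, and date_kept, NewQnum3 and datediff3 are names I invented for illustration.

```sas
data datediff3;
  set datediff;
  by study_id;

  /* date_kept remembers the date of the last interview that was output */
  retain date_kept;
  format date_kept date9.;

  if first.study_id then do;
    /* Always keep the first interview and restart the numbering */
    date_kept = interviewdt;
    NewQnum3 = 1;
    output;
  end;
  else if interviewdt - date_kept >= 300 then do;
    /* 300 or more days since the last kept interview: keep and renumber */
    NewQnum3 + 1;
    date_kept = interviewdt;
    output;
  end;
run;
```

The two approaches differ when a patient has several closely spaced interviews. For interviews on day 0, day 200 and day 400, comparing with the previous row keeps only day 0, while comparing with the last kept interview keeps day 0 and day 400.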
