toneill
Calcite | Level 5

I need to estimate the difference between interview dates (<interviewdt>), delete those interviews completed <300 days apart, and re-code questionnaire numbers to reflect the new dataset. This is longitudinal data.

Variables:

1. study_id = unique id per patient

2. qnum = questionnaire number

3. interviewdt = date interview was conducted

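data datediff ;
  /* The colon modifier reads the date with the date9. informat in list input */
  input study_id qnum interviewdt : date9. ;
  format interviewdt date9. ;
  datalines ;
1 1 01Jan2007
1 2 04Jan2007
1 3 07Jul2008
2 1 15Feb2009
2 2 03Mar2009
2 3 30Mar2010
3 1 20Dec2012
3 2 15Feb2013
;
run ;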

Data should look like:

Obs    study_id    qnum    interviewDt
1      1           1       01Jan2007
2      1           2       04Jan2007
3      1           3       07Jul2008
4      2           1       15Feb2009
5      2           2       03Mar2009
6      2           3       30Mar2010
7      3           1       20Dec2012
8      3           2       15Feb2013

I need to understand how to build a program that would:

(1) Compute the difference between dates within an individual (i.e. from qnum=1 to qnum=2, from qnum=2 to qnum=3, etc.)

(2) Delete any observation whose interview was completed <300d after the previous one (i.e. a patient should complete only one questionnaire approximately every 12 months)

(3) Renumber QNUM. E.g. in the above data, I would delete Obs 2, Obs 5 and Obs 8, so I want to create a new variable for Qnum reflecting the new interview number: Obs 2, where Qnum=2, would be deleted and Obs 3, where Qnum=3, would become NewQnum=2.

I am not familiar with <proc SQL> and would prefer to avoid it (I know you all love SQL on the discussion boards!)

** ====================================================== ;

This is what I tried but it didn't work :

data dateDiff ;

  set old ;

  by study_id ;

     diff_IntDt = dif(interviewDt) ; /*Compute difference between interviewDt*/

     if not first.study_id then output ;

run ;

proc print data=dateDiff ;

  var interviewYr qnum ;

run ;

PROBLEM: I ended up deleting ALL my first questionnaires (since I asked for <if not first.study_id then output>). If I don't include this condition, however, I don't get a dataset computing the difference between sequential <interviewdt>.

The next step I would imagine is:

data new ;

  set dateDiff ;

  by study_id qnum ;

     if diff_intDt <300 then delete ; /*Delete any observation whose interview date occurs <300 days from the previous date*/

  qCount = 0 ; /*Want to re-label QNUM to reflect the deleted observations*/

     if qnum=1 then qCount=qnum+1 ;

     label qCount = 'Number of Completed Questionnaires' ;

run ; quit ;

proc print data = dateDiff ;

  var interviewYr qCount ;

run ; 

2 REPLIES
ballardw
Super User

I think you are confusing us by saying "the data should look like" without removing any of the examples.

<assuming you have sorted the data by study_id, interviewDt and interviewDt is a SAS date variable>

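data dateDiff ;
  set old ;
  by study_id ;
     diff_IntDt = dif(interviewDt) ; /*Compute difference between interviewDt*/
     if first.study_id then diff_IntDt = . ; /*No previous interview within this patient*/
     /*Keep the first interview and any interview at least 300 days after the previous one*/
     if first.study_id OR diff_IntDt >= 300 then output ;
run ;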
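
To also renumber the questionnaires, a sum statement can count the rows that are kept. This is only a minimal sketch: it assumes the input dataset is named old as above, is sorted by study_id and interviewDt, and that interviewDt is a numeric SAS date. NewQnum is a name chosen here for illustration.

```sas
data dateDiff ;
  set old ;
  by study_id ;

  /* dif() must be executed on every row so its queue stays in step */
  diff_IntDt = dif(interviewDt) ;

  if first.study_id then do ;
    diff_IntDt = . ;   /* no previous interview within this patient */
    NewQnum = 0 ;      /* restart the counter for each patient (assumed name) */
  end ;

  /* Subsetting IF: keep the first interview and any interview
     at least 300 days after the previous row */
  if first.study_id or diff_IntDt >= 300 ;

  NewQnum + 1 ;        /* sum statement: retained automatically, counts kept rows */
  label NewQnum = 'Number of Completed Questionnaires' ;
run ;
```

Because the counter is incremented only after the subsetting IF, deleted rows never consume a number, so the kept interviews within each study_id are numbered 1, 2, 3 and so on.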

seeLowGreen
Calcite | Level 5

Hi,

This will give you what you want.  I used the datdif() function, which is what I think you meant to use, but I had to use another variable to remember the date I'm comparing the interview date to.  I made a comment where I did that below.

data datediff ;

  input study_id
  qnum
  interviewdt date9.;

datalines;
1 1 01Jan2007
1 2 04Jan2007
1 3 07Jul2008
2 1 15Feb2009
2 2 03Mar2009
2 3 30Mar2010
3 1 20Dec2012
3 2 15Feb2013
4 1 15Feb2009
4 3 30Mar2010
4 3 05Apr2010

;

run ;


data datediff2;
  set datediff;

  * I use a new variable to store the start_date argument for datdif();
  retain date_previous
   NewQnum2;

  by study_id;

  if first.study_id then do;
    date_previous = interviewdt;
    NewQnum2 = 1;
  end;

  * Here I use the date_previous variable I created earlier as start_date and the current row interviewDt as end_date;
  diff_IntDt = datdif(date_previous,interviewDt,'ACT/ACT');

  * Keep the first interview per patient and any interview at least 300 days after the previous row;
  if first.study_id or diff_IntDt >= 300 then do;
    output;
    NewQnum2 + 1;
  end;

  * After the output decision, I store the current row date in the new variable so it can be used next row;
  date_previous = interviewdt;

  format interviewdt date_previous date9.;
run;

proc print data=datediff2; run;
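
Note that this step measures each gap from the immediately preceding row, even if that row was itself dropped. If the 300 days should instead be counted from the last interview that was kept, one possible variant updates the retained date only when a row is output. That rule is an assumption, not something stated in the question, and the names datediff3, date_kept, gap and NewQnum3 are made up for this sketch. It assumes datediff is sorted by study_id and interviewdt.

```sas
data datediff3;
  set datediff;
  by study_id;

  /* date_kept holds the date of the last interview written out (assumed name) */
  retain date_kept;

  if first.study_id then do;
    gap = .;          /* nothing to compare the first interview with */
    NewQnum3 = 0;     /* restart the counter for each patient */
  end;
  else gap = interviewdt - date_kept;   /* SAS dates are day counts */

  if first.study_id or gap >= 300 then do;
    NewQnum3 + 1;               /* sum statement, retained automatically */
    date_kept = interviewdt;    /* only kept interviews move the reference date */
    output;
  end;

  format interviewdt date_kept date9.;
run;

proc print data=datediff3; run;
```

The two approaches differ only when several close-together interviews occur in a row. For example, with interviews at day 0, day 200 and day 400, comparing against the previous row drops both later interviews, while comparing against the last kept interview keeps the one at day 400.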
