Help using Base SAS procedures

Identify cases using first and retain

Frequent Contributor
Posts: 101

Identify cases using first and retain


I've a dataset sorted by patient ID (PID), diagnosis code (dx), doctor specialty code (specialty), and service date (date) in descending order, and doctor ID (docID).

I'd like to identify patients seeking more than second opinion of doctor for the same diagnosis and from doctors with the same specialty -- that is, seeing a third doctor of the same specialty -- prior to the the most doctor visit. A patient might go back and forth to different doctors.

The data is collected between January 1, 2011 and June 30, 2012. There is no date restriction except that the most recent visit within the same dx and specialty (or first record) should take place April 1, 2012.

Here are the data and SAS code to read in the data (a fiscal year variable fyear is calculated (going by April 1 as the start of a fiscal year):

data dat; input

PiD 1 Date $ 3-10 DocID 12-14 Fee_Code $ 16-19 Specialty 21-22 Dx 24-26 Amount 28-30 Notes $ 32-76;


1 20120501 101 A585 28 289  60 1st doc w/n same PID, specialty code, dx code

1 20120322 101 A585 28 289  60 1st doc

1 20120313 102 A585 28 289  60 2nd doc

1 20120306 101 A585 28 289  60 1st doc

1 20110830 111 A585 28 300  60 3rd doc w/n same PID, specialty code, dx code

1 20110524 101 A585 28 289  60

1 20110420 101 A585 28 289  60

1 20110411 101 A585 28 289  60

1 20110101 101 A585 28 289  60

1 20120515 104 A401 40 110 100 1st doc w/n same PID, specialty code, dx code

1 20120303 105 A401 40 110 100 2nd doc

1 20120101 104 A401 40 110 100 1st doc

2 20120101 101 A303 15 305  30 1st doc but 1st record date < April 1, 2012

2 20111130 105 A303 15 305  50 2nd doc

2 20111009 108 A303 15 305  30 3rd doc

2 20110707 103 A303 15 305  30

3 20120415 201 A220 30 211 116 1st doc w/n same PID, specialty code, dx code

3 20120401 205 A220 30 211 116 2nd doc

3 20120303 201 A220 30 211 116 1st doc

3 20111231 205 A220 30 211 116 2nd doc


data dat (rename=(Date2=Date)); format PID Date2 Fyear; set dat;

    format Date2 yymmdd10.;


    fyear = year(intnx("YEAR.4",Date2,0));

    drop date;


(The dataset "dat" is attached as well).

Based on the above mentioned criteria, only PID=1 and specialty=28 and DX=289 should be outputted because there is a third visit to DocID=111 on Aug 30, 2011.

PID=1, speciatlty=40, DX=110: there are 3 visits but only to 2 different doctors

PID=2: there are 4 visits to 3 different doctors but the most recent visit took place in January 2010, which is fiscal year 2011 (< April 1, 2012)

PID=3: there are 4 visits but to 2 different doctors

So thte data output should look like:


I assume I should carry docID to the next record to compare and see if they are identical, but I need to compare not just the preceding one but all the preceding ones. I could transpose and compare docID1, docID2, docID3....., but I could have many records due to many visit entries (not manageable, just not preferable if I had other solutions).

Thanks in advance for any help you could provide.

Frequent Contributor
Posts: 101

Re: Identify cases using first and retain

On second thought,

Once I get rid of cases with first record's date < April 1, 2012, I think I could have just use Proc SQL or Proc SORT nodupkey to first keep unique records of PID, specialty and DX, docID and then count # of docID. If it's >=3, then there is a visit to a third doctor.

So I'd like to rephrase my questions so that I have to use RETAIN to read down the data and compare preceding records (if possible) - that is what if I'd like to identify only cases with a visit to a third doctor that is within a year prior to the most recent visit date. The output should still be the same.


Occasional Contributor
Posts: 11

Re: Identify cases using first and retain


With the conditions you mentioned, there will be no output. The output you are expecting is having different dx i.e. diagnosis. But u mentioned it should be same diagnosis from third doctor.

Do you really want to have same dx or just same patient, third doctor and same specialty only....?

Please correct me if i missed something...

Frequent Contributor
Posts: 101

Re: Identify cases using first and retain

Thanks for replying. The third doctor has to be from the same specialty for the same DX of the patient. So yes it has to be under the same patient, same patient's dx and same specialty of doctor.

Frequent Contributor
Posts: 83

Re: Identify cases using first and retain

Ok so first off you want to take only those cases who meet your first criteria which is the first visit was after April 1, 2012. For reference the dataset you put out doesn't have anyone who meets that criteria, but you can run this code to take out all of the other people (I specify in the below code that the date is April 1, 2012.

proc sql;

create table um2 as

select distinct PID, date, docid, specialty, dx, fyear, (max(date)) format=mmddyy10. as last_dt /*You specify what the most recent date was so you can compare to it in the next step*/

from dat

group by PID

having (min(date)>'01Apr2012'd); /*Here you specify that the first visit for the PID has to have occurred after this date (April 1, 2012) and remove all PID who do not meet this criteria*/

create table um3 as

select distinct PID, docid, dx, specialty /*Since you now know what the most recent visit date is, you can specify that the visit that you are counting had to have occurred within a specific time frame, in this case 1 year or 365 days. I wrote it that the most recent date is first so you have to specify datdif to be negative, but you could reverse it. Also I included 0, so the last visit is counted*/

from um2

where datdif(last_dt, date, 'act/act') between -365 and 0;

create table um4 as /*since you have eliminated all of the visits which didn't occur within 1 year you now can reduce your visits based on dx and specialty, and count the number of DocID's that are in the table for each PID*/

select distinct PID, dx, specialty, (count(docid)) as count

from um3

group by pid, dx, specialty;

create table cohort as

select distinct pid, dx, specialty

from um4

where count >=3; /*This line could have been specified in um4 as having count >=3 or it can be written as another line. This specification will give you only those PID who had at least 3 visits to different docids in the year preceding the most recent visit for the same dx and in the same specialty*/


This code should create the cohort that you have so far specified.

Super User
Posts: 5,083

Re: Identify cases using first and retain

Why would you sort by descending service date?  That makes it impossible for a DATA step to do the work. 

Do you have any control over the sorted order to the data?

Ask a Question
Discussion stats
  • 5 replies
  • 4 in conversation