lisahoward
Calcite | Level 5

Hello,

I wondered if anyone could help me come up with some SAS code to randomly assign an index date to a patient's diagnosis (let's say diabetes) within a 6-month calendar block?

Normally the index date would be taken as the first occurrence of the diagnosis. In this instance, though, the index date is assigned by randomly selecting one of the patient's physician visits within the 6-month calendar block in which the first diagnosis occurred.

For example, patient X is diagnosed with diabetes on 02Jan2000. He has 100 physician visits in the healthcare database, and 90 of these visits occur in the same 6-month block (01Jan2000 - 30Jun2000) as the first diagnosis of diabetes. One of these 90 visit dates has to be randomly chosen and assigned to the patient as the index date. The other 10 visit dates can't be used because they fall outside the 6-month cohort accrual block (either before or after 01Jan2000 - 30Jun2000).


Does anyone have some code that would help me find and assign index dates this way please?


The overall cohort of patients spans 2004-2013, so there would be 20 blocks of 6 months to work through. Index dates are only assigned/matched within each block, so if a diagnosis occurred in block 20 (1JUL2013 - 31DEC2013), then only visits for that patient that also fall within the same block are used to randomly find and assign an index date for that patient.

So far I have created a dataset with the first diagnosis and have created a variable Index_Range that I have populated as follows:

PatID   Dx Date     Index_Range
1       20Jan2005   1JAN2005 - 30JUN2005
2       18JUN2006   1JAN2006 - 30JUN2006
3       02SEP2012   1JUL2012 - 31DEC2012

Next, I search for and pull out all physician records for the patients in the dataset created above, and assign a similar Index_Range variable using the physician visit date.

I presume I then merge the Diagnosis dataset and the Physician visit dataset, matching on PatID and the Index_Range variable.

What I don't know is what to code after that. Once I have merged my datasets and have all the visits matched via index range to the patient's diagnosis, what code do I use to randomly select only one visit per patient and assign that visit as the index date? Any thoughts on helping me write the code to 'randomly select' would be great.

Thank you in advance.

ACCEPTED SOLUTION
data_null__
Jade | Level 19

Selecting the random sample from the targeted range of visits is the easiest step in your process and can be done with PROC SURVEYSELECT:

/* One row per physician visit that falls inside the patient's index range (inx1 to inx2) */
data visit;
   input PatID (DxDate inx1-inx2 visitDate) (:date9.) visitnum;
   format dxdate inx1-inx2 visitdate date9.;
   cards;
1         20Jan2005    1JAN2005  30JUN2005 05Jan2005 4
1         20Jan2005    1JAN2005  30JUN2005 25Jan2005 5
1         20Jan2005    1JAN2005  30JUN2005 03APR2005 6
1         20Jan2005    1JAN2005  30JUN2005 28MAY2005 7
1         20Jan2005    1JAN2005  30JUN2005 01JUN2005 8
2         20Jan2005    1JAN2005  30JUN2005 05Jan2005 4
2         20Jan2005    1JAN2005  30JUN2005 25Jan2005 5
2         20Jan2005    1JAN2005  30JUN2005 03APR2005 6
2         20Jan2005    1JAN2005  30JUN2005 28MAY2005 7
2         20Jan2005    1JAN2005  30JUN2005 01JUN2005 8
;;;;
run;

/* Draw one visit at random per patient. The input must be sorted by the STRATA variable. */
proc surveyselect data=visit method=srs seed=557754207 n=1 out=dxvisit;
   strata patid;
run;

/* The selected visitDate in dxvisit is the patient's index date */
proc print data=dxvisit;
run;
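The earlier steps described in the question (deriving the 6-month block from the diagnosis date and keeping only visits in that block) could be sketched roughly as follows. This is only a sketch under assumptions: the dataset names dx, visits and dx_range are hypothetical, the visit dates are made up for illustration, and it uses the INTNX function with the 'semiyear' interval to find the start and end of the half-year containing each diagnosis date.

```sas
/* Hypothetical first-diagnosis data (dates taken from the table in the question) */
data dx;
   input PatID DxDate :date9.;
   format DxDate date9.;
   datalines;
1 20JAN2005
2 18JUN2006
3 02SEP2012
;
run;

/* Hypothetical physician visits (made-up dates, for illustration only) */
data visits;
   input PatID visitDate :date9.;
   format visitDate date9.;
   datalines;
1 05JAN2005
1 03APR2005
1 15AUG2005
2 02FEB2006
2 30JUN2006
3 10JUL2012
3 01JAN2013
;
run;

/* Start (inx1) and end (inx2) of the half-year block containing the diagnosis date */
data dx_range;
   set dx;
   inx1 = intnx('semiyear', DxDate, 0, 'b');
   inx2 = intnx('semiyear', DxDate, 0, 'e');
   format inx1 inx2 date9.;
run;

/* Keep only visits inside the patient's own block; BETWEEN is inclusive */
proc sql;
   create table visit as
   select d.PatID, d.DxDate, d.inx1, d.inx2, v.visitDate
   from dx_range as d
        inner join visits as v
        on d.PatID = v.PatID
   where v.visitDate between d.inx1 and d.inx2
   order by d.PatID, v.visitDate;
quit;
```

The resulting visit dataset is sorted by PatID, so it can be passed straight to PROC SURVEYSELECT with STRATA PatID. Note that with an inner join, a patient with no visits inside the block drops out of the dataset and would need to be handled separately.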


5 REPLIES 5
lisahoward
Calcite | Level 5

Thank you so much for your quick response.

Babloo
Rhodochrosite | Level 12

What does the seed value (SEED=557754207) indicate?

Phaneendra
Fluorite | Level 6

It specifies the initial seed for the random number generator.

It is used for replicating results: if we run the program more than once with the same seed and the same input data, the output will be the same. If we change the seed, the selected sample will generally be different.
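One way to check this, as a rough sketch, is to run the selection twice with the same seed and compare the two outputs. It assumes the visit dataset from the accepted solution already exists; the output names run1 and run2 are hypothetical.

```sas
/* Two selections with the same seed on the same input */
proc surveyselect data=visit method=srs seed=557754207 n=1 out=run1 noprint;
   strata patid;
run;

proc surveyselect data=visit method=srs seed=557754207 n=1 out=run2 noprint;
   strata patid;
run;

/* PROC COMPARE reports any differences between the two samples */
proc compare base=run1 compare=run2;
run;
```

Changing the seed in one of the two steps and rerunning the comparison shows how the selected visits can change.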
