11-29-2017 02:33 PM - edited 11-29-2017 03:40 PM
Dear SAS support team,
I'm not sure how to approach this analysis and what is the best way, here is what I want to do:
DATA A: I have a file with one row per observation, and for each observation I have the dates of repeated tests so I have test1 date, test2 dates etc.
Data B: I have another data which has the same observations as above but with many rows per individual or observation. The data for a prescription date for a certain drug, some observations has more than 200 dates for prescription. What I need to do is to find if those individuals in data A with repeated test in a certain date for example in 2010 if they had a prescription for the drug in DATA B before the date of the test.
I want to flat file (make one row per observation) DATA B to link it with A, but I don't know how to do that with arrays if some has 200 prescription dates.
My goal is to know if those with repeated tests have taken this drug before the test or not. Do I compress the data by month year better to make it shorter, will it be better to extract the data for each year to make the prescription date lower for each patient for example (2010, 2011, 2012, and only do the analysis for each year separately)??
I'm having a hard time to do it. and I need it to be done as soon as possible
I appreciate so much your help, and you've been a great support to me always.
Here is a simple display of how my data, sorry I can't upload the data for privacy issues.
patient id test1date test2date test3 date
1 01JAN2011 03MAY2012 04JUN2014
2 ....so on
patient id prescription date
1 so on
11-29-2017 02:41 PM
Can you please include some more sample data, as a data step preferably, and what you want as output as well.
Make sure that your sample data covers all cases you may encounter so you can adequately test your code.
E.g case when there's no drugs before test, when dates are exact same, when drugs are after the test, etc....
11-29-2017 03:46 PM
Thank you so much for your quick reply, I apologize the title was wrong so I edit it, since the latter was irrelevant. I've attached a MS document to explain more how the data looks like.
I was thinking of cutting the long data specifying each year, so the no. of prescription dates become smaller, so for each year, I create a separate data and for each year, I format the dates to months. and do the same for the tests data to take only each year separately. and format to month year.
I know it is a bit hard to explain sometimes, I hope this time it is clearer.
Thank you so much for your help,
11-29-2017 03:58 PM
When you have a requirement like "before" you may need to provide more criteria.
Below is an example of providing data in the form of data steps and one approach to a comparison.
data work.a; informat patientid $5. testdate1 - testdate3 date9.; format testdate1 - testdate3 date9.; input patientid testdate1 - testdate3; datalines; 1 01JAN2011 03MAY2012 04JUN2014 ; run; data work.b; informat patientid $5. presciptiondate date9.; format presciptiondate date9.; input patientid presciptiondate; datalines; 1 01JAN2010 1 05MAY2013 1 25JUL2013 ; run; /* single date per record may work better*/ proc transpose data=work.a out=work.atrans prefix=testdate; by patientid; var testdate1 - testdate3; run; proc sql; create table work.example as select a.patientid, a.testdate1 as testdate, b.presciptiondate from work.atrans as a inner join work.b as b on a.patientid=b.patientid where a.testdate1 < b.presciptiondate ; quit;
As you can see in the work.example data set there are multiple testdates before a given prescriptiondate.
I named the test date variables testdate1 through testdate3 to take advantage of short hand variable name use that would not be possible with names like test1date test2date and so one. For instance to change the code I wrote to read data set A with 20 variables you only need to change the 3 to 20 in three places. To use test1date through test20date you would have list them explicitly.
The transpose step gets one record per patient id and test date so the search comparison done in proc sql only needs to look at one variable. Otherwise you would have to compare 20 test date variables to each prescription date and you might have to do more work to identify which test date was the "before" value in the result.
Proc SQL has many different ways to test comparisons and make results based on values of variables in multiple sets. The example is only one.
If you only wanted to know which prescription dates had at least one prior test then something like this:
proc sql; create table work.example2 as select distinct a.patientid, b.presciptiondate from work.atrans as a inner join work.b as b on a.patientid=b.patientid where a.testdate1 < b.presciptiondate ; quit;