08-02-2014 10:12 PM
I have a sample data set (see attached) in which identification numbers of study participants as well as dates of visits appear along with diagnoses (DX). I need to calculate number of diagnoses per year of the study. Each identification number occurs more than once and each visit date (event date) has more than one diagnosis (DX1-DX4). I need to drop the diagnoses that are irrelevant and then somehow find the earliest date of repeat diagnoses. In sample data set the diagnosis of interest is "560" and occurs on visit date -22 for the first time. Exposure "870" is also of interest and occurs on day -133. Is there any way to tag each visit date to each IDC-9 code at a diagnosis and then determine which diagnosis occurred first? Dates are measured in days from January 1st 1999. Ideally I would like to collapse each identification number into one single line of observation with the first date of the outcome of interest as well as the exposure of interest as variables "date_exposure" "date_outcome" "exposure" "outcome". Additionally, a subject may have more than one outcome or exposure of interest and thus both would need to be retained. I then would like to run a simple frequency procedure to get frequency of outcomes and exposures by year. When making the date variables on the year of occurrence is important and thus I would like them to appear as "1999", "2000", etc. Is there any easy way to code this to achieve the desired outcome?