Hello,
I have a dataset that contains numerous observations for each unique id (i.e. 'orsid'). I am preparing my data to run a program that selects Medicaid claims data specific to family planning. In order for eligible enrollees who do not have claims data in a given year to be included in the denominator for calculations, I am supposed to retain one observation for that individual, and set the date to January 1 of the year.
I know that my use of 'first.orsid' in the code below is incorrect, but I am including it to demonstrate the logic that I have been trying to use. Could anyone help me to devise a strategy for retaining the first observation and changing the date only for individuals who do not have claims records for that year? If an individual (orsid) has no claims data for the year, then they would have 12 observations for that year (Jan 01 - Dec 01) in the input dataset. If they do have claims data, they would have many more observations than that.
data claim12b;
set claim12;
by orsid;
if (icd1='' and icd2='' and icd3='' and icd4='' and icd5='')
and (prc1='' and prc2='' and prc3='')
and (ndc1='' and ndc2='' and ndc3='' and ndc4='' and ndc5='' and ndc6='')
then do;
keep first.orsid;
date="01jan2012"d;
end;
run;
I know this is incorrect in two ways. First, even if the logic worked, it would evaluate each observation on its own, rather than looking at all 12 observations for a given orsid to determine whether the individual had no claims over the 12 months. Second, the 'keep first.orsid' statement does not work.
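For illustration, here is a minimal sketch of one way the two-pass logic described above might be written, using a pair of DO UNTIL loops over each orsid. It is not a tested solution and rests on several assumptions: claim12 is already sorted by orsid and holds only 2012 records, the icd/prc/ndc variables are character, date is a numeric SAS date, and individuals who do have claims should keep all of their observations. The names has_claim and obs_num are hypothetical helper variables.

```sas
data claim12b;
  /* Pass 1: read every observation for this orsid and note whether
     any diagnosis, procedure, or drug code is populated. */
  has_claim = 0;  /* hypothetical flag, dropped at the end */
  do until (last.orsid);
    set claim12;
    by orsid;
    /* CMISS counts missing arguments; 14 = 5 icd + 3 prc + 6 ndc */
    if cmiss(of icd1-icd5, of prc1-prc3, of ndc1-ndc6) < 14 then has_claim = 1;
  end;

  /* Pass 2: re-read the same observations and decide what to output. */
  obs_num = 0;  /* hypothetical counter within the orsid */
  do until (last.orsid);
    set claim12;
    by orsid;
    obs_num + 1;
    if has_claim then output;  /* assumption: keep all rows when claims exist */
    else if obs_num = 1 then do;
      date = '01jan2012'd;     /* single placeholder row dated January 1 */
      output;
    end;
  end;

  drop has_claim obs_num;
run;
```

The sketch uses OUTPUT to choose which observations are written, because KEEP selects variables rather than observations and cannot be applied conditionally.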
I greatly appreciate any guidance you may offer.
Thank you,
Ted