04-15-2012 06:13 PM
I would really appreciate some help selecting data for a longitudinal data project I am working on. Here is my situation:
- I have selected a cohort of individuals that have their "Time 1" measurements on a particular date.
- For subsequent Time 2 , Time 3 (and so on) measurements, I wish to clean the data to select only the cases from the Time 1 cohort are represented at Time 2 and Time 3. Many of the Time 2, Time 3, etc. records are cases that are not part of the Time 1 cohort.
To put it another way, I want to tell SAS to check if the ID Number of a case at a time that is not "Time 1" matches the ID number of a case from the Time 1 cohort, and if the ID number does not match any ID number's from the Time 1 Cohort to exclude the case. I cannot run this with a simple if statement because the Time 1 cohort includes 7102 cases and thus would be too long.
I am using PROC MIXED to analyze this data, and when I try to run the data through PROC MIXED without doing the filtering procedures I have described above SAS crashes due to lack of memory. Therefore, I want to clean the dataset to allow SAS to run the dataset.
Any help you could give would be greatly appreciated!
Thank you so much,
04-15-2012 08:29 PM
Sure thing! The data is set up in a typical longitudinal structure - meaning that each row of the dataset represents measurements at one time point for a case. Each case is assigned a number, and if the same case is measured at a different time the time point this is represented in a separate row with the same case identifier and a variable stating the timepoint. The problem is that some of the cases at Time 2 are not part of the Time 1 cohort so I need a way to isolate and exclude them since the dataset is too large to do that manually.
04-15-2012 09:07 PM
The following query will select all cases having at least 3 measures with the first one having occured not later than Jan. 1st 2000. Modify to your needs
%let minPoints=3; /* minimum number of time points required to keep case */
%let time1Limit='01JAN2000'd; /* latest date for Time 1 measurements */
create table multiMeasures as
where caseID in (
group by caseID
having min(measureDate) <= &time1Limit and
count(measure) >= &minPoints )
order by caseID, measureDate;