SAS beginner needing help selecting data based on a group of values.

Reply
New Contributor
Posts: 2

SAS beginner needing help selecting data based on a group of values.

Hello everyone!

I would really appreciate some help selecting data for a longitudinal data project I am working on. Here is my situation:

- I have selected a cohort of individuals that have their "Time 1" measurements on a particular date.

- For subsequent Time 2 , Time 3 (and so on) measurements, I wish to clean the data to select only the cases from the Time 1 cohort are represented at Time 2 and Time 3. Many of the Time 2, Time 3, etc. records are cases that are not part of the Time 1 cohort.

To put it another way, I want to tell SAS to check if the ID Number of a case at a time that is not "Time 1" matches the ID number of a case from the Time 1 cohort, and if the ID number does not match any ID number's from the Time 1 Cohort to exclude the case. I cannot run this with a simple if statement because the Time 1 cohort includes 7102 cases and thus would be too long.

I am using PROC MIXED to analyze this data, and when I try to run the data through PROC MIXED without doing the filtering procedures I have described above SAS crashes due to lack of memory. Therefore, I want to clean the dataset to allow SAS to run the dataset.

Any help you could give would be greatly appreciated!

Thank you so much,

~Chelsea

Respected Advisor
Posts: 4,646

Re: SAS beginner needing help selecting data based on a group of values.

Please give us an idea of your dataset structure.

PG
New Contributor
Posts: 2

Re: SAS beginner needing help selecting data based on a group of values.

Sure thing! The data is set up in a typical longitudinal structure - meaning that each row of the dataset represents measurements at one time point for a case. Each case is assigned a number, and if the same case is measured at a different time the time point this is represented in a separate row with the same case identifier and a variable stating the timepoint. The problem is that some of the cases at Time 2 are not part of the Time 1 cohort so I need a way to isolate and exclude them since the dataset is too large to do that manually.

Respected Advisor
Posts: 4,646

Re: SAS beginner needing help selecting data based on a group of values.

The following query will select all cases having at least 3 measures with the first one having occured not later than Jan. 1st 2000. Modify to your needs

%let minPoints=3; /* minimum number of time points required to keep case */
%let time1Limit='01JAN2000'd; /* latest date for Time 1 measurements */

proc sql;
create table multiMeasures as
select *

from myMeasures
where caseID in (
select caseID
from myMeasures
group by caseID
having min(measureDate) <= &time1Limit and
  count(measure) >= &minPoints )
order by caseID, measureDate;
quit;

PG

PG
Ask a Question
Discussion stats
  • 3 replies
  • 652 views
  • 0 likes
  • 2 in conversation