Remove subjects only observed once in repeated measures analysis

N/A
Posts: 0

Hello all,

I'm working with a large dataset of 5,000+ subjects, each measured between one and nine times. I'm using repeated measures in PROC MIXED to analyze my data. My variables are subject, year measured, and size. Many of my subjects, however, were measured in only one year. Does anyone know how to remove subjects that appear only once in the dataset?

thanks for your help!
Carolyn
Valued Guide
Posts: 2,108

Re: Remove subjects only observed once in repeated measures analysis

Assuming you have one measure per row of data, this shell will work.

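/* Sort so that each subject's rows are contiguous.        */
/* With no DATA= option, the most recent data set is used. */
PROC SORT; BY subject; RUN;

/* FIRST.subject and LAST.subject are both 1 only when the subject has exactly one row. */
DATA;
SET;
BY subject;
IF FIRST.subject AND LAST.subject THEN DELETE;
RUN;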

The if expression will only be true for subjects with just one row.
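To make the shell concrete, here is a minimal sketch with named data sets. The data set names (plants, plants_sorted, plants_multi) and the sample values are made up for illustration; only the variables subject, year, and size come from the original question.

```sas
/* Hypothetical example data: one row per subject per year measured. */
data plants;
  input subject year size;
  datalines;
1 2001 4.2
1 2002 5.0
2 2001 3.1
3 2001 2.8
3 2002 3.5
3 2003 4.1
;
run;

/* Sort so that all rows for a subject are adjacent. */
proc sort data=plants out=plants_sorted;
  by subject year;
run;

/* Drop any subject whose BY group contains a single row. */
data plants_multi;
  set plants_sorted;
  by subject;
  /* Both flags are 1 only when the row is the first and the last of its group. */
  if first.subject and last.subject then delete;
run;
```

In this sample, subject 2 has a single row and is dropped, while the five rows for subjects 1 and 3 are kept.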

Doc Muhlbaier
Duke
N/A
Posts: 0

Re: Remove subjects only observed once in repeated measures analysis

Thank you! That is exactly what I was looking for. I really appreciate your help.

Carolyn
Super Contributor
Posts: 3,174

Re: Remove subjects only observed once in repeated measures analysis

Another option to consider is the PROC SORT feature using keyword NODUPKEY and the DUPOUT= parameter.

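With NODUPKEY, PROC SORT keeps the first observation for each unique combination of your BY statement variables and writes the observations it removes to the DUPOUT= data set. That data set therefore holds every observation after the first for each subject, so it identifies the subjects that were measured more than once. (NODUPKEY is slightly different from NODUPS, which compares all observation variables looking for duplicate values, but only for "adjacent" observations.)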
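A rough sketch of how the NODUPKEY / DUPOUT= route might be used to keep only the repeated subjects. The data set names (plants, first_rows, dups, repeat_ids, plants_sorted, plants_multi) and sample values are hypothetical. Because the DUPOUT= data set does not contain the first row of each subject, this sketch uses it only to build a list of subject IDs and then merges that list back onto the full data.

```sas
/* Hypothetical example data: one row per subject per year measured. */
data plants;
  input subject year size;
  datalines;
1 2001 4.2
1 2002 5.0
2 2001 3.1
3 2001 2.8
3 2002 3.5
3 2003 4.1
;
run;

/* first_rows keeps one row per subject; dups receives every later row. */
proc sort data=plants out=first_rows nodupkey dupout=dups;
  by subject;
run;

/* Reduce dups to one row per subject measured more than once. */
proc sort data=dups(keep=subject) out=repeat_ids nodupkey;
  by subject;
run;

proc sort data=plants out=plants_sorted;
  by subject year;
run;

/* Keep only the rows whose subject appears in repeat_ids. */
data plants_multi;
  merge plants_sorted repeat_ids(in=repeated);
  by subject;
  if repeated;
run;
```

As far as I know, SAS 9.3 and later also provide a NOUNIQUEKEY option on PROC SORT that drops observations with a unique sort key in a single step, so check the documentation for your release before relying on it.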

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic/post:

proc sort nodupkey dupout site:sas.com
Regular Contributor
Posts: 169

Re: Remove subjects only observed once in repeated measures analysis

The methods that have been mentioned will do what you have requested. But why do you want to remove the subjects who appear only once?

It is not necessary to do so for the purposes of estimation of model parameters. It might be necessary if you believe that the missing observations for those individuals are not missing at random. However, if you believe that the missingness is unrelated to the response, then you would actually be better off leaving these individuals in your analysis.
N/A
Posts: 0

Re: Remove subjects only observed once in repeated measures analysis

Thanks for your help. I wanted to remove all of the single observations because most of them are not random, but represent plants that only lived for one year and so were not measured more than once.

Carolyn
Regular Contributor
Posts: 169

Re: Remove subjects only observed once in repeated measures analysis

Hmm, I don't know that your reasoning is valid. You are censoring plants based on some quality of their response. Thus, you do not have a situation in which the response is missing at random. I would advise against removal of the observations which have only one response. It might be OK to do a sensitivity analysis in which you look at results with and without the plants that lived for only one year. But I think your primary analysis should include all plants.
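One way the sensitivity analysis could be set up is to flag the single-observation plants rather than delete them, and then fit the same model with and without them. This is only a sketch: the data set name plants, the n_obs variable, the fixed effect, and the compound-symmetry covariance structure are all assumptions, not something taken from the original analysis.

```sas
/* Assumes a data set PLANTS with variables subject, year, and size */
/* (hypothetical name; substitute your own).                        */
proc sql;
  /* PROC SQL remerges COUNT(*) onto every row of each subject. */
  create table plants_flagged as
  select *, count(*) as n_obs
  from plants
  group by subject;
quit;

/* Primary analysis: all plants, including those measured once. */
proc mixed data=plants_flagged;
  class subject year;
  model size = year / solution;
  /* TYPE=CS is an assumed covariance structure. */
  repeated year / subject=subject type=cs;
run;

/* Sensitivity analysis: only plants measured in more than one year. */
proc mixed data=plants_flagged;
  where n_obs > 1;
  class subject year;
  model size = year / solution;
  repeated year / subject=subject type=cs;
run;
```

Comparing the fixed-effect estimates from the two fits shows how much the conclusions depend on the plants that lived for only one year.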