deleted_user
Not applicable
Hello all,

I'm working with a large dataset of 5,000+ subjects, each measured between one and nine times. I'm using repeated measures in PROC MIXED to analyze my data. My variables are subject, year measured, and size. However, many of my subjects were measured in only one year. Does anyone know how to remove subjects that appear only once in the dataset?

Thanks for your help!
Carolyn
6 REPLIES
Doc_Duke
Rhodochrosite | Level 12
Assuming you have one measure per row of data, this shell will work.

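* No data set names are given so each step uses the most recently created data set;
PROC SORT; BY subject;
RUN;

DATA;
SET;
BY subject;
* Both flags are true only when the subject has exactly one row;
IF FIRST.subject AND LAST.subject THEN DELETE;
RUN;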

The if expression will only be true for subjects with just one row.

Doc Muhlbaier
Duke
deleted_user
Not applicable
Thank you! That is exactly what I was looking for. I really appreciate your help.

Carolyn
sbb
Lapis Lazuli | Level 10
Another option to consider is the PROC SORT feature using keyword NODUPKEY and the DUPOUT= parameter.

With NODUPKEY, PROC SORT keeps the first observation for each unique combination of your BY statement variables and writes the discarded duplicates to the DUPOUT= data set. Any subject that appears in the DUPOUT= data set therefore has more than one observation. (NODUPKEY is slightly different from NODUPS, which compares all observation variables when looking for duplicates, but only between "adjacent" observations.)
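One way this could be put together is sketched below. It is only an illustration: the data set names HAVE, FIRSTS, REPEATS, MULTI, HAVE_SORTED and WANT are hypothetical, and it assumes one measurement per row with a variable named subject.

```sas
/* Hypothetical names throughout: HAVE is the input, WANT the result. */

/* Keep the first row per subject in FIRSTS; later rows go to REPEATS. */
proc sort data=have out=firsts nodupkey dupout=repeats;
  by subject;
run;

/* Reduce REPEATS to one row per subject: the subjects measured more than once. */
proc sort data=repeats(keep=subject) out=multi nodupkey;
  by subject;
run;

/* Sort a copy of the full data so it can be merged by subject. */
proc sort data=have out=have_sorted;
  by subject;
run;

/* Keep every row of a subject that also appears in MULTI. */
data want;
  merge have_sorted(in=in_have) multi(in=in_multi);
  by subject;
  if in_have and in_multi;
run;
```

This takes more steps than the FIRST./LAST. approach, but it leaves a separate list (MULTI) of the subjects measured more than once, which may be handy for checking counts.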

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic/post:

proc sort nodupkey dupout site:sas.com
Dale
Pyrite | Level 9
The methods that have been mentioned will do what you have requested. But why do you want to remove the subjects who appear only once?

It is not necessary to do so for the purposes of estimation of model parameters. It might be necessary if you believe that the missing observations for those individuals are not missing at random. However, if you believe that the missingness is unrelated to the response, then you would actually be better off leaving these individuals in your analysis.
deleted_user
Not applicable
Thanks for your help. I wanted to remove all of the single observations because most of them are not random, but represent plants that only lived for one year and so were not measured more than once.

Carolyn
Dale
Pyrite | Level 9
Hmm, I don't know that your reasoning is valid. You are censoring plants based on some quality of their response. Thus, you do not have a situation in which the response is missing at random. I would advise against removal of the observations which have only one response. It might be OK to do a sensitivity analysis in which you look at results with and without the plants that lived for only one year. But I think your primary analysis should include all plants.
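For the sensitivity analysis, one option is to flag the single-observation plants instead of deleting them, so the same data set serves both runs. The sketch below assumes one measurement per row; the names HAVE, SORTED, FLAGGED and single_obs are hypothetical.

```sas
/* Hypothetical names: HAVE is the input, single_obs is the new flag. */
proc sort data=have out=sorted;
  by subject;
run;

data flagged;
  set sorted;
  by subject;
  /* 1 when the subject has exactly one row, 0 otherwise */
  single_obs = (first.subject and last.subject);
run;
```

The primary analysis would then use FLAGGED as is, and the sensitivity run would add a statement such as WHERE single_obs = 0; to the same procedure call.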
