Help using Base SAS procedures

How do i drop genotypes with missing data in a repeated measures experiment?

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

How do i drop genotypes with missing data in a repeated measures experiment?

I have a data set with 337 genotypes that were analyzed for disease resistance over an 11 year period. The first few lines looks like this:

sas help.png

However, not every genotype was tested in every year. I need to use only the genotypes that were tested in at least 5 of the 11 years. How can I create a dataset with only these genotypes? I can manually go and count them, then use the data step (I show below how I did this) to pick out the genotypes I want, but there's got to be a better way. I've been recommended to use PROC SQL but I can't work out how to tell it what I need. 

 

Suggestions? Thank you in advance! (I'm using SAS 9.4)

 

Here's the data step I mentioned:

data Potato_Blight4;
set Potato_Blight3;
if Gen = "FRASER" then output;
if Gen="I.HARDY" then output;
if Gen="IWA" then output;
if Gen="KARAKA" then output;
if Gen="KIWITEA" then output;
if Gen="P.DELL" then output;
if Gen="R.BURBANK" then output;
if Gen="R.RASCAL" then output;
if Gen="RUA" then output;
if Gen="W.DELIGHT" then output;
run;
proc print data=Potato_Blight4; run;

 

And this is the proc sql I've tried but I know it's wrong:

proc sql;
create table Potato_Blight_5yrs as
select gen
from Potato_Blight2
where y in year >5
group by gen, year
having sum(value='')=0;
/* having count(*) > 4; */
quit;


Accepted Solutions
Solution
‎04-27-2017 03:03 PM
Respected Advisor
Posts: 4,643

Re: How do i drop genotypes with missing data in a repeated measures experiment?

[ Edited ]

Keep it simple:

 

proc sql;
create table want as
select * 
from have 
group by gen
having count(distinct year) >= 5;
quit;
PG

View solution in original post


All Replies
Frequent Contributor
Posts: 93

Re: How do i drop genotypes with missing data in a repeated measures experiment?

Proc SQL YAY!

 

proc sql;
select * from have
where id in (select id from have group by id having count(distinct year) >=5);
quit;

Modified as you wish.

 

Patrick

Solution
‎04-27-2017 03:03 PM
Respected Advisor
Posts: 4,643

Re: How do i drop genotypes with missing data in a repeated measures experiment?

[ Edited ]

Keep it simple:

 

proc sql;
create table want as
select * 
from have 
group by gen
having count(distinct year) >= 5;
quit;
PG
Frequent Contributor
Posts: 93

Re: How do i drop genotypes with missing data in a repeated measures experiment?

@PGStats unfortunately the simpler code as written doesn't work.  Can you rewrite using the data step here?

 

data have;
input obs_in year gen $ rep y;
cards;
7 1983 060.1 1 9
36 1983 060.1 2 9
46 1983 060.1 3 9
71 1983 060.1 4 8
81 1983 060.1 5 9
;
run;


proc sql;
create table want as
select * 
from have 
group by gen
having count(distinct(year)) >= 5;
quit;

 

 

Respected Advisor
Posts: 4,643

Re: How do i drop genotypes with missing data in a repeated measures experiment?

Your data shows a single genotype measured on a single year (1983). So it hasn't been measured over 5 years or more and is thus not kept. Doesn't that agree with your request?

PG
Frequent Contributor
Posts: 93

Re: How do i drop genotypes with missing data in a repeated measures experiment?

LOL I guess you're right!

New Contributor
Posts: 2

Re: How do i drop genotypes with missing data in a repeated measures experiment?

Thank you so much!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 202 views
  • 4 likes
  • 3 in conversation