BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jchitw12
Fluorite | Level 6

I have a data set with 337 genotypes that were analyzed for disease resistance over an 11 year period. The first few lines looks like this:

sas help.png

However, not every genotype was tested in every year. I need to use only the genotypes that were tested in at least 5 of the 11 years. How can I create a dataset with only these genotypes? I can manually go and count them, then use the data step (I show below how I did this) to pick out the genotypes I want, but there's got to be a better way. I've been recommended to use PROC SQL but I can't work out how to tell it what I need. 

 

Suggestions? Thank you in advance! (I'm using SAS 9.4)

 

Here's the data step I mentioned:

data Potato_Blight4;
set Potato_Blight3;
if Gen = "FRASER" then output;
if Gen="I.HARDY" then output;
if Gen="IWA" then output;
if Gen="KARAKA" then output;
if Gen="KIWITEA" then output;
if Gen="P.DELL" then output;
if Gen="R.BURBANK" then output;
if Gen="R.RASCAL" then output;
if Gen="RUA" then output;
if Gen="W.DELIGHT" then output;
run;
proc print data=Potato_Blight4; run;

 

And this is the proc sql I've tried but I know it's wrong:

proc sql;
create table Potato_Blight_5yrs as
select gen
from Potato_Blight2
where y in year >5
group by gen, year
having sum(value='')=0;
/* having count(*) > 4; */
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Keep it simple:

 

proc sql;
create table want as
select * 
from have 
group by gen
having count(distinct year) >= 5;
quit;
PG

View solution in original post

6 REPLIES 6
thomp7050
Pyrite | Level 9

Proc SQL YAY!

 

proc sql;
select * from have
where id in (select id from have group by id having count(distinct year) >=5);
quit;

Modified as you wish.

 

Patrick

PGStats
Opal | Level 21

Keep it simple:

 

proc sql;
create table want as
select * 
from have 
group by gen
having count(distinct year) >= 5;
quit;
PG
thomp7050
Pyrite | Level 9

@PGStats unfortunately the simpler code as written doesn't work.  Can you rewrite using the data step here?

 

data have;
input obs_in year gen $ rep y;
cards;
7 1983 060.1 1 9
36 1983 060.1 2 9
46 1983 060.1 3 9
71 1983 060.1 4 8
81 1983 060.1 5 9
;
run;


proc sql;
create table want as
select * 
from have 
group by gen
having count(distinct(year)) >= 5;
quit;

 

 

PGStats
Opal | Level 21

Your data shows a single genotype measured on a single year (1983). So it hasn't been measured over 5 years or more and is thus not kept. Doesn't that agree with your request?

PG

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1463 views
  • 4 likes
  • 3 in conversation