BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jchitw12
Fluorite | Level 6

I have a data set with 337 genotypes that were analyzed for disease resistance over an 11 year period. The first few lines looks like this:

sas help.png

However, not every genotype was tested in every year. I need to use only the genotypes that were tested in at least 5 of the 11 years. How can I create a dataset with only these genotypes? I can manually go and count them, then use the data step (I show below how I did this) to pick out the genotypes I want, but there's got to be a better way. I've been recommended to use PROC SQL but I can't work out how to tell it what I need. 

 

Suggestions? Thank you in advance! (I'm using SAS 9.4)

 

Here's the data step I mentioned:

data Potato_Blight4;
set Potato_Blight3;
if Gen = "FRASER" then output;
if Gen="I.HARDY" then output;
if Gen="IWA" then output;
if Gen="KARAKA" then output;
if Gen="KIWITEA" then output;
if Gen="P.DELL" then output;
if Gen="R.BURBANK" then output;
if Gen="R.RASCAL" then output;
if Gen="RUA" then output;
if Gen="W.DELIGHT" then output;
run;
proc print data=Potato_Blight4; run;

 

And this is the proc sql I've tried but I know it's wrong:

proc sql;
create table Potato_Blight_5yrs as
select gen
from Potato_Blight2
where y in year >5
group by gen, year
having sum(value='')=0;
/* having count(*) > 4; */
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Keep it simple:

 

proc sql;
create table want as
select * 
from have 
group by gen
having count(distinct year) >= 5;
quit;
PG

View solution in original post

6 REPLIES 6
thomp7050
Pyrite | Level 9

Proc SQL YAY!

 

proc sql;
select * from have
where id in (select id from have group by id having count(distinct year) >=5);
quit;

Modified as you wish.

 

Patrick

PGStats
Opal | Level 21

Keep it simple:

 

proc sql;
create table want as
select * 
from have 
group by gen
having count(distinct year) >= 5;
quit;
PG
thomp7050
Pyrite | Level 9

@PGStats unfortunately the simpler code as written doesn't work.  Can you rewrite using the data step here?

 

data have;
input obs_in year gen $ rep y;
cards;
7 1983 060.1 1 9
36 1983 060.1 2 9
46 1983 060.1 3 9
71 1983 060.1 4 8
81 1983 060.1 5 9
;
run;


proc sql;
create table want as
select * 
from have 
group by gen
having count(distinct(year)) >= 5;
quit;

 

 

PGStats
Opal | Level 21

Your data shows a single genotype measured on a single year (1983). So it hasn't been measured over 5 years or more and is thus not kept. Doesn't that agree with your request?

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 990 views
  • 4 likes
  • 3 in conversation