Hello all,
I am wondering if it is possible to select observations from a by group containing a specific set of dates. Take the following data for example...
ID Date
1 2016
1 2017
1 2018
2 2016
2 2017
3 2016
3 2017
3 2018
Based on the data above, I want outputs where IDs can be matched to 2016, 2017, AND 2018 (See example output below)
ID Date
1 2016
1 2017
1 2018
3 2016
3 2017
3 2018
Any help would be greatly appreciated. Thank you.
P.S. I am using SAS EG version 7.1.
data have;
input ID Date;
cards;
1 2016
1 2017
1 2018
2 2016
2 2017
3 2016
3 2017
3 2018
;
proc sql;
create table want as
select *
from have
where id in (select id from have where date in (2016:2018) group by id having count(distinct date)=3);
quit;
data have;
input ID Date;
cards;
1 2016
1 2017
1 2018
2 2016
2 2017
3 2016
3 2017
3 2018
;
proc sql;
create table want as
select *
from have
where id in (select id from have where date in (2016:2018) group by id having count(distinct date)=3);
quit;
Thank you much!!
One additional question... say that I had duplicate values of some dates (e.g., ID:1 has 2 observations where the date is 2016. Is there a way to modify the code such that it works for these instances as well keeping duplicate values?
Hi @kb179208 Assuming what you mean is the following modified sample
data have;
input ID Date;
cards;
1 2016
1 2016
1 2017
1 2017
1 2018
1 2018
2 2016
2 2017
3 2016
3 2017
3 2018
;
The code doesn't exclude the dups in the final result. Is my understanding correct?
yes that is correct
It's still the same code. Test and let me know
You are right, the code works for the example I provided so I should have been more clear. I have some IDs that may have 3 observations, but each with a date of 2016. I want to exclude these as they do not also contain date values of 2017 and 2018.
Basically the Idea to build the algorithm is simple:
PART1
1. Fiiter by subsetting ID's with dates falling in the range 2016:2018
2. 1 will have some ID's with all 3 or just 1 or 2 in the range . Example ID A may have 2016 10 times and only 2016. ID B may have all three occuring any number of times , ID C with 2017 and 2018 occuring any number of times
3. Within the subset, we do a "distinct count," i.e if the distinct count is equal 3 that means that ID must have all three 2016-2018 .
4. So this makes ID B our our target
PART2
5. Read all records from HAVE, the outer query and subset only those ID's that resulted from sub query(PART A)
I hope this helps you grasp the solution 🙂
I think I know what my problem is. I reformatted the DATE (mmddyyy) variable into a different variable DATE2, where it only shows the year. However, since I set DATE2=DATE it is still picking up the values as distinct.
Thank you once again for all the help!! I am not the best with SQL queries in sas.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.