Hi everyone!
I’m trying to solve this:
What i need to do is to remove the years with less than 2 different IDs. Say for instance my data set looks like the following:
ID | Obs | Year |
1 | 12 | 2000 |
1 | 10 | 2000 |
2 | 17 | 2001 |
2 | 23 | 2001 |
3 | 35 | 2002 |
3 | 68 | 2002 |
3 | 54 | 2002 |
4 | 46 | 2000 |
4 | 61 | 2000 |
4 | 53 | 2000 |
4 | 32 | 2000 |
5 | 26 | 2001 |
6 | 14 | 2000 |
Output should be like this:
ID | Obs | Year |
1 | 12 | 2000 |
1 | 10 | 2000 |
4 | 46 | 2000 |
4 | 61 | 2000 |
4 | 53 | 2000 |
4 | 32 | 2000 |
6 | 14 | 2000 |
Any help would be greatly appreciated.
Hi,
Based on your example, would 2001 be included in your final output? Would 2002 be excluded from your final output?
data one;
input id obs year;
datalines;
1 12 2000
1 10 2000
2 17 2001
2 23 2001
3 35 2002
3 68 2002
3 54 2002
4 46 2000
4 61 2000
4 53 2000
4 32 2000
5 26 2001
6 14 2000
;
proc sql;
select * from one where year=2000;
quit;
run;
Hi Zana,
Considering the case as you have mentioned below:
fortunately the IDs in this data set can’t have different year.
Each ID will have distinctly one year only, In this scenario below code will help for you:
PROC SQL;
CREATE TABLE WANT AS
SELECT A.ID, OBS, YEAR FROM ONE AS A INNER JOIN
(SELECT ID FROM (SELECT DISTINCT ID, YEAR FROM ONE)
GROUP BY YEAR HAVING COUNT(YEAR) GT 2) AS B
ON A.ID=B.ID;
QUIT;
It is good to solve this kind problem by SQL.
proc sql;
create table want as
select *
from one
group by year
having count(distinct id) gt 2;
quit;
Xia Keshan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.