Re: Keeping years that passed the critical point

zana · Posted 04-29-2014 01:15 PM

Hi everyone!

I’m trying to solve this:

I have a data set with three variables. Every ID has a year (2000 to 2002) but they have unlimited on Have data set, fortunately the IDs in this data set can’t have different year. For example: year 2000 appears 7 times on data set with three different IDs like the example below.

What i need to do is to remove the years with less than 2 different IDs. Say for instance my data set looks like the following:

ID	Obs	Year
1	12	2000
1	10	2000
2	17	2001
2	23	2001
3	35	2002
3	68	2002
3	54	2002
4	46	2000
4	61	2000
4	53	2000
4	32	2000
5	26	2001
6	14	2000

Output should be like this:

ID	Obs	Year
1	12	2000
1	10	2000
4	46	2000
4	61	2000
4	53	2000
4	32	2000
6	14	2000

Any help would be greatly appreciated.

jwillis · Posted 04-29-2014 01:49 PM

Hi,

Based on your example, would 2001 be included in your final output? Would 2002 be excluded from your final output?

sudi · Posted 04-30-2014 12:46 AM

data one;

input id obs year;

datalines;

1 12 2000

1 10 2000

2 17 2001

2 23 2001

3 35 2002

3 68 2002

3 54 2002

4 46 2000

4 61 2000

4 53 2000

4 32 2000

5 26 2001

6 14 2000

;

proc sql;

select * from one where year=2000;

quit;

run;

SKK · Posted 04-30-2014 01:27 AM

Hi Zana,

Considering the case as you have mentioned below:

 fortunately the IDs in this data set can’t have different year.

Each ID will have distinctly one year only, In this scenario below code will help for you:

PROC SQL;

CREATE TABLE WANT AS

SELECT A.ID, OBS, YEAR FROM ONE AS A INNER JOIN

(SELECT ID FROM (SELECT DISTINCT ID, YEAR FROM ONE)

GROUP BY YEAR HAVING COUNT(YEAR) GT 2) AS B

ON A.ID=B.ID;

QUIT;

Ksharp · Posted 05-01-2014 05:05 AM

It is good to solve this kind problem by SQL.

proc sql;

create table want as

select *

from one

group by year

having count(distinct id) gt 2;

quit;

Xia Keshan

Keeping years that passed the critical point