Solved: Re: How to keep IDs available in each year

Jonate_H · Posted 05-22-2014 06:21 PM

I have unbalanced panel data as showed below, there are many years and many IDs, and I want to keep those IDs available in each year, how to achieve that?

* I realized that my sample data set is too simple, which is kind of misleading. So I attached real data set below. Again, thank you guys for your help!

Patrick · Posted 05-23-2014 12:48 AM

That's because in our real sample {year, id} is not the primary composite key so condition "having count(id) = (select count(distinct year) from source.sample)" is not true.

Below code should deal with your real sample:

libname source "C:\test";

proc sql noprint;
create table want as
    select *
    from source.realsample
    group by id
    having count(distinct year) = (select count(distinct year) from source.realsample)
    order by year, id
    ;
quit;

Message was edited by: Patrick Matter

View solution in original post

Patrick · Posted 05-22-2014 06:59 PM

If {year, id} also makes up the composite primary key in your real data then some code as below should work:

libname source "C:\test";

proc sql;

create table want as

select *

from source.sample

group by id

having count(id) = (select count(distinct year) from source.sample)

order by year, id

;

quit;

Jonate_H · Posted 05-23-2014 12:35 AM

Thank you Patrick for your reply. The code works well for the simple data I attached. I don't know why it doesn't work for my real data. When I apply the code to real data, the resulting dataset contains no observation. I have attached real sample in my post.

Patrick · Posted 05-23-2014 12:48 AM

That's because in our real sample {year, id} is not the primary composite key so condition "having count(id) = (select count(distinct year) from source.sample)" is not true.

Below code should deal with your real sample:

libname source "C:\test";

proc sql noprint;
create table want as
    select *
    from source.realsample
    group by id
    having count(distinct year) = (select count(distinct year) from source.realsample)
    order by year, id
    ;
quit;

Message was edited by: Patrick Matter

Jonate_H · Posted 05-23-2014 01:06 AM

I am not sure how to fix that problem. I tried to reorder the variables in the data set, still no observations in the result. Really appreciate your help.

Patrick · Posted 05-23-2014 01:16 AM

See above. I've added code which should work with your "realsample" data.

Jonate_H · Posted 05-23-2014 07:59 PM

That works great! Many thanks, Patrick!

slchen · Posted 05-22-2014 09:53 PM

proc sql;

select * from sample where id in (select id from sample group by id having count(year)>=3);

quit;

Jonate_H · Posted 05-23-2014 12:44 AM

Thank you Sichen. I tried your code with real data, the result contains IDs which only have observations in one year but not the other two years, for example, id=31656 (real sample is in my post). Since Patric's code also doesn't work for my data set, now I am wondering if it is my data set's problem..

.

stat_sas · Posted 05-22-2014 10:03 PM

data want(drop=n);

set data;

by year id;

if first.year then n=1;

else n+1;

if n<4;

run;

Jonate_H · Posted 05-23-2014 12:50 AM

I realized that my sample dataset is too simple, so I have updated real data set in the post. Thank you, stat@sas!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away