I have unbalanced panel data as showed below, there are many years and many IDs, and I want to keep those IDs available in each year, how to achieve that?
* I realized that my sample data set is too simple, which is kind of misleading. So I attached real data set below. Again, thank you guys for your help!
That's because in our real sample {year, id} is not the primary composite key so condition "having count(id) = (select count(distinct year) from source.sample)" is not true.
Below code should deal with your real sample:
libname source "C:\test";
proc sql noprint;
create table want as
select *
from source.realsample
group by id
having count(distinct year) = (select count(distinct year) from source.realsample)
order by year, id
;
quit;
Message was edited by: Patrick Matter
If {year, id} also makes up the composite primary key in your real data then some code as below should work:
libname source "C:\test";
proc sql;
create table want as
select *
from source.sample
group by id
having count(id) = (select count(distinct year) from source.sample)
order by year, id
;
quit;
Thank you Patrick for your reply. The code works well for the simple data I attached. I don't know why it doesn't work for my real data. When I apply the code to real data, the resulting dataset contains no observation. I have attached real sample in my post.
That's because in our real sample {year, id} is not the primary composite key so condition "having count(id) = (select count(distinct year) from source.sample)" is not true.
Below code should deal with your real sample:
libname source "C:\test";
proc sql noprint;
create table want as
select *
from source.realsample
group by id
having count(distinct year) = (select count(distinct year) from source.realsample)
order by year, id
;
quit;
Message was edited by: Patrick Matter
I am not sure how to fix that problem. I tried to reorder the variables in the data set, still no observations in the result. Really appreciate your help.
See above. I've added code which should work with your "realsample" data.
That works great! Many thanks, Patrick!
proc sql;
select * from sample where id in (select id from sample group by id having count(year)>=3);
quit;
Thank you Sichen. I tried your code with real data, the result contains IDs which only have observations in one year but not the other two years, for example, id=31656 (real sample is in my post). Since Patric's code also doesn't work for my data set, now I am wondering if it is my data set's problem..
.
data want(drop=n);
set data;
by year id;
if first.year then n=1;
else n+1;
if n<4;
run;
I realized that my sample dataset is too simple, so I have updated real data set in the post. Thank you, stat@sas!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.