BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jonate_H
Quartz | Level 8

I have unbalanced panel data as showed below, there are many years and many IDs, and I want to keep those IDs available in each year, how to achieve that?

* I realized that my sample data set is too simple, which is kind of misleading. So I attached real data set below. Again, thank you guys for your help!

sample.png

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

That's because in our real sample {year, id} is not the primary composite key so condition "having count(id) = (select count(distinct year) from source.sample)" is not true.

Below code should deal with your real sample:

libname source "C:\test";

proc sql noprint;
  create table want as
    select *
    from source.realsample
    group by id
    having count(distinct year) = (select count(distinct year) from source.realsample)
    order by year, id
    ;
quit;

Message was edited by: Patrick Matter

View solution in original post

10 REPLIES 10
Patrick
Opal | Level 21

If {year, id} also makes up the composite primary key in your real data then some code as below should work:

libname source "C:\test";

proc sql;

  create table want as

    select *

    from source.sample

    group by id

    having count(id) = (select count(distinct year) from source.sample)

    order by year, id

    ;

quit;

Jonate_H
Quartz | Level 8

Thank you Patrick for your reply. The code works well for the simple data I attached. I don't know why it doesn't work for my real data. When I apply the code to real data, the resulting dataset contains no observation. I have attached real sample in my post.

Patrick
Opal | Level 21

That's because in our real sample {year, id} is not the primary composite key so condition "having count(id) = (select count(distinct year) from source.sample)" is not true.

Below code should deal with your real sample:

libname source "C:\test";

proc sql noprint;
  create table want as
    select *
    from source.realsample
    group by id
    having count(distinct year) = (select count(distinct year) from source.realsample)
    order by year, id
    ;
quit;

Message was edited by: Patrick Matter

Jonate_H
Quartz | Level 8

I am not sure how to fix that problem. I tried to reorder the variables in the data set, still no observations in the result. Really appreciate your help.

Patrick
Opal | Level 21

See above. I've added code which should work with your "realsample" data.

Jonate_H
Quartz | Level 8

That works great!  Many thanks, Patrick!

slchen
Lapis Lazuli | Level 10

proc sql;

  select * from sample where id in (select id from sample group by id having count(year)>=3);

  quit;

Jonate_H
Quartz | Level 8

Thank you Sichen. I tried your code with real data,  the result contains IDs which only have observations in one year but not the other two years, for example, id=31656 (real sample is in my post). Since Patric's code also doesn't work for my data set, now I am wondering if it is my data set's problem..

.

stat_sas
Ammonite | Level 13

data want(drop=n);

set data;

by year id;

if first.year then n=1;

else n+1;

if n<4;

run;

Jonate_H
Quartz | Level 8

I realized that my sample dataset is too simple, so I have updated real data set in the post. Thank you, stat@sas!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1868 views
  • 0 likes
  • 4 in conversation