BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jonate_H
Quartz | Level 8

I have unbalanced panel data as showed below, there are many years and many IDs, and I want to keep those IDs available in each year, how to achieve that?

* I realized that my sample data set is too simple, which is kind of misleading. So I attached real data set below. Again, thank you guys for your help!

sample.png

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

That's because in our real sample {year, id} is not the primary composite key so condition "having count(id) = (select count(distinct year) from source.sample)" is not true.

Below code should deal with your real sample:

libname source "C:\test";

proc sql noprint;
  create table want as
    select *
    from source.realsample
    group by id
    having count(distinct year) = (select count(distinct year) from source.realsample)
    order by year, id
    ;
quit;

Message was edited by: Patrick Matter

View solution in original post

10 REPLIES 10
Patrick
Opal | Level 21

If {year, id} also makes up the composite primary key in your real data then some code as below should work:

libname source "C:\test";

proc sql;

  create table want as

    select *

    from source.sample

    group by id

    having count(id) = (select count(distinct year) from source.sample)

    order by year, id

    ;

quit;

Jonate_H
Quartz | Level 8

Thank you Patrick for your reply. The code works well for the simple data I attached. I don't know why it doesn't work for my real data. When I apply the code to real data, the resulting dataset contains no observation. I have attached real sample in my post.

Patrick
Opal | Level 21

That's because in our real sample {year, id} is not the primary composite key so condition "having count(id) = (select count(distinct year) from source.sample)" is not true.

Below code should deal with your real sample:

libname source "C:\test";

proc sql noprint;
  create table want as
    select *
    from source.realsample
    group by id
    having count(distinct year) = (select count(distinct year) from source.realsample)
    order by year, id
    ;
quit;

Message was edited by: Patrick Matter

Jonate_H
Quartz | Level 8

I am not sure how to fix that problem. I tried to reorder the variables in the data set, still no observations in the result. Really appreciate your help.

Patrick
Opal | Level 21

See above. I've added code which should work with your "realsample" data.

Jonate_H
Quartz | Level 8

That works great!  Many thanks, Patrick!

slchen
Lapis Lazuli | Level 10

proc sql;

  select * from sample where id in (select id from sample group by id having count(year)>=3);

  quit;

Jonate_H
Quartz | Level 8

Thank you Sichen. I tried your code with real data,  the result contains IDs which only have observations in one year but not the other two years, for example, id=31656 (real sample is in my post). Since Patric's code also doesn't work for my data set, now I am wondering if it is my data set's problem..

.

stat_sas
Ammonite | Level 13

data want(drop=n);

set data;

by year id;

if first.year then n=1;

else n+1;

if n<4;

run;

Jonate_H
Quartz | Level 8

I realized that my sample dataset is too simple, so I have updated real data set in the post. Thank you, stat@sas!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1210 views
  • 0 likes
  • 4 in conversation