☑ This topic is solved.
shawnchen0321
Obsidian | Level 7

Hi, all experts.

I have a sample selection criterion: keep a panelist only if their data are never missing for two or more consecutive years.

Here is a sample dataset:

data have;
  input Panelist Year othervars;
  cards;
1          2017 1
1          2019 1
1          2020 1
2          2017 1
2          2020 1
3          2018 1
3          2020 1
;
run;

I want the result to look like this:

data want;
  input Panelist Year othervars;
  cards;
1          2017 1
1          2019 1
1          2020 1
3          2018 1
3          2020 1
;
run;

Does anyone know how to solve this problem?

Thanks in advance.

Accepted Solution
mkeintz
PROC Star

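You have to pass through each panelist twice: once to find gaps, and a second time to reread the records and output only the panelists with no gap of two or more consecutive missing years (that is, no jump of more than 2 between successive observed years):
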
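data want (drop=_:);
  /* Interleave two copies of HAVE: for each panelist, all records  */
  /* are read first with FIRSTPASS=1, then again with SECONDPASS=1 */
  set have (in=firstpass)  have (in=secondpass);
  by panelist;

  /* First pass: count jumps of more than 2 years (2+ missing years) */
  _gap_found + (firstpass=1 and dif(year)>2);
  /* Reset at each new panelist, discarding any DIF computed against */
  /* the previous panelist's last year                               */
  if first.panelist then _gap_found=0;

  /* Second pass: keep the records only if no gap was found          */
  if secondpass and _gap_found=0;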
run;
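For comparison, the same two-pass idea can also be written as a "double DOW loop", a different but common DATA step idiom (a minimal sketch, not part of the accepted answer). The first DO UNTIL loop reads one panelist's records and looks for a jump of more than 2 years; a second SET statement, which keeps its own read position, rereads the same records and outputs them only if no gap was found. Like the code above, it assumes HAVE is sorted by Panelist and Year. The helper variables _gap_found and _prev_year are illustrative names.

```sas
/* Sample data from the question */
data have;
  input Panelist Year othervars;
  cards;
1 2017 1
1 2019 1
1 2020 1
2 2017 1
2 2020 1
3 2018 1
3 2020 1
;
run;

/* Double DOW loop: both passes over one panelist happen within a */
/* single DATA step iteration. Assumes HAVE is sorted by Panelist */
/* and Year.                                                      */
data want (drop=_:);
  _gap_found = 0;

  /* Pass 1: scan this panelist's records for a jump > 2 years */
  do until (last.panelist);
    set have;
    by panelist;
    if first.panelist then _prev_year = year;
    if year - _prev_year > 2 then _gap_found = 1;
    _prev_year = year;
  end;

  /* Pass 2: this second SET rereads the same panelist's records; */
  /* they are output only if pass 1 found no gap                 */
  do until (last.panelist);
    set have;
    by panelist;
    if _gap_found = 0 then output;
  end;
run;
```

On the sample data this keeps panelists 1 and 3 (five records), matching the desired WANT. Because of the explicit OUTPUT statement, nothing is written at the end of each iteration except what pass 2 outputs.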

 

This assumes that the data are sorted by panelist/year. 
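If HAVE might not be sorted, one alternative that does not rely on record order is PROC SQL (a different technique from the DATA step answer; a minimal sketch). A self-join pairs each observed year with the later years of the same panelist, MIN(b.year) gives the next observed year, and a panelist is excluded when that jump exceeds 2 years. The table aliases h, a and b are illustrative.

```sas
/* Sample data from the question */
data have;
  input Panelist Year othervars;
  cards;
1 2017 1
1 2019 1
1 2020 1
2 2017 1
2 2020 1
3 2018 1
3 2020 1
;
run;

proc sql;
  create table want as
    select h.*
      from have as h
      /* Exclude panelists that have a jump of more than 2 years */
      where h.panelist not in
        (select a.panelist
           from have as a, have as b
           /* Pair each year with every later year of the same panelist */
           where a.panelist = b.panelist and b.year > a.year
           group by a.panelist, a.year
           /* MIN(b.year) is the next observed year after a.year */
           having min(b.year) - a.year > 2)
      order by h.panelist, h.year;
quit;
```

The ORDER BY clause only puts the output in panelist/year order; the selection itself does not depend on it. The self-join creates one row per pair of years within a panelist, so it can become expensive for long panels.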

 

Edited note: the DIF(x) function returns the same value as x-LAG(x), except that it doesn't generate a "Missing values were generated ..." note for the first observation.
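A small check of that equivalence on the sample data (the dataset CHECK and the variables d_dif and d_lag are illustrative names). It also shows that DIF, like LAG, ignores panelist boundaries: on the first record of panelist 2 the value is 2017 - 2020 = -3, taken from panelist 1's last year. This is one reason the accepted code resets _gap_found at FIRST.PANELIST, after the sum statement has run.

```sas
/* Sample data from the question */
data have;
  input Panelist Year othervars;
  cards;
1 2017 1
1 2019 1
1 2020 1
2 2017 1
2 2020 1
3 2018 1
3 2020 1
;
run;

/* Compute the difference both ways. On the first observation    */
/* LAG(Year) is missing, so both results are missing; the        */
/* subtraction also writes a "Missing values were generated"     */
/* note to the log. Neither function resets between panelists.   */
data check;
  set have;
  d_dif = dif(year);
  d_lag = year - lag(year);
run;

proc print data=check;
run;
```

Both columns should print ., 2, 1, -3, 3, -2, 2.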

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

shawnchen0321
Obsidian | Level 7

It works. Thanks a lot.
