shawnchen0321
Obsidian | Level 7

Hi, all experts.

 

My sample selection criterion is that a panelist must not have data missing for two or more consecutive years.

Here is a sample of my data:

 

data have;
  input Panelist Year othervars;
  cards;
1          2017 1
1          2019 1
1          2020 1
2          2017 1
2          2020 1
3          2018 1
3          2020 1
;
run;

 

I want the result to look like the data set below.

data want;
  input Panelist Year othervars;
  cards;
1          2017 1
1          2019 1
1          2020 1
3          2018 1
3          2020 1
;
run;

 

Does anyone know how to solve this problem?

Thanks in advance.

 

1 ACCEPTED SOLUTION

mkeintz
Jade | Level 19

You have to pass through each panelist twice: once to find gaps, and a second time to reread and output the panelists with no gap of two or more consecutive missing years:

 


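data want (drop=_:);
  /* Interleave HAVE with itself: within each panelist, all first-pass rows come before the second-pass rows */
  set have (in=firstpass)  have (in=secondpass);
  by panelist;

  /* First pass: count year-to-year jumps greater than 2, i.e. two or more consecutive missing years */
  _gap_found + (firstpass=1 and dif(year)>2);
  /* Reset after the sum statement so a jump across two panelists is not counted */
  if first.panelist then _gap_found=0;

  /* Second pass: keep the rows only if no such gap was found for this panelist */
  if secondpass and _gap_found=0;
run;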
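
To see how the single SET statement with BY reads each panelist twice, a small trace step like the sketch below may help. It assumes the `have` data set from the question; the data set name `trace` and the variables `pass` and `year_gap` are made up for illustration and are not part of the solution.

```sas
/* Illustrative only: expose the read order and the DIF values */
data trace;
  set have (in=firstpass) have (in=secondpass);
  by panelist;

  /* IN= variables are temporary, so copy the pass number into a kept variable */
  pass = ifn(firstpass, 1, 2);

  /* DIF is evaluated on every row, so the queue also advances during pass 2 */
  year_gap = dif(year);
run;

proc print data=trace noobs;
  var panelist pass year year_gap;
run;
```

In the printed output, the first-pass rows of each panelist should appear before its second-pass rows, which is what lets the gap counter be complete by the time the second pass decides what to output.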

 

This assumes that the data are sorted by panelist/year. 
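
If the input is not already in that order, a PROC SORT step along these lines could be run first. This is a minimal sketch that sorts `have` in place; adding an OUT= option would write the sorted copy to a different data set instead.

```sas
/* Put the data in panelist/year order before the two-pass DATA step */
proc sort data=have;
  by panelist year;
run;
```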

 

Edited note: the DIF(x) function returns the result of x-LAG(x), except that it doesn't generate a "missing values were generated ..." note for the first observation.
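
The relationship between DIF and LAG can be checked with a short step such as the following sketch. It assumes the `have` data set from the question; the data set name `dif_demo` and the variables `diff_manual` and `diff_dif` are invented for illustration.

```sas
/* Illustrative comparison of DIF(x) with x - LAG(x) */
data dif_demo;
  set have;

  /* Manual difference: the first observation subtracts a missing LAG value */
  diff_manual = year - lag(year);

  /* DIF gives the same values, including missing on the first observation */
  diff_dif = dif(year);
run;

proc print data=dif_demo noobs;
run;
```

The two columns should agree on every row. Neither LAG nor DIF resets at the start of a new panelist, so the first row of each later panelist holds a difference taken across two panelists.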

shawnchen0321
Obsidian | Level 7

It works. Thanks a lot.
