shawnchen0321
Obsidian | Level 7

Hi, all experts.

 

My sample selection criterion is that a panelist must not have two or more consecutive years of missing data.

I have a sample below.

 

data have;
  input Panelist Year othervars;
  cards;
1          2017 1
1          2019 1
1          2020 1
2          2017 1
2          2020 1
3          2018 1
3          2020 1
;
run;

 

I want the output to look like the data set below.

data want;
  input Panelist Year othervars;
  cards;
1          2017 1
1          2019 1
1          2020 1
3          2018 1
3          2020 1
;
run;

 

Does anyone know how to solve this problem?

Thanks in advance.

 

Accepted Solution
mkeintz
PROC Star

You have to pass through each panelist twice: once to find gaps, and a second time to reread and output only those panelists with no gap of two or more consecutive missing years:

 


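data want (drop=_:);
  /* Interleave HAVE with itself: within each panelist, all records are read
     once (firstpass) and then read again (secondpass). */
  set have (in=firstpass)  have (in=secondpass);
  by panelist;

  /* During the first pass, count year-to-year jumps greater than 2,
     i.e. two or more consecutive missing years. */
  _gap_found + (firstpass=1 and dif(year)>2);
  /* Reset after the sum statement so the jump across panelists is discarded. */
  if first.panelist then _gap_found=0;

  /* Keep only second-pass records of panelists with no such gap. */
  if secondpass and _gap_found=0;
run;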
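
If it helps to see the two passes, here is a small sketch that only traces the order in which the SET statement delivers records. The data set name trace and the variable pass are made up for illustration; have is the sample data from the question.

```sas
/* Illustrative only: TRACE and PASS are hypothetical names. */
data trace;
  set have (in=firstpass) have (in=secondpass);
  by panelist;
  /* IN= variables are temporary, so copy the flag into a kept variable. */
  pass = ifn(firstpass, 1, 2);
run;

proc print data=trace;
  var panelist year pass;
run;
```

For each panelist, all records with pass=1 should appear first, followed by the same records with pass=2, before the next panelist starts.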

 

This assumes that the data are sorted by panelist/year. 
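
If the data are not already in that order, a sort along these lines should be enough (a minimal sketch, assuming the input data set is named have as in the question):

```sas
/* Sort by panelist, then by year within panelist, so that BY panelist works
   and DIF(year) compares consecutive years of the same panelist. */
proc sort data=have;
  by panelist year;
run;
```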

 

Edited note: DIF(x) returns x-LAG(x), except that it doesn't generate a "missing values were generated ..." note for the first observation.
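
A quick way to compare the two is the sketch below. The data set name dif_demo and the new variable names are made up for illustration, and it assumes have is in the order shown in the question.

```sas
/* Illustrative only: DIF_DEMO, LAG_YEAR, DIF_YEAR and MANUAL_DIF are hypothetical names. */
data dif_demo;
  set have;
  lag_year   = lag(year);         /* previous record's year; missing on the first record */
  dif_year   = dif(year);         /* current year minus previous year */
  manual_dif = year - lag(year);  /* same value, but subtracts from a missing on record 1 */
run;

proc print data=dif_demo;
  var panelist year lag_year dif_year manual_dif;
run;
```

With the sample data, dif_year comes out as ., 2, 1, -3, 3, -2, 2. The -3 and -2 are differences taken across a panelist boundary, which is why the solution resets _gap_found on first.panelist after the sum statement rather than before it.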


shawnchen0321
Obsidian | Level 7

It works. Thanks a lot.
