what procedure for saving id exist in all year

Reply
Frequent Contributor
Posts: 130

what procedure for saving id exist in all year

[ Edited ]

I have a dataset with variable: year quarter month ID.

 

year is from 1990 to 2000

quarter is 1, 2, 3, 4

month is  1,2,3,,,,,12

 

1) only save ID that appears in every year (it should have at leat two observation every year(that is, as long as it has observations for any two months in each year, then it will be saved), from 1990 to 2000);

With this condition, would expect to save obs for:  aa, bb, cc, ee

2) only save ID that appears in every quarter (it should have at leat one observation in each quarter within each year, from 1990 Q1 to 2000 Q4);

With this condition, would expect to save obs for: aa, bb, ee

3) only save ID that appears in every month (it should have at leat one observation in every month within each year,, from 1990 Jan to 2000 Dec).

With this condition, would expect to save obs for: aa, bb

 

Thanks!

 

attached is the sample data.

Attachment
Super User
Super User
Posts: 7,711

Re: what procedure for saving id exist in all year

 

 

Hi,

 

Sorry, you will need to clarify, post test data in the form of a datastep (even something like the below), and clarify the logic.

1) only save ID that appears in every year; or 

By what do you mean every year, what is the lower and upper bounds?

2) only save ID that appears in every quarter;

Every quarter, does this mean for each year, or any of the years?

3) only save ID that appears in every month

Again, is this per year, or overal?

 

At a guess at what you want:

data have;
  do id=1 to 5;
    do year=2010 to 2014;
      do month=1 to 12;
        output;
      end;
    end;
  end;
run; 

data have;
  set have;
  if id=2 and month in (3,4) then delete;
  if id=4 and year=2011 then delete;
run;

proc sort data=have out=years nodupkey;
  by id year;
run;

data years (keep=id);
  set years;
  retain lstyear yes;
  by id;
  if first.id then do;
    lstyear=year;
    yes=1;
  end;
  else if year ne lstyear+1 then yes=0;
  if last.id and yes=1 then output;
  lstyear=year;
run;

Will give you list of ID's with consecutive years.

 

Frequent Contributor
Posts: 130

Re: what procedure for saving id exist in all year

Thanks a lot! I will try your code, at the meantime I just updated the original post.

Super User
Super User
Posts: 7,711

Re: what procedure for saving id exist in all year

I would say you may be quicker off then, creating a template dataset for each grouping, e.g. one template for year, which has all the expected years (What indicates the two records though?  Put that in your template tooSmiley Happy:

data years; 

  do year=1900 to 2015;

    do sampleid=1 to 2;

      output;

    end;

  end;

run;

 

Then merging that back to the main data and keeping where a year/sampleid is missing.  

 

Frequent Contributor
Posts: 130

Re: what procedure for saving id exist in all year

thank you for your advice. I am working on the template dataset.

Super User
Posts: 11,120

Re: what procedure for saving id exist in all year

I'm not quite sure of your desired output. You repeatedly say "save" but if an ID failys the "every month" test it could pass the "every year" test.

So this looks like you want three separate data sets, is that correct?

 

What will you do with the output if so?

Frequent Contributor
Posts: 130

Re: what procedure for saving id exist in all year

Yes, three different datasets.

 

I was thinking to create a variable of year_quarter and a variable of year_month;

Then count their frequency with condition on year, on year_quarter or on year_month;

Then if the frequency counted is below the required level in one occasition, that id will be deleted. 

 

the problem is that, some IDs have no observation for entire quarter or entire year, frequency will not be counted for that missing periods. 

Ask a Question
Discussion stats
  • 6 replies
  • 383 views
  • 1 like
  • 3 in conversation