03-02-2016 09:04 AM - edited 03-02-2016 11:12 AM
I have a dataset with variable: year quarter month ID.
year is from 1990 to 2000
quarter is 1, 2, 3, 4
month is 1,2,3,,,,,12
1) only save ID that appears in every year (it should have at leat two observation every year(that is, as long as it has observations for any two months in each year, then it will be saved), from 1990 to 2000);
With this condition, would expect to save obs for: aa, bb, cc, ee
2) only save ID that appears in every quarter (it should have at leat one observation in each quarter within each year, from 1990 Q1 to 2000 Q4);
With this condition, would expect to save obs for: aa, bb, ee
3) only save ID that appears in every month (it should have at leat one observation in every month within each year,, from 1990 Jan to 2000 Dec).
With this condition, would expect to save obs for: aa, bb
attached is the sample data.
03-02-2016 09:37 AM
Sorry, you will need to clarify, post test data in the form of a datastep (even something like the below), and clarify the logic.
1) only save ID that appears in every year; or
By what do you mean every year, what is the lower and upper bounds?
2) only save ID that appears in every quarter;
Every quarter, does this mean for each year, or any of the years?
3) only save ID that appears in every month
Again, is this per year, or overal?
At a guess at what you want:
data have; do id=1 to 5; do year=2010 to 2014; do month=1 to 12; output; end; end; end; run; data have; set have; if id=2 and month in (3,4) then delete; if id=4 and year=2011 then delete; run; proc sort data=have out=years nodupkey; by id year; run; data years (keep=id); set years; retain lstyear yes; by id; if first.id then do; lstyear=year; yes=1; end; else if year ne lstyear+1 then yes=0; if last.id and yes=1 then output; lstyear=year; run;
Will give you list of ID's with consecutive years.
03-02-2016 10:27 AM
I would say you may be quicker off then, creating a template dataset for each grouping, e.g. one template for year, which has all the expected years (What indicates the two records though? Put that in your template too:
do year=1900 to 2015;
do sampleid=1 to 2;
Then merging that back to the main data and keeping where a year/sampleid is missing.
03-02-2016 10:47 AM
I'm not quite sure of your desired output. You repeatedly say "save" but if an ID failys the "every month" test it could pass the "every year" test.
So this looks like you want three separate data sets, is that correct?
What will you do with the output if so?
03-02-2016 11:25 AM
Yes, three different datasets.
I was thinking to create a variable of year_quarter and a variable of year_month;
Then count their frequency with condition on year, on year_quarter or on year_month;
Then if the frequency counted is below the required level in one occasition, that id will be deleted.
the problem is that, some IDs have no observation for entire quarter or entire year, frequency will not be counted for that missing periods.