Hi,

The question I'm about to ask is somewhat complicated, but many of my analyses depend heavily on it, so your help would be very valuable to me.

I have a database with more than 700,000 observations that looks like this:

GROUP  ID       RANK  TD         YEAR  SEASON  GYS        Value
LH84   B000001  1     28MAR2011  2011  1       LH8420111  11.2
LH84   B000001  1     30MAY2011  2011  2       LH8420111  9.6
LH84   B000001  1     27JUN2011  2011  3       LH8420113  7.8
LH84   B000001  1     01AUG2011  2011  3       LH8420113  7.2
LH84   B000001  2     09FEB2012  2012  1       LH8420121  19.3
LH84   B000001  2     10APR2012  2012  2       LH8420122  20.6
LH84   B000002  1     10APR2012  2012  2       LH8420122  9.4
LH84   B000002  1     05JUN2012  2012  3       LH8420123  10.9
LH84   B000002  1     14AUG2012  2012  3       LH8420123  8.7
KC01   B000013  4     18JUN2000  2000  3       KC0120003  9.6
KC01   B000013  4     14AUG2000  2000  3       KC0120003  9.2
KC01   B000013  4     14OCT2000  2000  4       KC0120004  7.2
etc...

Where:
- TD is the test date;
- YEAR is the year of TD (extracted from the TD variable);
- SEASON is the season of TD (derived from the month of the test date);
- GYS is a composite variable created from the GROUP, YEAR and SEASON variables.

* In my raw database there are almost 1,100 different GROUPs, more than 45,000 different IDs, 5 different RANKs, 15 YEARs (2000 to 2014) and 4 SEASONs.
* Each ID can have up to 11 different tests for a particular RANK.

How do I create a new dataset (keeping the same variables) that fulfills the following conditions?
- Only IDs with at least 3 tests for a particular RANK are considered.
- Only GROUPs that contain at least 5 different IDs in a given YEAR are considered.
- Each GYS class has at least 4 observations.

The sample I gave is very small and cannot be used to test any code, but I can provide a larger sample if needed.

Many thanks.
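
To make the three conditions concrete, here is a minimal sketch of the logic in Python (standard library only). It is an illustration under my own assumptions, not a tested solution for the real data: the function name `filter_observations` is hypothetical, each observation is assumed to be a dict keyed by the column names above, and I assume the three filters should be re-applied until no more rows are dropped, because removing rows for one condition can make another condition fail.

```python
from collections import Counter, defaultdict


def filter_observations(rows, min_tests=3, min_ids=5, min_gys=4):
    """Return the rows that satisfy all three conditions at once.

    Assumption: the filters are repeated until the row count stops
    changing, so the final result meets every condition simultaneously.
    """
    rows = list(rows)
    while True:
        n_before = len(rows)

        # Condition 1: at least `min_tests` tests per (ID, RANK).
        tests = Counter((r["ID"], r["RANK"]) for r in rows)
        rows = [r for r in rows if tests[(r["ID"], r["RANK"])] >= min_tests]

        # Condition 2: at least `min_ids` distinct IDs per (GROUP, YEAR).
        ids = defaultdict(set)
        for r in rows:
            ids[(r["GROUP"], r["YEAR"])].add(r["ID"])
        rows = [r for r in rows if len(ids[(r["GROUP"], r["YEAR"])]) >= min_ids]

        # Condition 3: at least `min_gys` observations per GYS class.
        gys = Counter(r["GYS"] for r in rows)
        rows = [r for r in rows if gys[r["GYS"]] >= min_gys]

        # Stop once a full pass removes nothing.
        if len(rows) == n_before:
            return rows
```

If a single pass in the order listed is what is wanted instead, the `while` loop can be dropped and the three steps run once; the two approaches can give different results when dropping rows for one condition pushes an ID, GROUP or GYS class below its threshold.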