Hi, I have the following dataset:

Data have;
input ID Date :mmddyy10. Catg_1 $ Catg_2 $; /* Catg_1 and Catg_2 hold text, so they need $ */
datalines;
0001 03/31/2001 GLM .
0001 06/30/2001 . .
0001 12/31/2002 LGE .
0001 09/30/2003 LGE .
0001 06/30/2004 LGE .
0002 03/31/2005 . SC
0002 06/30/2005 . SC
0002 06/30/2005 EIE .
0003 09/30/2017 . .
0004 03/31/2007 . .
0004 12/31/2007 . GR
0004 06/30/2009 G GR
0004 09/30/2010 . .
;
run;

What I want is as follows:

Data Want;
input ID Date :mmddyy10. Catg_1 $ Catg_2 $; /* Catg_1 and Catg_2 hold text, so they need $ */
datalines;
0001 03/31/2001 GLM .
0001 12/31/2002 LGE .
0002 03/31/2005 . SC
0002 06/30/2005 EIE .
0003 09/30/2017 . .
0004 12/31/2007 . GR
0004 06/30/2009 G .
;
run;

Irrespective of the date, if both category variables (Catg_1 and Catg_2) are missing for an ID, one row with missing values must be selected (e.g. ID=0003).

If both category variables have a value on the same date for the same ID, the Catg_1 observation (row) must be selected (e.g. ID=0004, Date=06/30/2009).

A repeated value of either category variable within a single ID must be selected only once. However, when the value differs from the previous one for the same ID, that row must be selected (e.g. ID=0001).

Thanks.
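
For reference, the rules above could be sketched as a single DATA step. This is only one reading of them, not a confirmed solution. It assumes HAVE is already sorted by ID with dates in the intended order within each ID, that Catg_1 and Catg_2 are character variables, and that a value counts as repeated when it equals the most recently kept value for that ID, whichever variable it came from. Blanking Catg_2 when Catg_1 is present is inferred from the desired row for ID=0004, Date=06/30/2009. The underscore-prefixed variables are hypothetical helper names.

```sas
data want;
  set have;
  by ID;                      /* assumes HAVE is sorted by ID */
  length _prev _cur $8;       /* helper variables, dropped at the end */
  retain _prev _kept;

  /* reset the per-ID state on the first row of each ID */
  if first.ID then do;
    _prev = ' ';
    _kept = 0;
  end;

  /* when both are present on the same row, Catg_1 wins (assumption) */
  if not missing(Catg_1) then Catg_2 = ' ';

  /* the value this row contributes, blank if both are missing */
  _cur = coalescec(Catg_1, Catg_2);

  /* keep the row when its value is non-missing and differs from the last kept value */
  if not missing(_cur) and _cur ne _prev then do;
    output;
    _prev = _cur;
    _kept = 1;
  end;
  /* if nothing was kept for this ID, keep its last row as the all-missing row */
  else if last.ID and not _kept then output;

  drop _prev _cur _kept;
run;
```

With this reading, an ID whose rows are all missing contributes its last row. If the first row is preferred instead, that branch would need to change.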