Hi everybody,
I have a form of data file like this: (over then 340 000 Obs)
ID B_DATE N_ACT S_DATE C_DATE STS VAL
BTN01 05/10/2006 1 31/12/2007 02/03/2008 P 15
BTN01 05/10/2006 1 31/12/2007 02/04/2008 M .
BTN01 05/10/2006 1 31/12/2007 27/04/2008 P 15
BTN01 05/10/2006 1 31/12/2007 02/06/2008 M .
BTN01 05/10/2006 1 31/12/2007 17/06/2008 P 12
BTN01 05/10/2006 1 31/12/2007 15/07/2008 P 10
BTN01 05/10/2006 1 31/12/2007 23/08/2008 P 9
BTN01 05/10/2006 1 31/12/2007 25/09/2008 T 0
BTN02 26/12/2005 2 09/06/2008 15/07/2008 P 21
BTN02 26/12/2005 2 09/06/2008 23/08/2008 M .
BTN02 26/12/2005 2 09/06/2008 25/09/2008 M .
BTN02 26/12/2005 2 09/06/2008 30/10/2008 P 16
BTN02 26/12/2005 2 09/06/2008 11/12/2008 P 14
BTN02 26/12/2005 2 09/06/2008 11/01/2009 P 14
BTN02 26/12/2005 2 09/06/2008 11/02/2009 T 0
BTN07 02/02/2004 2 19/09/2008 25/09/2008 P 17
BTN07 02/02/2004 2 19/09/2008 25/10/2008 P 21
BTN07 02/02/2004 2 19/09/2008 16/12/2008 P 19
BTN07 02/02/2004 2 19/09/2008 25/01/2009 M .
BTN07 02/02/2004 2 19/09/2008 22/02/2009 M .
BTN07 02/02/2004 2 19/09/2008 04/04/2009 P 13
BTN07 02/02/2004 2 19/09/2008 13/05/2009 P 12
BTN07 02/02/2004 2 19/09/2008 13/06/2009 M .
BTN07 02/02/2004 2 19/09/2008 18/07/2009 T .
.....
ID$= identifiant;
B_Date=Birth date;
ACT= Activity Number;
S_Date= Start date of the activity;
C_DATE=check date;
STS=Status;
VAL=Value;
Some conditions must be filled according to the STS value (it can be P, M, or T):
For the same activity [ACT] (can be 1, 2 or 3) and for the same individual ( over 68 000)
* If STS = "M" (between P) requires that the interval, between C_DATE, of the next observation, with "P" value, and the previous observation, with a "P" value, also, must be <52 (days). If not, all the activity will be eliminated.
* If in case of two successive M, it requires that the interval of C_DATE between, next and previous observations, with "P" values must be <68 (days), if not all of the activity, will be eliminated.
* If there are 3 or more values of M for the same individual and for the same activity, whatever their positions, all the activity will be eliminated.
Please, I need your help, even if it seems easy for most of you.
Thanks a lot.
Can you please post expected output based on your data?
Like this?
data HAVE;
input ID $ B_DATE : ddmmyy. N_ACT : S_DATE : ddmmyy. C_DATE : ddmmyy. STS : $1. VAL : ;
cards;
BTN01 05/10/2006 1 31/12/2007 02/03/2008 P 15
BTN01 05/10/2006 1 31/12/2007 02/04/2008 M .
BTN01 05/10/2006 1 31/12/2007 27/04/2008 P 15
BTN01 05/10/2006 1 31/12/2007 02/06/2008 M .
BTN01 05/10/2006 1 31/12/2007 17/06/2008 P 12
BTN01 05/10/2006 1 31/12/2007 15/07/2008 P 10
BTN01 05/10/2006 1 31/12/2007 23/08/2008 P 9
BTN01 05/10/2006 1 31/12/2007 25/09/2008 T 0
BTN02 26/12/2005 2 09/06/2008 15/07/2008 P 21
BTN02 26/12/2005 2 09/06/2008 23/08/2008 M .
BTN02 26/12/2005 2 09/06/2008 25/09/2008 M .
BTN02 26/12/2005 2 09/06/2008 30/10/2008 P 16
BTN02 26/12/2005 2 09/06/2008 11/12/2008 P 14
BTN02 26/12/2005 2 09/06/2008 11/01/2009 P 14
BTN02 26/12/2005 2 09/06/2008 11/02/2009 T 0
BTN07 02/02/2004 2 19/09/2008 25/09/2008 P 17
BTN07 02/02/2004 2 19/09/2008 25/10/2008 P 21
BTN07 02/02/2004 2 19/09/2008 16/12/2008 P 19
BTN07 02/02/2004 2 19/09/2008 25/01/2009 M .
BTN07 02/02/2004 2 19/09/2008 22/02/2009 M .
BTN07 02/02/2004 2 19/09/2008 04/04/2009 P 13
BTN07 02/02/2004 2 19/09/2008 13/05/2009 P 12
BTN07 02/02/2004 2 19/09/2008 13/06/2009 M .
BTN07 02/02/2004 2 19/09/2008 18/07/2009 T .
BTN12 26/12/2005 2 09/06/2008 25/07/2008 P 21
BTN12 26/12/2005 2 09/06/2008 23/08/2008 M .
BTN12 26/12/2005 2 09/06/2008 25/09/2008 M .
BTN12 26/12/2005 2 09/06/2008 30/09/2008 P 16
BTN12 26/12/2005 2 09/06/2008 11/12/2008 P 14
BTN12 26/12/2005 2 09/06/2008 11/01/2009 P 14
BTN12 26/12/2005 2 09/06/2008 11/02/2009 T 0
run;
data SET_DELETE_FLAG;
set HAVE(keep=ID N_ACT C_DATE STS);
by ID N_ACT C_DATE;
if first.N_ACT then call missing (NB_M,DELETE_ACT);
if not DELETE_ACT;
if STS='M' then NB_M+1;
if (lag2(ID)=ID & lag2(N_ACT)=N_ACT & lag2(STS)='P' & lag(STS)='M' & STS='P' & dif2(C_DATE) >= 52)
| (lag3(ID)=ID & lag3(N_ACT)=N_ACT & lag3(STS)='P' & lag2(STS)='M' & lag(STS)='M' & STS='P' & dif3(C_DATE) >= 68)
| NB_M >=3
then DELETE_ACT+1;
if DELETE_ACT; %* comment this line to check delete flag in intermediate table SET_DELETE_FLAG;
keep ID N_ACT DELETE_ACT ;
run;
data WANT;
merge HAVE
SET_DELETE_FLAG ;
by ID N_ACT ;
if not DELETE_ACT;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.