Hi everybody,
I have a form of data file like this: (over then 340 000 Obs)
ID B_DATE N_ACT S_DATE C_DATE STS VAL
BTN01 05/10/2006 1 31/12/2007 02/03/2008 P 15
BTN01 05/10/2006 1 31/12/2007 02/04/2008 M .
BTN01 05/10/2006 1 31/12/2007 27/04/2008 P 15
BTN01 05/10/2006 1 31/12/2007 02/06/2008 M .
BTN01 05/10/2006 1 31/12/2007 17/06/2008 P 12
BTN01 05/10/2006 1 31/12/2007 15/07/2008 P 10
BTN01 05/10/2006 1 31/12/2007 23/08/2008 P 9
BTN01 05/10/2006 1 31/12/2007 25/09/2008 T 0
BTN02 26/12/2005 2 09/06/2008 15/07/2008 P 21
BTN02 26/12/2005 2 09/06/2008 23/08/2008 M .
BTN02 26/12/2005 2 09/06/2008 25/09/2008 M .
BTN02 26/12/2005 2 09/06/2008 30/10/2008 P 16
BTN02 26/12/2005 2 09/06/2008 11/12/2008 P 14
BTN02 26/12/2005 2 09/06/2008 11/01/2009 P 14
BTN02 26/12/2005 2 09/06/2008 11/02/2009 T 0
BTN07 02/02/2004 2 19/09/2008 25/09/2008 P 17
BTN07 02/02/2004 2 19/09/2008 25/10/2008 P 21
BTN07 02/02/2004 2 19/09/2008 16/12/2008 P 19
BTN07 02/02/2004 2 19/09/2008 25/01/2009 M .
BTN07 02/02/2004 2 19/09/2008 22/02/2009 M .
BTN07 02/02/2004 2 19/09/2008 04/04/2009 P 13
BTN07 02/02/2004 2 19/09/2008 13/05/2009 P 12
BTN07 02/02/2004 2 19/09/2008 13/06/2009 M .
BTN07 02/02/2004 2 19/09/2008 18/07/2009 T .
.....
ID$= identifiant;
B_Date=Birth date;
ACT= Activity Number;
S_Date= Start date of the activity;
C_DATE=check date;
STS=Status;
VAL=Value;
Some conditions must be filled according to the STS value (it can be P, M, or T):
For the same activity [ACT] (can be 1, 2 or 3) and for the same individual ( over 68 000)
* If STS = "M" (between P) requires that the interval, between C_DATE, of the next observation, with "P" value, and the previous observation, with a "P" value, also, must be <52 (days). If not, all the activity will be eliminated.
* If in case of two successive M, it requires that the interval of C_DATE between, next and previous observations, with "P" values must be <68 (days), if not all of the activity, will be eliminated.
* If there are 3 or more values of M for the same individual and for the same activity, whatever their positions, all the activity will be eliminated.
Please, I need your help, even if it seems easy for most of you.
Thanks a lot.
Can you please post expected output based on your data?
Like this?
data HAVE;
input ID $ B_DATE : ddmmyy. N_ACT : S_DATE : ddmmyy. C_DATE : ddmmyy. STS : $1. VAL : ;
cards;
BTN01 05/10/2006 1 31/12/2007 02/03/2008 P 15
BTN01 05/10/2006 1 31/12/2007 02/04/2008 M .
BTN01 05/10/2006 1 31/12/2007 27/04/2008 P 15
BTN01 05/10/2006 1 31/12/2007 02/06/2008 M .
BTN01 05/10/2006 1 31/12/2007 17/06/2008 P 12
BTN01 05/10/2006 1 31/12/2007 15/07/2008 P 10
BTN01 05/10/2006 1 31/12/2007 23/08/2008 P 9
BTN01 05/10/2006 1 31/12/2007 25/09/2008 T 0
BTN02 26/12/2005 2 09/06/2008 15/07/2008 P 21
BTN02 26/12/2005 2 09/06/2008 23/08/2008 M .
BTN02 26/12/2005 2 09/06/2008 25/09/2008 M .
BTN02 26/12/2005 2 09/06/2008 30/10/2008 P 16
BTN02 26/12/2005 2 09/06/2008 11/12/2008 P 14
BTN02 26/12/2005 2 09/06/2008 11/01/2009 P 14
BTN02 26/12/2005 2 09/06/2008 11/02/2009 T 0
BTN07 02/02/2004 2 19/09/2008 25/09/2008 P 17
BTN07 02/02/2004 2 19/09/2008 25/10/2008 P 21
BTN07 02/02/2004 2 19/09/2008 16/12/2008 P 19
BTN07 02/02/2004 2 19/09/2008 25/01/2009 M .
BTN07 02/02/2004 2 19/09/2008 22/02/2009 M .
BTN07 02/02/2004 2 19/09/2008 04/04/2009 P 13
BTN07 02/02/2004 2 19/09/2008 13/05/2009 P 12
BTN07 02/02/2004 2 19/09/2008 13/06/2009 M .
BTN07 02/02/2004 2 19/09/2008 18/07/2009 T .
BTN12 26/12/2005 2 09/06/2008 25/07/2008 P 21
BTN12 26/12/2005 2 09/06/2008 23/08/2008 M .
BTN12 26/12/2005 2 09/06/2008 25/09/2008 M .
BTN12 26/12/2005 2 09/06/2008 30/09/2008 P 16
BTN12 26/12/2005 2 09/06/2008 11/12/2008 P 14
BTN12 26/12/2005 2 09/06/2008 11/01/2009 P 14
BTN12 26/12/2005 2 09/06/2008 11/02/2009 T 0
run;
data SET_DELETE_FLAG;
set HAVE(keep=ID N_ACT C_DATE STS);
by ID N_ACT C_DATE;
if first.N_ACT then call missing (NB_M,DELETE_ACT);
if not DELETE_ACT;
if STS='M' then NB_M+1;
if (lag2(ID)=ID & lag2(N_ACT)=N_ACT & lag2(STS)='P' & lag(STS)='M' & STS='P' & dif2(C_DATE) >= 52)
| (lag3(ID)=ID & lag3(N_ACT)=N_ACT & lag3(STS)='P' & lag2(STS)='M' & lag(STS)='M' & STS='P' & dif3(C_DATE) >= 68)
| NB_M >=3
then DELETE_ACT+1;
if DELETE_ACT; %* comment this line to check delete flag in intermediate table SET_DELETE_FLAG;
keep ID N_ACT DELETE_ACT ;
run;
data WANT;
merge HAVE
SET_DELETE_FLAG ;
by ID N_ACT ;
if not DELETE_ACT;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.