Solved: Re: How to eliminate a day and a day after it based on date observatio...

aminkarimid · Posted 11-27-2017 02:44 AM

Hello everybody,

I have table A and B which have name and date variables and I want to remove observations in table A with the same name and date in table B. In addition, for the same name and date between table A and B, I want to delete the next date after it in table A.

The table A is:

data WORK.TABLEA;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A, 7-May-08
A, 11-Jul-08
A, 3-Jan-09
A, 4-Jan-09
A, 8-Jul-09
A, 24-Jul-09
A, 8-Dec-09
B, 7-May-08
B, 10-May-08
B, 17-May-08
B, 24-May-08
B, 1-Jun-08
B, 18-Jun-08
B, 9-May-09
C, 7-Oct-09
C, 17-Oct-09
C, 25-Oct-09
C, 18-Nov-09
C, 4-Dec-09
C, 19-Dec-09
C, 9-May-10
C, 9-May-10
C, 10-May-10
;;;;
run;

And the table B is:

data WORK.TABLEB;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A,11-Jul-08
A,3-Jan-09
A,24-Jul-09
B,7-May-08
B,17-May-08
B,18-Jun-08
B,9-Jul-09
C,17-Oct-09
C,4-Dec-09
C,19-Dec-09
;;;;
run;

So the result is:

+------+-----------+
| name |   date    |
+------+-----------+
| A    | 7-May-08  |
| A    | 8-Jul-09  |
| B    | 1-Jun-08  |
| C    | 7-Oct-09  |
| C    | 18-Nov-09 |
| C    | 10-May-10 |
+------+-----------+

How can I do that?

Thanks in advance.

Ksharp · Posted 11-28-2017 08:04 AM

That is to say you don't like to use PROC SORT ?

data WORK.TABLEA;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A, 7-May-08
A, 11-Jul-08
A, 3-Jan-09
A, 4-Jan-09
A, 8-Jul-09
A, 24-Jul-09
A, 8-Dec-09
B, 7-May-08
B, 10-May-08
B, 17-May-08
B, 24-May-08
B, 1-Jun-08
B, 18-Jun-08
B, 9-May-09
C, 7-Oct-09
C, 17-Oct-09
C, 25-Oct-09
C, 18-Nov-09
C, 4-Dec-09
C, 19-Dec-09
C, 9-May-10
C, 9-May-10
C, 10-May-10
;;;;
run;

data WORK.TABLEB;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A,11-Jul-08
A,3-Jan-09
A,24-Jul-09
B,7-May-08
B,17-May-08
B,18-Jun-08
B,9-Jul-09
C,17-Oct-09
C,4-Dec-09
C,19-Dec-09
;;;;
run;
data temp;
 set tableb(in=inb) tablea;
 by name date;
 group+first.date;
 _inb=inb;
run;
data key;
 set temp(where=(_inb=1));
 output;
 group=group+1;
 output;
 keep name group;
run;
proc sql;
select *
 from temp 
  where catx(' ',name,group) not in
   (select catx(' ',name,group) from key);
quit;

View solution in original post

RW9 · Posted 11-27-2017 03:57 AM

Something like:

proc sql;
  create table WANT as 
  select A.*
  from   TABLEA A
  left join TABLEB B
  on       A.NAME=B.NAME
  and     (A.DATE=B.DATE or A.DATE=B.DATE+1)
  where  B.NAME="";
quit;

But I am afraid your required output doesn't seem to work, why does 08Dec for A not come out, it should, there is no 1 day around it in B?

PeterClemmensen · Posted 11-27-2017 04:19 AM

"In addition, for the same name and date between table A and B, I want to delete the next date after it in table A." @aminkarimid, do you mean the next date or the next observation?

The DATA to DATA Step Macro
Blog: SASnrd

aminkarimid · Posted 11-27-2017 08:01 AM

Thanks @PeterClemmensen
I mean name and date observations for next date in table A.

Ksharp · Posted 11-27-2017 08:13 AM

data WORK.TABLEA;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A, 7-May-08
A, 11-Jul-08
A, 3-Jan-09
A, 4-Jan-09
A, 8-Jul-09
A, 24-Jul-09
A, 8-Dec-09
B, 7-May-08
B, 10-May-08
B, 17-May-08
B, 24-May-08
B, 1-Jun-08
B, 18-Jun-08
B, 9-May-09
C, 7-Oct-09
C, 17-Oct-09
C, 25-Oct-09
C, 18-Nov-09
C, 4-Dec-09
C, 19-Dec-09
C, 9-May-10
C, 9-May-10
C, 10-May-10
;;;;
run;

data WORK.TABLEB;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A,11-Jul-08
A,3-Jan-09
A,24-Jul-09
B,7-May-08
B,17-May-08
B,18-Jun-08
B,9-Jul-09
C,17-Oct-09
C,4-Dec-09
C,19-Dec-09
;;;;
run;
proc sort data=tablea out=tablea1 nodupkey;
by name date;
run;
proc sort data=tableb out=tableb1 nodupkey;
by name date;
run;
data want;
 merge  tablea1 tableb1(in=inb) ;
 by name date;
 if inb or (inb=0 and lag(inb)=1 and not first.name) then delete;;
run;

aminkarimid · Posted 11-28-2017 03:11 AM

Thanks @Ksharp;

Now, I want to go one step further. If table A has another variable like time which is shown below:

data WORK.TABLEA;
infile datalines dsd truncover;
input name:$3. date:DATE9. time:TIME8.;
format date DATE9. time TIME8.;
label name="name" date="date" time="time";
datalines4;
A, 7-May-08, 11:12:41
A, 11-Jul-08, 11:23:41
A, 3-Jan-09, 11:31:41
A, 4-Jan-09, 11:32:41
A, 8-Jul-09, 11:32:41
A, 24-Jul-09, 11:32:41
A, 8-Dec-09, 12:32:41
B, 7-May-08, 11:31:41
B, 10-May-08, 11:32:41
B, 17-May-08, 11:33:41
B, 24-May-08, 11:34:41
B, 1-Jun-08, 11:35:41
B, 18-Jun-08, 11:36:41
B, 9-May-09, 11:37:41
C, 7-Oct-09, 11:21:41
C, 17-Oct-09, 11:22:41
C, 25-Oct-09, 11:32:41
C, 18-Nov-09, 11:33:41
C, 4-Dec-09, 11:12:41
C, 19-Dec-09, 10:22:41
C, 9-May-10, 11:42:41
C, 9-May-10, 11:12:41
C, 10-May-10, 12:52:41
;;;;
run;

Using the nodupkey option is not a good solution to delete same observations. because there are maybe another variables like price, turnover, etc or the duplicate observations is important and must be on a table. So your last part of your code needs to recognize observations with the same date which should be deleted and then delete them.

Ksharp · Posted 11-28-2017 08:04 AM

That is to say you don't like to use PROC SORT ?

data WORK.TABLEA;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A, 7-May-08
A, 11-Jul-08
A, 3-Jan-09
A, 4-Jan-09
A, 8-Jul-09
A, 24-Jul-09
A, 8-Dec-09
B, 7-May-08
B, 10-May-08
B, 17-May-08
B, 24-May-08
B, 1-Jun-08
B, 18-Jun-08
B, 9-May-09
C, 7-Oct-09
C, 17-Oct-09
C, 25-Oct-09
C, 18-Nov-09
C, 4-Dec-09
C, 19-Dec-09
C, 9-May-10
C, 9-May-10
C, 10-May-10
;;;;
run;

data WORK.TABLEB;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A,11-Jul-08
A,3-Jan-09
A,24-Jul-09
B,7-May-08
B,17-May-08
B,18-Jun-08
B,9-Jul-09
C,17-Oct-09
C,4-Dec-09
C,19-Dec-09
;;;;
run;
data temp;
 set tableb(in=inb) tablea;
 by name date;
 group+first.date;
 _inb=inb;
run;
data key;
 set temp(where=(_inb=1));
 output;
 group=group+1;
 output;
 keep name group;
run;
proc sql;
select *
 from temp 
  where catx(' ',name,group) not in
   (select catx(' ',name,group) from key);
quit;

How to eliminate a day and a day after it based on date observations in another table?

Re: How to eliminate a day and a day after it based on date observations in another table?

Re: How to eliminate a day and a day after it based on date observations in another table?

Re: How to eliminate a day and a day after it based on date observations in another table?

Re: How to eliminate a day and a day after it based on date observations in another table?

Re: How to eliminate a day and a day after it based on date observations in another table?

Re: How to eliminate a day and a day after it based on date observations in another table?

Re: How to eliminate a day and a day after it based on date observations in another table?

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away