Hello everybody,
I have table A and B which have name and date variables and I want to remove observations in table A with the same name and date in table B. In addition, for the same name and date between table A and B, I want to delete the next date after it in table A.
The table A is:
data WORK.TABLEA;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A, 7-May-08
A, 11-Jul-08
A, 3-Jan-09
A, 4-Jan-09
A, 8-Jul-09
A, 24-Jul-09
A, 8-Dec-09
B, 7-May-08
B, 10-May-08
B, 17-May-08
B, 24-May-08
B, 1-Jun-08
B, 18-Jun-08
B, 9-May-09
C, 7-Oct-09
C, 17-Oct-09
C, 25-Oct-09
C, 18-Nov-09
C, 4-Dec-09
C, 19-Dec-09
C, 9-May-10
C, 9-May-10
C, 10-May-10
;;;;
run;
And the table B is:
data WORK.TABLEB;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A,11-Jul-08
A,3-Jan-09
A,24-Jul-09
B,7-May-08
B,17-May-08
B,18-Jun-08
B,9-Jul-09
C,17-Oct-09
C,4-Dec-09
C,19-Dec-09
;;;;
run;
So the result is:
+------+-----------+
| name | date |
+------+-----------+
| A | 7-May-08 |
| A | 8-Jul-09 |
| B | 1-Jun-08 |
| C | 7-Oct-09 |
| C | 18-Nov-09 |
| C | 10-May-10 |
+------+-----------+
How can I do that?
Thanks in advance.
That is to say you don't like to use PROC SORT ?
data WORK.TABLEA;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A, 7-May-08
A, 11-Jul-08
A, 3-Jan-09
A, 4-Jan-09
A, 8-Jul-09
A, 24-Jul-09
A, 8-Dec-09
B, 7-May-08
B, 10-May-08
B, 17-May-08
B, 24-May-08
B, 1-Jun-08
B, 18-Jun-08
B, 9-May-09
C, 7-Oct-09
C, 17-Oct-09
C, 25-Oct-09
C, 18-Nov-09
C, 4-Dec-09
C, 19-Dec-09
C, 9-May-10
C, 9-May-10
C, 10-May-10
;;;;
run;
data WORK.TABLEB;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A,11-Jul-08
A,3-Jan-09
A,24-Jul-09
B,7-May-08
B,17-May-08
B,18-Jun-08
B,9-Jul-09
C,17-Oct-09
C,4-Dec-09
C,19-Dec-09
;;;;
run;
data temp;
set tableb(in=inb) tablea;
by name date;
group+first.date;
_inb=inb;
run;
data key;
set temp(where=(_inb=1));
output;
group=group+1;
output;
keep name group;
run;
proc sql;
select *
from temp
where catx(' ',name,group) not in
(select catx(' ',name,group) from key);
quit;
Something like:
proc sql; create table WANT as select A.* from TABLEA A left join TABLEB B on A.NAME=B.NAME and (A.DATE=B.DATE or A.DATE=B.DATE+1) where B.NAME=""; quit;
But I am afraid your required output doesn't seem to work, why does 08Dec for A not come out, it should, there is no 1 day around it in B?
"In addition, for the same name and date between table A and B, I want to delete the next date after it in table A." @aminkarimid, do you mean the next date or the next observation?
data WORK.TABLEA;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A, 7-May-08
A, 11-Jul-08
A, 3-Jan-09
A, 4-Jan-09
A, 8-Jul-09
A, 24-Jul-09
A, 8-Dec-09
B, 7-May-08
B, 10-May-08
B, 17-May-08
B, 24-May-08
B, 1-Jun-08
B, 18-Jun-08
B, 9-May-09
C, 7-Oct-09
C, 17-Oct-09
C, 25-Oct-09
C, 18-Nov-09
C, 4-Dec-09
C, 19-Dec-09
C, 9-May-10
C, 9-May-10
C, 10-May-10
;;;;
run;
data WORK.TABLEB;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A,11-Jul-08
A,3-Jan-09
A,24-Jul-09
B,7-May-08
B,17-May-08
B,18-Jun-08
B,9-Jul-09
C,17-Oct-09
C,4-Dec-09
C,19-Dec-09
;;;;
run;
proc sort data=tablea out=tablea1 nodupkey;
by name date;
run;
proc sort data=tableb out=tableb1 nodupkey;
by name date;
run;
data want;
merge tablea1 tableb1(in=inb) ;
by name date;
if inb or (inb=0 and lag(inb)=1 and not first.name) then delete;;
run;
Thanks @Ksharp;
Now, I want to go one step further. If table A has another variable like time which is shown below:
data WORK.TABLEA;
infile datalines dsd truncover;
input name:$3. date:DATE9. time:TIME8.;
format date DATE9. time TIME8.;
label name="name" date="date" time="time";
datalines4;
A, 7-May-08, 11:12:41
A, 11-Jul-08, 11:23:41
A, 3-Jan-09, 11:31:41
A, 4-Jan-09, 11:32:41
A, 8-Jul-09, 11:32:41
A, 24-Jul-09, 11:32:41
A, 8-Dec-09, 12:32:41
B, 7-May-08, 11:31:41
B, 10-May-08, 11:32:41
B, 17-May-08, 11:33:41
B, 24-May-08, 11:34:41
B, 1-Jun-08, 11:35:41
B, 18-Jun-08, 11:36:41
B, 9-May-09, 11:37:41
C, 7-Oct-09, 11:21:41
C, 17-Oct-09, 11:22:41
C, 25-Oct-09, 11:32:41
C, 18-Nov-09, 11:33:41
C, 4-Dec-09, 11:12:41
C, 19-Dec-09, 10:22:41
C, 9-May-10, 11:42:41
C, 9-May-10, 11:12:41
C, 10-May-10, 12:52:41
;;;;
run;
Using the nodupkey option is not a good solution to delete same observations. because there are maybe another variables like price, turnover, etc or the duplicate observations is important and must be on a table. So your last part of your code needs to recognize observations with the same date which should be deleted and then delete them.
That is to say you don't like to use PROC SORT ?
data WORK.TABLEA;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A, 7-May-08
A, 11-Jul-08
A, 3-Jan-09
A, 4-Jan-09
A, 8-Jul-09
A, 24-Jul-09
A, 8-Dec-09
B, 7-May-08
B, 10-May-08
B, 17-May-08
B, 24-May-08
B, 1-Jun-08
B, 18-Jun-08
B, 9-May-09
C, 7-Oct-09
C, 17-Oct-09
C, 25-Oct-09
C, 18-Nov-09
C, 4-Dec-09
C, 19-Dec-09
C, 9-May-10
C, 9-May-10
C, 10-May-10
;;;;
run;
data WORK.TABLEB;
infile datalines dsd truncover;
input name:$3. date:DATE9.;
format date DATE9.;
label name="name" date="date";
datalines4;
A,11-Jul-08
A,3-Jan-09
A,24-Jul-09
B,7-May-08
B,17-May-08
B,18-Jun-08
B,9-Jul-09
C,17-Oct-09
C,4-Dec-09
C,19-Dec-09
;;;;
run;
data temp;
set tableb(in=inb) tablea;
by name date;
group+first.date;
_inb=inb;
run;
data key;
set temp(where=(_inb=1));
output;
group=group+1;
output;
keep name group;
run;
proc sql;
select *
from temp
where catx(' ',name,group) not in
(select catx(' ',name,group) from key);
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.