Re: SAS Merge Process

deleted_user · Posted 03-27-2009 02:28 AM

Hi All,

I have a bit of a difficult problem that I thought someone else might be able to suggest some possible alternatives and post some code that might work for me.

Some background, I have 2 datasets. "A" is a list of purchases, "B" is a list of credits for purchases (ie product returns).

What I would like to be able to do is remove items from dataset "A" where it has a matching record in dataset "B".

Dataset A
Cust Prod Cost
ABC1 123X $57
ABC1 998C $107
ABC1 998C $107
ABC1 567D $500

Dataset B
Cust Prod Cost
ABC1 998C $107

Resulting Dataset
Cust Prod Cost
ABC1 123X $57
ABC1 998C $107
ABC1 567D $500

As you can see there are 2 records that could have been matched between "A" and "B", however only one was removed because there was only one record in "B".

If anyone has some thoughts how this can be approached in SAS 9.1 that would be greatly appreciated.

Thanks in advance
DW Changed the subject to indicate that this had been solved.

Message was edited by: DWW

andreas_lds · Posted 03-27-2009 03:46 AM

Hi DWW,

try using the IN-option when merging the dataset. Both datasets have to be sorted by Cust, Prod and Cost.

Example
[pre]
data work.c;
merge work.a
work.b(in= inB)
;
by Cust Prod Cost;

if inB then delete;
run;
[/pre]

ieva · Posted 03-27-2009 05:44 AM

Hi,

This code could help, but only if in the dataset B there are no repeated observations.

proc sort data=a ;
by cust prod cost;
run;

proc sort data=b;
by cust prod cost;
run;

data c;
merge a b (in=b);
by cust prod cost;
if first.prod and first.cost and b then delete;
run;

ieva · Posted 03-27-2009 07:24 AM

Changed the code a bit: you can add one extra variable that counts how many times the same situation is repeated. So even if it is repeated also in b, this should delete only those observations that match and not all of them:

proc sort data=a ;
by cust prod cost;
run;

proc sort data=b;
by cust prod cost;
run;

data a1;
set a;
by cust prod cost;
if first.cost then count=0;
count+1;
run;

data b1;
set b;
by cust prod cost;
if first.cost then count=0;
count+1;
run;

data c (drop = count);
merge a1 b1 (in=b);
by cust prod cost count;
if b then delete;
run;

deleted_user · Posted 03-27-2009 07:26 AM

The SQL set operator EXCEPT comes to mind.
However, a recommendation would be based on more information like: SAS platform, data volumes and frequency of update, the use/re-use of the information,
HTH
PeterC

deleted_user · Posted 03-29-2009 06:36 AM

Hi all, thanks for the great and fast responses. Ieva your second solution with the counts, worked an absolute treat. Thanks so much.

Regards
DW

SAS Merge Process [SOLVED]