Desktop productivity for business analysts and programmers

Deleted and take the difference

Reply
Contributor
Posts: 42

Deleted and take the difference

[ Edited ]

I have two data sets with IDs and Cust_id data_1 is my main data set its a snapshot of our customer list at the start of the year and i want to compare to data_2 which is updated monthly. 

 

Data_2 might have new cust_id added over the time but I dont want to look at those. 

 

Data_1 : data at start of year 

 

ID      Cust_id

1          111

1          222

1          333

1         444

2          555

2          666

2          777

2          888

3         999

3        1000

3        1001

3        1002

 

 

Data_2 

ID      Cust_id

1          111

1          222

1          1110

1         5786

2          555

2          8989

2          0986

3         999

 

Want: 

Compare Data_1 to Data_2 and only keep cust_ids in data_2 that are also in Data_1 and than get the diff

 

New_Data_2

ID      Cust_id  difference

1          111          2

1          222        2

 

2          555          3

 

3         999           3

 

difference = count(data_1.Cust_id) -count(data_2.Cust_id)

 

Super User
Posts: 10,568

Re: Deleted and take the difference

Are you sure you want sum() and not count()? Because I can't see where you'd get just 2 from those numbers.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 42

Re: Deleted and take the difference

Posted in reply to KurtBremser
sorry yeah i meant count of the cust_id
Super User
Posts: 2,050

Re: Deleted and take the difference

[ Edited ]
data one;
input ID      Cust_id;
cards;
1          111
1          222
1          333
1         444
2          555
2          666
2          777
2          888
3         999
3        1000
3        1001
3        1002
;
data two;
input ID      Cust_id;
cards;
1          111
1          222
1          1110
1         5786
2          555
2          8989
2          0986
3         999
;
proc sql;
create table want as
select a.*,b.count-count(a.cust_id) as difference
from two a, (select *,count(cust_id) as count from one group by id) b
where a.cust_id=b.cust_id and a.id=b.id
group by a.id;
quit;
Super User
Posts: 2,050

Re: Deleted and take the difference

Posted in reply to novinosrin
data one;
input ID      Cust_id;
cards;
1          111
1          222
1          333
1         444
2          555
2          666
2          777
2          888
3         999
3        1000
3        1001
3        1002
;
data two;
input ID      Cust_id;
cards;
1          111
1          222
1          1110
1         5786
2          555
2          8989
2          0986
3         999
;

data want;
if _n_=1 then do;
   dcl hash h(dataset:'two', multidata: 'y');
   h.definekey('id','cust_id');
    h.definedone();
end;
_c=0;
do _n=1 by 1 until(last.id);
set one;
by ID    Cust_id;
if h.check()=0 then  _c+1;
end;
do until(last.id);
set one;
by ID    Cust_id;
if h.check()=0 then do;count=_n-_c;output;end;
end;
drop _:;
run;
Esteemed Advisor
Posts: 5,624

Re: Deleted and take the difference

Can also be done this way with a left join:

 

proc sql;
select 
    one.*,
    sum(missing(two.cust_id)) as difference
from 
    one left join two on one.id=two.id and one.cust_id=two.cust_id
group by one.id
having not missing(two.cust_id);
quit;
PG
Ask a Question
Discussion stats
  • 5 replies
  • 122 views
  • 0 likes
  • 4 in conversation