BookmarkSubscribeRSS Feed
hk2013
Fluorite | Level 6

I have two data sets with IDs and Cust_id data_1 is my main data set its a snapshot of our customer list at the start of the year and i want to compare to data_2 which is updated monthly. 

 

Data_2 might have new cust_id added over the time but I dont want to look at those. 

 

Data_1 : data at start of year 

 

ID      Cust_id

1          111

1          222

1          333

1         444

2          555

2          666

2          777

2          888

3         999

3        1000

3        1001

3        1002

 

 

Data_2 

ID      Cust_id

1          111

1          222

1          1110

1         5786

2          555

2          8989

2          0986

3         999

 

Want: 

Compare Data_1 to Data_2 and only keep cust_ids in data_2 that are also in Data_1 and than get the diff

 

New_Data_2

ID      Cust_id  difference

1          111          2

1          222        2

 

2          555          3

 

3         999           3

 

difference = count(data_1.Cust_id) -count(data_2.Cust_id)

 

5 REPLIES 5
hk2013
Fluorite | Level 6
sorry yeah i meant count of the cust_id
novinosrin
Tourmaline | Level 20
data one;
input ID      Cust_id;
cards;
1          111
1          222
1          333
1         444
2          555
2          666
2          777
2          888
3         999
3        1000
3        1001
3        1002
;
data two;
input ID      Cust_id;
cards;
1          111
1          222
1          1110
1         5786
2          555
2          8989
2          0986
3         999
;
proc sql;
create table want as
select a.*,b.count-count(a.cust_id) as difference
from two a, (select *,count(cust_id) as count from one group by id) b
where a.cust_id=b.cust_id and a.id=b.id
group by a.id;
quit;
novinosrin
Tourmaline | Level 20
data one;
input ID      Cust_id;
cards;
1          111
1          222
1          333
1         444
2          555
2          666
2          777
2          888
3         999
3        1000
3        1001
3        1002
;
data two;
input ID      Cust_id;
cards;
1          111
1          222
1          1110
1         5786
2          555
2          8989
2          0986
3         999
;

data want;
if _n_=1 then do;
   dcl hash h(dataset:'two', multidata: 'y');
   h.definekey('id','cust_id');
    h.definedone();
end;
_c=0;
do _n=1 by 1 until(last.id);
set one;
by ID    Cust_id;
if h.check()=0 then  _c+1;
end;
do until(last.id);
set one;
by ID    Cust_id;
if h.check()=0 then do;count=_n-_c;output;end;
end;
drop _:;
run;
PGStats
Opal | Level 21

Can also be done this way with a left join:

 

proc sql;
select 
    one.*,
    sum(missing(two.cust_id)) as difference
from 
    one left join two on one.id=two.id and one.cust_id=two.cust_id
group by one.id
having not missing(two.cust_id);
quit;
PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1187 views
  • 0 likes
  • 4 in conversation