BookmarkSubscribeRSS Feed
hk2013
Fluorite | Level 6

I have two data sets with IDs and Cust_id data_1 is my main data set its a snapshot of our customer list at the start of the year and i want to compare to data_2 which is updated monthly. 

 

Data_2 might have new cust_id added over the time but I dont want to look at those. 

 

Data_1 : data at start of year 

 

ID      Cust_id

1          111

1          222

1          333

1         444

2          555

2          666

2          777

2          888

3         999

3        1000

3        1001

3        1002

 

 

Data_2 

ID      Cust_id

1          111

1          222

1          1110

1         5786

2          555

2          8989

2          0986

3         999

 

Want: 

Compare Data_1 to Data_2 and only keep cust_ids in data_2 that are also in Data_1 and than get the diff

 

New_Data_2

ID      Cust_id  difference

1          111          2

1          222        2

 

2          555          3

 

3         999           3

 

difference = count(data_1.Cust_id) -count(data_2.Cust_id)

 

5 REPLIES 5
hk2013
Fluorite | Level 6
sorry yeah i meant count of the cust_id
novinosrin
Tourmaline | Level 20
data one;
input ID      Cust_id;
cards;
1          111
1          222
1          333
1         444
2          555
2          666
2          777
2          888
3         999
3        1000
3        1001
3        1002
;
data two;
input ID      Cust_id;
cards;
1          111
1          222
1          1110
1         5786
2          555
2          8989
2          0986
3         999
;
proc sql;
create table want as
select a.*,b.count-count(a.cust_id) as difference
from two a, (select *,count(cust_id) as count from one group by id) b
where a.cust_id=b.cust_id and a.id=b.id
group by a.id;
quit;
novinosrin
Tourmaline | Level 20
data one;
input ID      Cust_id;
cards;
1          111
1          222
1          333
1         444
2          555
2          666
2          777
2          888
3         999
3        1000
3        1001
3        1002
;
data two;
input ID      Cust_id;
cards;
1          111
1          222
1          1110
1         5786
2          555
2          8989
2          0986
3         999
;

data want;
if _n_=1 then do;
   dcl hash h(dataset:'two', multidata: 'y');
   h.definekey('id','cust_id');
    h.definedone();
end;
_c=0;
do _n=1 by 1 until(last.id);
set one;
by ID    Cust_id;
if h.check()=0 then  _c+1;
end;
do until(last.id);
set one;
by ID    Cust_id;
if h.check()=0 then do;count=_n-_c;output;end;
end;
drop _:;
run;
PGStats
Opal | Level 21

Can also be done this way with a left join:

 

proc sql;
select 
    one.*,
    sum(missing(two.cust_id)) as difference
from 
    one left join two on one.id=two.id and one.cust_id=two.cust_id
group by one.id
having not missing(two.cust_id);
quit;
PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 819 views
  • 0 likes
  • 4 in conversation