BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rajeshm
Quartz | Level 8

I  I have 2 datasets (few records duplicated in firstds and those have corresponding records in secondds) 

->Find number of records in common between datasets?

-->Elusive to first and second datasets.?

Note: few records have many to many relationship between these datasteps.,guessing my logic is failing here.. please help me.

data both first second;
merge firstds(in=firstinvar) secondds(in=secinvar);
by _all_;
if firstinvar and secinvar then output both;
else if firstinvar then output first;
else if secinvar then output second;
/*put varibles includes firstinvar and secondinvar */ ;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Code would look like this:

proc sql;
create table count_a as
  select
    a,
    b,
    count(*) as count_a
  from a
  group by a, b
;
create table count_b as
  select
    a,
    b,
    count(*) as count_b
  from b
  group by a, b
;
quit;

data
  match
  only_a
  only_b
  more_in_a
  more_in_b
;
merge
  count_a (in=in_a)
  count_b (in=in_b)
;
by a b;
if in_a and in_b
then do;
  if count_a = count_b then output match;
  else if count_a > count_b then output more_in_a;
  else output more_in_b;
end;
else if in_a then output only_a;
else output only_b;
run;

View solution in original post

24 REPLIES 24
cminard
Quartz | Level 8
This could really use an example of the data that you're trying to merge together and the desired output. But, a suggestion might be to NOT merge the data. Rather, just combine the two datasets together using a set statement (e.g., "set firstds secondds;"). Then, sort data and remove the observations that you do not want.
rajeshm
Quartz | Level 8
Thanks for reply, with set we can combine 2 datasets into one. But how can I know exclusive to firt dataset and second dataset and intersected records?? Please provide logic.
rajeshm
Quartz | Level 8
Thanks for reply, with set we can combine 2 datasets into one. But how can I know exclusive to firt dataset and second dataset and intersected records?? Please provide logic.
mkeintz
PROC Star

You say you can have many-to-many matches.  So if firstds has 3 records that match 2 records in secondds, how many records do you want in dataset BOTH?  2?, 3?, 5?

 

Your program as is would put 3 there - i.e. as many records as whichever source has the most for a given match.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rajeshm
Quartz | Level 8
first  second /*dataset names */
a b    a b  /* varible names */
1 a    1 a
2 b    2 b
2 b    2 b
3 c    3 c
3 c    3 c
3 c    4 d
/*desired output should be */
both   onlyA  onlyB
a b     a b    a b
1 a     3 c    4 d
2 b
2 b
3 c
3 c   

please let me know,whether i have given clarity or not??

rajeshm
Quartz | Level 8

3 C record

in dataset A 3 times and 2 times in dataset B

I need answer that  dataset A has one extra record for 3 C.

 

purpose: there is db migration happened, i have pulled records from both dwh into sas and have to compare ..

now would like to know

1.only in dataset A(3 c should come here once)

2.only in dataset B(4 D should come here)

4 Both/common in A and B(3 C should come here twice)

 

 

Kurt_Bremser
Super User

You should first calculate counts by groups for both datasets, and then merge the result, where you filter out those where the counts do not match.

Kurt_Bremser
Super User

Code would look like this:

proc sql;
create table count_a as
  select
    a,
    b,
    count(*) as count_a
  from a
  group by a, b
;
create table count_b as
  select
    a,
    b,
    count(*) as count_b
  from b
  group by a, b
;
quit;

data
  match
  only_a
  only_b
  more_in_a
  more_in_b
;
merge
  count_a (in=in_a)
  count_b (in=in_b)
;
by a b;
if in_a and in_b
then do;
  if count_a = count_b then output match;
  else if count_a > count_b then output more_in_a;
  else output more_in_b;
end;
else if in_a then output only_a;
else output only_b;
run;
rajeshm
Quartz | Level 8

Thanks a ton .

I got the expected result after applying round function with your logic.

i confused a lot with proc compare and put logs but after applying round function,but no where found the information to apply round...

rajeshm
Quartz | Level 8

How can i achieve Datasets "more_in_a and "more_in_b" with proc sql?

Datasets "Match,only_a,only_b"  can be achieved with proc sql.

 

 

 

Kurt_Bremser
Super User

@rajeshm wrote:

How can i achieve Datasets "more_in_a and "more_in_b" with proc sql?

Datasets "Match,only_a,only_b"  can be achieved with proc sql.

 


Maxim 14: Use the Right Tool.

The right tool is the data step, so you use it.

rajeshm
Quartz | Level 8

I have to compare 2 datasets(A and B) and would like to know common records in Both,Exclusive to first and exclusive to second.  Please somebody provide the logic

Note:

->1.Dataset A contains duplicate records and Dataset B also contains duplicate records.

-->2.if dataset A cotains 3 records(duplicate) and dataset B contains only 2 of duplicate records , i should get like exclusive to A has one record.

-->3.if dataset B cotains 3 records(duplicate) and dataset A contains only 2 of duplicate records , i should get like exclusive to B has one record.

 

I struggled a lot because(put log)

amount=944.68
indatasetA=1 indatasetB=0  /* these are in-varibles for datasets with put */
amount=944.68
indatasetA=0 indatasetB=1  /* these are in-varibles for datasets with put */

 

proc compare

                             Obs ||  sumamount  sumamount      Diff.     % Diff                       
                       ________  ||  _________  _________  _________  _________                       
                                 ||                                                                   
                             12  ||     200.39     200.39  2.842E-14  1.418E-14    

 

Kurt_Bremser
Super User

Please do not repost the basically same question in different threads, it only makes it harder to follow answers and suggestions. That's why I moved this post back into this thread.