BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rajeshm
Quartz | Level 8

I  I have 2 datasets (few records duplicated in firstds and those have corresponding records in secondds) 

->Find number of records in common between datasets?

-->Elusive to first and second datasets.?

Note: few records have many to many relationship between these datasteps.,guessing my logic is failing here.. please help me.

data both first second;
merge firstds(in=firstinvar) secondds(in=secinvar);
by _all_;
if firstinvar and secinvar then output both;
else if firstinvar then output first;
else if secinvar then output second;
/*put varibles includes firstinvar and secondinvar */ ;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Code would look like this:

proc sql;
create table count_a as
  select
    a,
    b,
    count(*) as count_a
  from a
  group by a, b
;
create table count_b as
  select
    a,
    b,
    count(*) as count_b
  from b
  group by a, b
;
quit;

data
  match
  only_a
  only_b
  more_in_a
  more_in_b
;
merge
  count_a (in=in_a)
  count_b (in=in_b)
;
by a b;
if in_a and in_b
then do;
  if count_a = count_b then output match;
  else if count_a > count_b then output more_in_a;
  else output more_in_b;
end;
else if in_a then output only_a;
else output only_b;
run;

View solution in original post

24 REPLIES 24
cminard
Obsidian | Level 7
This could really use an example of the data that you're trying to merge together and the desired output. But, a suggestion might be to NOT merge the data. Rather, just combine the two datasets together using a set statement (e.g., "set firstds secondds;"). Then, sort data and remove the observations that you do not want.
rajeshm
Quartz | Level 8
Thanks for reply, with set we can combine 2 datasets into one. But how can I know exclusive to firt dataset and second dataset and intersected records?? Please provide logic.
rajeshm
Quartz | Level 8
Thanks for reply, with set we can combine 2 datasets into one. But how can I know exclusive to firt dataset and second dataset and intersected records?? Please provide logic.
mkeintz
PROC Star

You say you can have many-to-many matches.  So if firstds has 3 records that match 2 records in secondds, how many records do you want in dataset BOTH?  2?, 3?, 5?

 

Your program as is would put 3 there - i.e. as many records as whichever source has the most for a given match.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rajeshm
Quartz | Level 8
first  second /*dataset names */
a b    a b  /* varible names */
1 a    1 a
2 b    2 b
2 b    2 b
3 c    3 c
3 c    3 c
3 c    4 d
/*desired output should be */
both   onlyA  onlyB
a b     a b    a b
1 a     3 c    4 d
2 b
2 b
3 c
3 c   

please let me know,whether i have given clarity or not??

rajeshm
Quartz | Level 8

3 C record

in dataset A 3 times and 2 times in dataset B

I need answer that  dataset A has one extra record for 3 C.

 

purpose: there is db migration happened, i have pulled records from both dwh into sas and have to compare ..

now would like to know

1.only in dataset A(3 c should come here once)

2.only in dataset B(4 D should come here)

4 Both/common in A and B(3 C should come here twice)

 

 

Kurt_Bremser
Super User

You should first calculate counts by groups for both datasets, and then merge the result, where you filter out those where the counts do not match.

Kurt_Bremser
Super User

Code would look like this:

proc sql;
create table count_a as
  select
    a,
    b,
    count(*) as count_a
  from a
  group by a, b
;
create table count_b as
  select
    a,
    b,
    count(*) as count_b
  from b
  group by a, b
;
quit;

data
  match
  only_a
  only_b
  more_in_a
  more_in_b
;
merge
  count_a (in=in_a)
  count_b (in=in_b)
;
by a b;
if in_a and in_b
then do;
  if count_a = count_b then output match;
  else if count_a > count_b then output more_in_a;
  else output more_in_b;
end;
else if in_a then output only_a;
else output only_b;
run;
rajeshm
Quartz | Level 8

Thanks a ton .

I got the expected result after applying round function with your logic.

i confused a lot with proc compare and put logs but after applying round function,but no where found the information to apply round...

rajeshm
Quartz | Level 8

How can i achieve Datasets "more_in_a and "more_in_b" with proc sql?

Datasets "Match,only_a,only_b"  can be achieved with proc sql.

 

 

 

Kurt_Bremser
Super User

@rajeshm wrote:

How can i achieve Datasets "more_in_a and "more_in_b" with proc sql?

Datasets "Match,only_a,only_b"  can be achieved with proc sql.

 


Maxim 14: Use the Right Tool.

The right tool is the data step, so you use it.

rajeshm
Quartz | Level 8

I have to compare 2 datasets(A and B) and would like to know common records in Both,Exclusive to first and exclusive to second.  Please somebody provide the logic

Note:

->1.Dataset A contains duplicate records and Dataset B also contains duplicate records.

-->2.if dataset A cotains 3 records(duplicate) and dataset B contains only 2 of duplicate records , i should get like exclusive to A has one record.

-->3.if dataset B cotains 3 records(duplicate) and dataset A contains only 2 of duplicate records , i should get like exclusive to B has one record.

 

I struggled a lot because(put log)

amount=944.68
indatasetA=1 indatasetB=0  /* these are in-varibles for datasets with put */
amount=944.68
indatasetA=0 indatasetB=1  /* these are in-varibles for datasets with put */

 

proc compare

                             Obs ||  sumamount  sumamount      Diff.     % Diff                       
                       ________  ||  _________  _________  _________  _________                       
                                 ||                                                                   
                             12  ||     200.39     200.39  2.842E-14  1.418E-14    

 

Kurt_Bremser
Super User

Please do not repost the basically same question in different threads, it only makes it harder to follow answers and suggestions. That's why I moved this post back into this thread.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 24 replies
  • 1991 views
  • 6 likes
  • 5 in conversation