I I have 2 datasets (few records duplicated in firstds and those have corresponding records in secondds)
->Find number of records in common between datasets?
-->Elusive to first and second datasets.?
Note: few records have many to many relationship between these datasteps.,guessing my logic is failing here.. please help me.
data both first second; merge firstds(in=firstinvar) secondds(in=secinvar); by _all_; if firstinvar and secinvar then output both; else if firstinvar then output first; else if secinvar then output second; /*put varibles includes firstinvar and secondinvar */ ; run;
Code would look like this:
proc sql;
create table count_a as
select
a,
b,
count(*) as count_a
from a
group by a, b
;
create table count_b as
select
a,
b,
count(*) as count_b
from b
group by a, b
;
quit;
data
match
only_a
only_b
more_in_a
more_in_b
;
merge
count_a (in=in_a)
count_b (in=in_b)
;
by a b;
if in_a and in_b
then do;
if count_a = count_b then output match;
else if count_a > count_b then output more_in_a;
else output more_in_b;
end;
else if in_a then output only_a;
else output only_b;
run;
You say you can have many-to-many matches. So if firstds has 3 records that match 2 records in secondds, how many records do you want in dataset BOTH? 2?, 3?, 5?
Your program as is would put 3 there - i.e. as many records as whichever source has the most for a given match.
first second /*dataset names */ a b a b /* varible names */ 1 a 1 a 2 b 2 b 2 b 2 b 3 c 3 c 3 c 3 c 3 c 4 d /*desired output should be */ both onlyA onlyB a b a b a b 1 a 3 c 4 d 2 b 2 b 3 c 3 c
please let me know,whether i have given clarity or not??
How can the observation
3 c
be in onlya, when there are two matches for it in dataset second?
3 C record
in dataset A 3 times and 2 times in dataset B
I need answer that dataset A has one extra record for 3 C.
purpose: there is db migration happened, i have pulled records from both dwh into sas and have to compare ..
now would like to know
1.only in dataset A(3 c should come here once)
2.only in dataset B(4 D should come here)
4 Both/common in A and B(3 C should come here twice)
You should first calculate counts by groups for both datasets, and then merge the result, where you filter out those where the counts do not match.
Code would look like this:
proc sql;
create table count_a as
select
a,
b,
count(*) as count_a
from a
group by a, b
;
create table count_b as
select
a,
b,
count(*) as count_b
from b
group by a, b
;
quit;
data
match
only_a
only_b
more_in_a
more_in_b
;
merge
count_a (in=in_a)
count_b (in=in_b)
;
by a b;
if in_a and in_b
then do;
if count_a = count_b then output match;
else if count_a > count_b then output more_in_a;
else output more_in_b;
end;
else if in_a then output only_a;
else output only_b;
run;
Thanks a ton .
I got the expected result after applying round function with your logic.
i confused a lot with proc compare and put logs but after applying round function,but no where found the information to apply round...
How can i achieve Datasets "more_in_a and "more_in_b" with proc sql?
Datasets "Match,only_a,only_b" can be achieved with proc sql.
@rajeshm wrote:
How can i achieve Datasets "more_in_a and "more_in_b" with proc sql?
Datasets "Match,only_a,only_b" can be achieved with proc sql.
Maxim 14: Use the Right Tool.
The right tool is the data step, so you use it.
I have to compare 2 datasets(A and B) and would like to know common records in Both,Exclusive to first and exclusive to second. Please somebody provide the logic
Note:
->1.Dataset A contains duplicate records and Dataset B also contains duplicate records.
-->2.if dataset A cotains 3 records(duplicate) and dataset B contains only 2 of duplicate records , i should get like exclusive to A has one record.
-->3.if dataset B cotains 3 records(duplicate) and dataset A contains only 2 of duplicate records , i should get like exclusive to B has one record.
I struggled a lot because(put log)
amount=944.68
indatasetA=1 indatasetB=0 /* these are in-varibles for datasets with put */
amount=944.68
indatasetA=0 indatasetB=1 /* these are in-varibles for datasets with put */
proc compare
Obs || sumamount sumamount Diff. % Diff ________ || _________ _________ _________ _________ || 12 || 200.39 200.39 2.842E-14 1.418E-14 |
Please do not repost the basically same question in different threads, it only makes it harder to follow answers and suggestions. That's why I moved this post back into this thread.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.