Thank you very much @Kurt_Bremser for helping me! See what kind of relationship with regard to ID you have (one-to-many, many-to-one, many-to-many). It is many-to-one: ID is a foreign key in my_huge_in_table referencing the primary key of my_small_in_table. Count the results in both datasets per ID, then compare these two to see if IDs are missing in the smaller one, or where certain IDs have less results in it. Then look at the source observations for non-matching IDs to see a pattern. There are 21,789 distinct ID in my_small_in_table, with 1 row each (21,789 rows). No missing ID. There are 21,564 distinct ID in my_huge_in_table for 22,430,607 rows. No missing ID. All the 21,564 ID of my_huge_in_table are also in my_small_in_table. 16,641 of those 21,564 ID represented in my_huge_in_table have (my_filter_variable = 1) in my_small_in_table. This represents 17,772,627 rows of my_huge_in_table. The other 4,923 ID have (my_filter_variable = 0) in my_small_in_table and represents 4,657,980 rows of my_huge_in_table. In my table BOTH_STEPS_1_AND_2 there are 5,720,957 of the 17,772,627 rows supposed to be there, belonging to 13,575 of the 16,641 ID supposed to be there. The missing 12,051,670 rows belong to 15,131 of the 16,641 ID supposed to be there. I haven't found yet any pattern. What do you get when you run this: data check;
set my_huge_in_table;
if _n_ = 1
then do;
declare hash f (dataset:"my_small_in_table (where=(my_filter_variable = 1))");
f.definekey("id");
f.definedone();
end;
if f.check() = 0;
run; ? NOTE: There were 16641 observations read from the data set WORK.MY_SMALL_IN_TABLE.
WHERE my_filter_variable =1; NOTE: There were 22430607 observations read from the data set WORK.MY_HUGE_IN_TABLE. NOTE: The data set WORK.CHECK has 17772627 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 9.25 seconds cpu time 5.78 seconds I shall confess I have not yet followed your advice to read SAS® Hash Object Programming Made Easy by Michele M. Burlew, hence I can't do hash programming on my own.
... View more