Hi,
I have a dataset with 250 records and a dataset of 1.4 million. I created a key which is a concatenation of lastname, last4digitstaxid and zipcode. How can i find how many of the 250 records are there in 1.4 million and what they are. I have SAS EG.
Thanks,
Tom
Query Builder is the easiest way to match/join data.
Be sure that you have the corresponding variables/columns in both your tables.
I am looking to see if if the dataset of 1.4 million contains any of 250. So the out put range should be within 1-250. Now when i did an inner join, i get 498.
Not necessarily. If you have duplicate keys in your 1.4 million records, ALL of them will end up in your output table.
Getting a count > 250 means there are some duplicates (as far as the key fields are concerned) in the larger dataset.
Suggested query:
proc sql;
select count (distinct a.last_name||a.ssn4||a.zipcode) as distinct_cnts
from table1 a
inner join table2 b
on (a.last_name=b.last_name)
and (a.ssn4=b.ssn4)
and (a.zipcode=b.zipcode);
quit;
Exploring using the MERGE statement.
Some sample datasets:
/*******************/
/*** dataset t_a ***/
/*******************/
data t_a(keep=j k);
do i = 1 to 40;
j = ceil(5*ranuni(3));
k = ceil(5*ranuni(3));
output;
end;
run;
proc sort data=t_a nodupkey; by j k; run; /*** get unique sets of (j,k) ***/
/*******************/
/*** dataset t_a ***/
/*******************/
data t_b;
do i = 1 to 4000;
j = ceil(50*ranuni(3));
k = ceil(50*ranuni(3));
output;
end;
run;
proc sort data=t_b; by j k; run;
/******************************************/
/*** calculate the number of (j,k) sets ***/
/*** (cnts) present in dataset t_b. ***/
/******************************************/
data t_e(keep=cnts);
merge t_a(in=a) t_b(in=b) end=n_last;
by j k;
if (first.j or first.k) and (a and b) then cnts+1;
if n_last then output;
run;
In the current case, dataset t_e has 1 variable (cnts), 1 row, with cnts = 16.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.