Please see Example 1 and Result Output Page 1 in this link: https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/065-29.pdf
The following SQL procedure provides Result Output on page 1:
proc sql;
select a.*,
from abuse a, abuse b
where a.o_ssn=b.v_ssn and
a.v_ssn=b.o_ssn and
a.date=b.date;
quit;
I would like to keep only cases 1 and 5 instead of cases 1,2,5,6. What code would I need to write to do that?
Thanks for the help!
Try this:
data abuse;
infile cards dlm=" ";
input case o_ssn:$11. v_ssn:$11. date:$10.;
datalines;
1 000-00-0001 000-00-0002 01/01/2000
2 000-00-0002 000-00-0001 01/01/2000
3 000-00-0003 000-00-0004 01/01/2000
4 000-00-0005 000-00-0006 01/01/2000
5 000-00-0007 000-00-0008 01/01/2000
6 000-00-0008 000-00-0007 01/01/2000
;
run;
proc sql;
create table no_reverse_match as
select
a.case, a.o_ssn, a.v_ssn
from
abuse a inner join abuse b
on a.o_ssn = b.v_ssn and a.v_ssn = b.o_ssn and a.o_ssn < a.v_ssn
where
a.date=b.date;
quit;
The order of output from SQL is not guaranteed. You should provide some actual variable values indicating which observations you want.
Better is to provide some rule(s) related to the content of a data set. I can write code that will keep just those observations but isn't extensible to any other project or data set.
I have a dataset with about 5,000,000 observations. I won't be able to individually select which observations to keep. Is there a way I can code for a rule to just keep 1 set of matches?
@r-lee-484 wrote:
I have a dataset with about 5,000,000 observations. I won't be able to individually select which observations to keep. Is there a way I can code for a rule to just keep 1 set of matches?
I repeat: what are the rules involved with which observations you want to keep.
I set of matches to what?
Perhaps it is time to provide an example of data similar to what you currently have what you want for the output given that starting data.
There are many ways to get "1 set" of somethings. How a "set" is defined is critical in any of the processes.
Try this:
data abuse;
infile cards dlm=" ";
input case o_ssn:$11. v_ssn:$11. date:$10.;
datalines;
1 000-00-0001 000-00-0002 01/01/2000
2 000-00-0002 000-00-0001 01/01/2000
3 000-00-0003 000-00-0004 01/01/2000
4 000-00-0005 000-00-0006 01/01/2000
5 000-00-0007 000-00-0008 01/01/2000
6 000-00-0008 000-00-0007 01/01/2000
;
run;
proc sql;
create table no_reverse_match as
select
a.case, a.o_ssn, a.v_ssn
from
abuse a inner join abuse b
on a.o_ssn = b.v_ssn and a.v_ssn = b.o_ssn and a.o_ssn < a.v_ssn
where
a.date=b.date;
quit;
Thank you very much, that code works to remove reverse matches and to retain 1 set of matches.
On another note, what code should I write to filter out any paired matches? From the dataset "abuse" I only want to keep cases 3 and 4, and I don't want to keep case 1,2,5,6?
Thanks for your help!
@r-lee-484 wrote:
Thank you very much, that code works to remove reverse matches and to retain 1 set of matches.
On another note, what code should I write to filter out any paired matches? From the dataset "abuse" I only want to keep cases 3 and 4, and I don't want to keep case 1,2,5,6?
In other words, you want every BUT the paired matches, correct? Then you could just use the EXCEPT set operator:
proc sql;
create table unmatched_cases as
select c.case, c.o_ssn, c.v_ssn
from abuse as c
except
select a.case, a.o_ssn, a.v_ssn
from
abuse a inner join abuse b
on a.o_ssn = b.v_ssn
and a.v_ssn = b.o_ssn
where
a.date=b.date;
quit;
There is also straightforward DATA step code for this task:
data un_matched_cases;
set abuse;
if _n_=1 then do;
declare hash h (dataset:'abuse (keep=date o_ssn v_ssn)');
h.definekey('date','o_ssn','v_ssn');
h.definedone();
end;
if h.check(key:date,key:v_ssn,key:o_ssn)^=0;
run;
Note order of keys for the hash object h is DATE,O_SSN,V_SSN. That means the check method - h.check(...) - for "reverse match" must present the arguments ordered as DATE,V_SSN,O_SSN.
This works thank you!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.