BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
r-lee-484
Fluorite | Level 6

Please see Example 1 and Result Output Page 1 in this link: https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/065-29.pdf

 

The following SQL procedure provides Result Output on page 1:

proc sql;
select a.*,
from abuse a, abuse b
where a.o_ssn=b.v_ssn and
a.v_ssn=b.o_ssn and
a.date=b.date;
quit;

 

 

I would like to keep only cases 1 and 5 instead of cases 1,2,5,6. What code would I need to write to do that?

 

Thanks for the help!

1 ACCEPTED SOLUTION

Accepted Solutions
HB
Barite | Level 11 HB
Barite | Level 11

Try this:

 


data abuse;
	infile cards dlm=" 	";
	input case o_ssn:$11. v_ssn:$11. date:$10.;
datalines;
1 000-00-0001 000-00-0002 01/01/2000
2 000-00-0002 000-00-0001 01/01/2000
3 000-00-0003 000-00-0004 01/01/2000
4 000-00-0005 000-00-0006 01/01/2000
5 000-00-0007 000-00-0008 01/01/2000
6 000-00-0008 000-00-0007 01/01/2000
;
run;


proc sql;
create table no_reverse_match as 
select 
a.case, a.o_ssn, a.v_ssn

from 
abuse a inner join abuse b
on a.o_ssn = b.v_ssn and a.v_ssn = b.o_ssn and a.o_ssn < a.v_ssn 

where  
a.date=b.date;
quit;

 

 

View solution in original post

7 REPLIES 7
ballardw
Super User

The order of output from SQL is not guaranteed. You should provide some actual variable  values indicating which observations you want.

 

Better is to provide some rule(s) related to the content of a data set. I can write code that will keep just those observations but isn't extensible to any other project or data set.

r-lee-484
Fluorite | Level 6

I have a dataset with about 5,000,000 observations. I won't be able to individually select which observations to keep. Is there a way I can code for a rule to just keep 1 set of matches?

ballardw
Super User

@r-lee-484 wrote:

I have a dataset with about 5,000,000 observations. I won't be able to individually select which observations to keep. Is there a way I can code for a rule to just keep 1 set of matches?


I repeat: what are the rules involved with which observations you want to keep.

I set of matches to what?

 

Perhaps it is time to provide an example of data similar to what you currently have what you want for the output given that starting data.

 

There are many ways to get "1 set" of somethings. How a "set" is defined is critical in any of the processes.

 

HB
Barite | Level 11 HB
Barite | Level 11

Try this:

 


data abuse;
	infile cards dlm=" 	";
	input case o_ssn:$11. v_ssn:$11. date:$10.;
datalines;
1 000-00-0001 000-00-0002 01/01/2000
2 000-00-0002 000-00-0001 01/01/2000
3 000-00-0003 000-00-0004 01/01/2000
4 000-00-0005 000-00-0006 01/01/2000
5 000-00-0007 000-00-0008 01/01/2000
6 000-00-0008 000-00-0007 01/01/2000
;
run;


proc sql;
create table no_reverse_match as 
select 
a.case, a.o_ssn, a.v_ssn

from 
abuse a inner join abuse b
on a.o_ssn = b.v_ssn and a.v_ssn = b.o_ssn and a.o_ssn < a.v_ssn 

where  
a.date=b.date;
quit;

 

 

r-lee-484
Fluorite | Level 6

Thank you very much, that code works to remove reverse matches and to retain 1 set of matches.

 

On another note, what code should I write to filter out any paired matches? From the dataset "abuse" I only want to keep cases 3 and 4, and I don't want to keep case 1,2,5,6?

 

Thanks for your help!

mkeintz
PROC Star

@r-lee-484 wrote:

Thank you very much, that code works to remove reverse matches and to retain 1 set of matches.

 

On another note, what code should I write to filter out any paired matches? From the dataset "abuse" I only want to keep cases 3 and 4, and I don't want to keep case 1,2,5,6?

In other words, you want every BUT the paired matches, correct?  Then you could just use the EXCEPT set operator:


proc sql;
create table unmatched_cases  as 
  select c.case, c.o_ssn, c.v_ssn  
  from abuse as c
  except

  select  a.case, a.o_ssn, a.v_ssn
  from 
    abuse a inner join abuse b
    on a.o_ssn = b.v_ssn 
       and a.v_ssn = b.o_ssn 
  where  
    a.date=b.date;
quit;

There is also straightforward DATA step code for this task:

 

data un_matched_cases;
  set abuse;
  if _n_=1 then do;
    declare hash h (dataset:'abuse (keep=date o_ssn v_ssn)');
      h.definekey('date','o_ssn','v_ssn');
      h.definedone();
  end;
  if h.check(key:date,key:v_ssn,key:o_ssn)^=0;
run;

Note order of keys for the hash object h is DATE,O_SSN,V_SSN.   That means the check method - h.check(...)  - for "reverse match" must present the arguments ordered as DATE,V_SSN,O_SSN.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
r-lee-484
Fluorite | Level 6

This works thank you!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2084 views
  • 3 likes
  • 4 in conversation