BookmarkSubscribeRSS Feed
r-lee-484
Fluorite | Level 6

I have one large dataset where I need to filter out observations that do not abide by the 4 rules below:

 

1- Keep observations where event1 is linked to event2 and have the same ID_1 and same ID_2.

2- Keep observations where different event1 is linked to the same event2 and has the same ID_1.

3-Keep observations where event1 is linked to event2 one time and both ID_1 and ID_2 is missing.

4-Keep observations where same event1 matches two different event2 and has same ID_2.

 

 

Below is a small example of dataset with 1 example for each rule:

 

event1event2ID_1ID_2 
RSS1OWT14235 
RSS2OWT14235 
RSS3OWT14235 
RSS1OWT24235 
RSS2OWT24235 
RSS3OWT24235 
RSS10OWT2.5do not keep/delete
RSS4OWT312. 
RSS7OWT312. 
RSS11OWT3..do not keep/delete
RSS13OWT12.. 
RSS9OWT5.14 
RSS9OWT8.14 
RSS17OWT8.14do not keep/delete

 

Below is the final desired dataset. I would like to create a New_ID showing distinct links based off the rules.

event1event2ID_1ID_2New_ID
RSS1OWT142351
RSS2OWT142351
RSS3OWT142351
RSS1OWT242351
RSS2OWT242351
RSS3OWT242351
RSS4OWT312.2
RSS7OWT312.2
RSS13OWT12..3
RSS9OWT5.144
RSS9OWT8.144

 

 

How can I go about programming this with loops and sql while maintaining all the rules in one dataset?

 

 

 

8 REPLIES 8
JosvanderVelden
SAS Super FREQ
Can you please specify for each observation the rule or rules that is/are involved in keeping the observation?
r-lee-484
Fluorite | Level 6

 

Rule 1 -  Keep observations where event1 is linked to event2 and have the same ID_1 and same ID_2.

event1event2ID_1ID_2 
RSS1OWT14235 
RSS2OWT14235 
RSS3OWT14235 
RSS1OWT24235 
RSS2OWT24235 
RSS3OWT24235 
RSS10OWT2.5do not keep/delete

 

Rule 2 -  Keep observations where different event1 is linked to the same event2 and has the same ID_1.

event1event2ID_1ID_2 
RSS4OWT312. 
RSS7OWT312. 
RSS11OWT3..do not keep/delete

 

Rule 3 -Keep observations where event1 is linked to event2 one time and both ID_1 and ID_2 is missing.

event1event2ID_1ID_2
RSS13OWT12..

 

Rule 4 -  Keep observations where same event1 matches two different event2 and has same ID_2.

event1event2ID_1ID_2 
RSS9OWT5.14 
RSS9OWT8.14 
RSS17OWT8.14do not keep/delete

 

 

LinusH
Tourmaline | Level 20

Sorry, but I have hard time to understand the rules given under each bullet, what does "linked" and "same" means in this data context?

Best would be to make examples under each bullet (1-4).

Data never sleeps
r-lee-484
Fluorite | Level 6

Using these two examples to explain "same" and "linked".

 

Each row is a "link". event1 is linked to event2. 

Same means the value in event1/event2/ID_1/ID_2 occurs more than once.

 

Rule 3 -Keep observations where event1 is linked to event2 one time and both ID_1 and ID_2 is missing.

 

RSS13 is linked to OWT12. RSS13 only occurs one time in the combined dataset above. 

event1event2ID_1ID_2
RSS13OWT12..

 

 

Rule 4 -  Keep observations where same event1 matches two different event2 and has same ID_2.

 

RSS9 is linked to both OWT5 and OWT8 and has ID_2 = 14 for both links.

RSS17 is also linked to OWT8 and has ID_2 = 14 for this one link.

 

 

"Same event1" means RSS9 appears twice in event1 and is linked to two different event2 (OWT5 and OWT8) whereas RSS17 only appears one time.

 

These three links all have the same ID_2 (14) but do not keep RSS17 - OWT8 link because it doesn't the follow the rule where there has to be the same event matched to 2 different event2. RSS17 is only matched to a single event2. 

event1event2ID_1ID_2 
RSS9OWT5.14 
RSS9OWT8.14 
RSS17OWT8.14do not keep/delete

 

 

Hope this is more clear.

mkeintz
PROC Star

How about providing your sample data in the form of a working data step?  You will be more likely to get actual tested suggestions.

 

Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
r-lee-484
Fluorite | Level 6

data have;
infile cards expandtabs;
input (event1 event2 ID_1 ID_2) ($);
cards;
RSS1 OWT1 423 5
RSS2 OWT1 423 5
RSS3 OWT1 423 5
RSS1 OWT2 423 5
RSS2 OWT2 423 5
RSS3 OWT2 423 5
RSS10 OWT2 . 5
RSS4 OWT3 12 .
RSS7 OWT3 12 .
RSS11 OWT3 . .
RSS13 OWT12 . .
RSS9 OWT5 . 14
RSS9 OWT8 . 14
RSS17 OWT8 . 14
;

 

Inner join doesn't work to retain the first 6 observations. How do I get rid of RSS10 linked with OWT2 ?


proc sql;
create table rule1 as
select a.*
from have a inner join have b
on a.ID_1=b.ID_1 and a.ID_2=b.ID_2
where a.event1 NE b.event2;
run;

 

Desired output:

Rule 1 -  Keep observations where event1 is linked to event2 and have the same ID_1 and same ID_2.

event1event2ID_1ID_2 
RSS1OWT14235 
RSS2OWT14235 
RSS3OWT14235 
RSS1OWT24235 
RSS2OWT24235 
RSS3OWT24235 
RSS10OWT2.5do not keep/delete
JosvanderVelden
SAS Super FREQ

Analyze the code and results from execution with the code below:

data have;
   infile cards expandtabs;
   input (event1 event2 ID_1 ID_2) ($);
cards;
RSS1 OWT1 423 5
RSS2 OWT1 423 5
RSS3 OWT1 423 5
RSS1 OWT2 423 5
RSS2 OWT2 423 5
RSS3 OWT2 423 5
RSS10 OWT2 . 5
RSS4 OWT3 12 .
RSS7 OWT3 12 .
RSS11 OWT3 . .
RSS13 OWT12 . .
RSS9 OWT5 . 14
RSS9 OWT8 . 14
RSS17 OWT8 . 14
;

data rule1assist;
   set have;
   rule1assist = catx('-', trim(ID_1), trim(ID_2));
run;

proc sort data=rule1assist;
   by rule1assist;
run;

proc sort data=rule1assist out=rule1tmp(keep=rule1assist);
   by rule1assist;
run;

data rule1tmp;
   set rule1tmp;
   if first.rule1assist eq 1 AND last.rule1assist eq 1 then delete;
   by rule1assist;
run;

proc sort data=rule1tmp nodup;
   by rule1assist;
run;

proc sql;
   create table rule1 as
   select event1, event2, ID_1, ID_2, a.rule1assist
   from rule1assist a, rule1tmp b
   where a.rule1assist eq b.rule1assist
   ;
quit;

data rule2assist;
   set rule1;
   rule2assist = catx('-', trim(event2), trim(ID_1));
run;

proc sort data=rule2assist;
   by rule2assist;
run;

proc sort data=rule2assist out=rule2tmp(keep=rule2assist);
   by rule2assist;
run;

data rule2tmp;
   set rule2tmp;
   if first.rule2assist eq 1 AND last.rule2assist eq 1 then delete;
   by rule2assist;
run;

proc sort data=rule2tmp nodup;
   by rule2assist;
run;

proc sql;
   create table rule2 as
   select event1, event2, ID_1, ID_2, 
          a.rule1assist, a.rule2assist
   from rule2assist a, rule2tmp b
   where a.rule2assist eq b.rule2assist
   ;
quit;

This seems to implement the rule 1 correctly but not rule 2. Because after applying the rule 2 deletions your rule 3 cannot be applied anymore. So I suspect I am not applying the correct programming logic for rule 2. The code for rule 1 could work for you though.

r-lee-484
Fluorite | Level 6

Thank you for your help. I've adapted your code to generate 4 different sets of code (1 for each rule) and then used a final data step to combine the data.

 

I am struggling with Rule 3.  I would like to keep any observations that have distinct event1 and event2 and has missing ID_1 and ID_2. My code keeps 2 rows of data and I only want the row with RSS13 OWT12. How can I modify the code for rule 3 for desired output? See code below:

 

 

data have;
infile cards expandtabs;
input (event1 event2 ID_1 ID_2) ($);
cards;
RSS1 OWT1 423 5
RSS2 OWT1 423 5
RSS3 OWT1 423 5
RSS1 OWT2 423 5
RSS2 OWT2 423 5
RSS3 OWT2 423 5
RSS10 OWT2 . 5
RSS4 OWT3 12 .
RSS7 OWT3 12 .
RSS11 OWT3 . .
RSS13 OWT12 . .
RSS9 OWT5 . 14
RSS9 OWT8 . 14
RSS17 OWT8 . 14
;

/* Code for Keeping observations that meet Rule 1*/
data rule1assist;
set have;
rule1assist = catx('-', trim(ID_1), trim(ID_2));
run;

proc sort data=rule1assist;
by rule1assist;
run;

proc sort data=rule1assist out=rule1tmp(keep=rule1assist);
by rule1assist;
run;


data rule1tmp;
set rule1tmp;
if first.rule1assist eq 1 AND last.rule1assist eq 1 then delete;
by rule1assist;
run;

proc sort data=rule1tmp nodup;
by rule1assist;
run;

proc sql;
create table rule1 as
select event1, event2, ID_1, ID_2, a.rule1assist
from rule1assist a, rule1tmp b
where a.rule1assist eq b.rule1assist and ID_1 ne " " and ID_2 ne " "
;
quit;

 

 

/*Code for Keeping observations that meet Rule 2*/
data rule2assist;
set have;
rule2assist = catx('-', trim(event2), trim(ID_1));
run;

proc sort data=rule2assist;
by rule2assist;
run;

proc sort data=rule2assist out=rule2tmp(keep=rule2assist);
by rule2assist;
run;

data rule2tmp;
set rule2tmp;
if first.rule2assist eq 1 AND last.rule2assist eq 1 then delete;
by rule2assist;
run;

proc sort data=rule2tmp nodup;
by rule2assist;
run;

proc sql;
create table rule2 as
select event1, event2, ID_1, ID_2, a.rule2assist
from rule2assist a, rule2tmp b
where a.rule2assist eq b.rule2assist and ID_2 eq " "
;
quit;

 

/*Code for Keeping observations that meet Rule 3...this doesn't work*/

data rule3assist;
set have;
rule3assist = catx('-', trim(event1), trim(event2),trim(ID_1), trim(ID_2));
run;

proc sort data=rule3assist;
by rule3assist;
run;

proc sort data=rule3assist out=rule3tmp(keep=rule3assist);
by rule3assist;
run;

data rule3tmp;
set rule3tmp;
if first.rule3assist eq 1 AND last.rule3assist eq 1 ;
by rule3assist;
run;

proc sort data=rule3tmp nodup;
by rule3assist;
run;

proc sql;
create table rule3 as
select distinct *
from rule3assist a, rule3tmp b
where a.rule3assist eq b.rule3assist and ID_2 eq " " and ID_1 eq " "
;
quit;

 


/*Code for Keeping observations that meet Rule 4*/

data rule4assist;
set have;
rule4assist = catx('-', trim(event1), trim(ID_2));
run;

proc sort data=rule4assist;
by rule4assist;
run;

proc sort data=rule4assist out=rule4tmp(keep=rule4assist);
by rule4assist;
run;

data rule4tmp;
set rule4tmp;
if first.rule4assist eq 1 AND last.rule4assist eq 1 then delete;
by rule4assist;
run;

proc sort data=rule4tmp nodup;
by rule4assist;
run;

proc sql;
create table rule4 as
select event1, event2, ID_1, ID_2, a.rule4assist
from rule4assist a, rule4tmp b
where a.rule4assist eq b.rule4assist and ID_1 eq " "
;
quit;

 


data want;
set rule1 rule2 rule3 rule4;
run;

 

 

/*Code for Keeping observations that meet Rule 2*/
data rule2assist;
set have;
rule2assist = catx('-', trim(event2), trim(ID_1));
run;

proc sort data=rule2assist;
by rule2assist;
run;

proc sort data=rule2assist out=rule2tmp(keep=rule2assist);
by rule2assist;
run;

data rule2tmp;
set rule2tmp;
if first.rule2assist eq 1 AND last.rule2assist eq 1 then delete;
by rule2assist;
run;

proc sort data=rule2tmp nodup;
by rule2assist;
run;

proc sql;
create table rule2 as
select event1, event2, ID_1, ID_2, a.rule2assist
from rule2assist a, rule2tmp b
where a.rule2assist eq b.rule2assist and ID_2 eq " "
;
quit;

 

/*Code for Keeping observations that meet ...this doesn't work*/

data rule3assist;
set have;
rule3assist = catx('-', trim(event1), trim(event2),trim(ID_1), trim(ID_2));
run;

proc sort data=rule3assist;
by rule3assist;
run;

proc sort data=rule3assist out=rule3tmp(keep=rule3assist);
by rule3assist;
run;

data rule3tmp;
set rule3tmp;
if first.rule3assist eq 1 AND last.rule3assist eq 1 ;
by rule3assist;
run;

proc sort data=rule3tmp nodup;
by rule3assist;
run;

proc sql;
create table rule3 as
select distinct*
from rule3assist a, rule3tmp b
group by a.event2
where a.rule3assist eq b.rule3assist and ID_2 eq " " and ID_1 eq " "
;
quit;

 


/*Code for Keeping observations that meet Rule 4*/

data rule4assist;
set have;
rule4assist = catx('-', trim(event1), trim(ID_2));
run;

proc sort data=rule4assist;
by rule4assist;
run;

proc sort data=rule4assist out=rule4tmp(keep=rule4assist);
by rule4assist;
run;

data rule4tmp;
set rule4tmp;
if first.rule4assist eq 1 AND last.rule4assist eq 1 then delete;
by rule4assist;
run;

proc sort data=rule4tmp nodup;
by rule4assist;
run;

proc sql;
create table rule4 as
select event1, event2, ID_1, ID_2, a.rule4assist
from rule4assist a, rule4tmp b
where a.rule4assist eq b.rule4assist and ID_1 eq " "
;
quit;

 


data want;
set rule1 rule2 rule3 rule4
run;

 

 

 

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1645 views
  • 0 likes
  • 4 in conversation