Capture duplicate records based on three different conditon

Solved
Occasional Contributor
Posts: 7

Capture duplicate records based on three different conditon

Hello Everyone!

As per below sample data, I need to capture all the study with only duplicate event. (e.g.Hemorge for study CT001).

After that I retrieve only those records which contain events with exactly same gender or  exactly same risk_factor or  exactly same values for five variables (freq, SD, median, mode, covariance)

For example for same gender with all duplicate events, data will be like below

 STUDY SEQ EVENT GENDER RISK_FACTOR FREQ SD MEDIAN MODE COVARIANCE CT003 4 Pregnancy 3 1 11 1 1 0 1 CT003 5 Pregnancy 3 2 0 1 1 0 1

For same risk_factor with duplicate events

 STUDY SEQ EVENT GENDER RISK_FACTOR FREQ SD MEDIAN MODE COVARIANCE CT006 1 Oedema 0 0 1 0 1 1 0 CT006 2 Oedema 1 0 1 1 1 1 0 CT006 3 Oedema 1 0 0 1 1 1 0

For same five variables (freq, SD, median, mode, covariance) with exactly same value for duplicate events

 STUDY SEQ EVENT GENDER RISK_FACTOR FREQ SD MEDIAN MODE COVARIANCE CT005 1 Skinlson 1 1 0 0 0 0 0 CT005 2 Skinlson 2 1 0 0 0 0 0 CT005 3 Skinlson 1 0 0 0 0 0 0

data clinical;

input study \$ seq event \$ gender risk_factor freq  SD  median  mode  covariance;

cards;

CT001 1 Hemorge 1 1 1 1 1 1 1

CT001 2 Hemorge 1 2 0 3 1 0 11

CT001 3 Hemorge 2 2 0 1 1 0 11

CT001 8 Hemoglobin 2 2 0 1 1 0 11

CT001 4 SBP 2 3 1 2 1 0 11

CT001 5 SBP 1 1 1 2 1 0 11

CT001 6 Hemorge 3 3 1 1 1 1 1

CT002 2 Gastro 0 1 1 1 1 1 1

CT002 1 Gastro 1 0 2 1 1 11 1

CT002 3 Gastro 1 0 3 1 1 0 1

CT003 1 Malignt 1 0 1 1 1 0 1

CT003 2 Malignt 0 0 1 1 1 0 1

CT003 3 Malignt 3 1 11 1 1 0 1

CT003 4 Pregnant 3 1 11 1 1 0 1

CT003 5 Pregnant 3 2 0 1 1 0 1

CT004 4 Pregnant 3 1 11 1 1 0 1

CT005 1 Skinlson  1 1 0 0 0 0 0

CT005 2 Skinlson  2 1 0 0 0 0 0

CT005 3 Skinlson  1 0 0 0 0 0 0

CT006 1 Oedema 0 0 1 0 1 1 0

CT006 2 Oedema 1 0 1 1 1 1 0

CT006 4 STD 1 0 1 1 1 1 0

CT006 3 Oedema 1 0 0 1 1 1 0

;

run;

Output

 STUDY SEQ EVENT GENDER RISK_FACTOR FREQ SD MEDIAN MODE COVARIANCE CT006 1 Oedema 0 0 1 0 1 1 0 CT006 2 Oedema 1 0 1 1 1 1 0 CT006 3 Oedema 1 0 0 1 1 1 0 CT003 4 Pregnant 3 1 11 1 1 0 1 CT003 5 Pregnant 3 2 0 1 1 0 1 CT005 1 Skinlson 1 1 0 0 0 0 0 CT005 2 Skinlson 2 1 0 0 0 0 0 CT005 3 Skinlson 1 0 0 0 0 0 0 CT001 4 SBP 2 3 1 2 1 0 11 CT001 5 SBP 1 1 1 2 1 0 11

dataset value change by tiki

Accepted Solutions
Solution
‎06-20-2015 12:27 AM
Super User
Posts: 10,784

Re: Capture duplicate records based on three different conditon

SAS Standard SQL also could achieve your Oracle's Logic .

Code: Program

data clinical;
input study \$ seq event \$ gender risk_factor freq  SD  median  mode  covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
select *
from clinical
group by study, event
having count(*) gt 1 and (
count(distinct GENDER)=1 or
count(distinct RISK_FACTOR)=1 or
count(distinct cats(FREQ,SD,MEDIAN,MODE,COVARIANCE))=1
);
quit;

Xia Keshan

All Replies
Posts: 5,540

Re: Capture duplicate records based on three different conditon

PG

PG
Occasional Contributor
Posts: 7

Re: Capture duplicate records based on three different conditon

Apology PG.

I made a mistake to copy dataset. I updated the table along with output. Please check once.

The problem is that partition by clause is not supported by SAS. The below SQL approach is taken.

select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance

from (select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance,

count(distinct gender) over (partition by event) cnt1,

count(distinct risk_factor) over (partition by event) cnt2,

count(distinct freq||'#'||SD||'#'||median||'#'||mode||'#'||covariance) over (partition by event) cnt3,

count(*) over(partition by event) cnt

from trial_study)

where cnt > 1

and 1 in (cnt1,cnt2,cnt3)

Super User
Posts: 10,784

Re: Capture duplicate records based on three different conditon

Code: Program

data clinical;
input study \$ seq event \$ gender risk_factor freq  SD  median  mode  covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
select *
from clinical
group by study, event
having count(*) gt 1 and (
range(GENDER)=0 or
range(RISK_FACTOR)=0 or
(range(FREQ)=0 and range(SD)=0 and range(MEDIAN)=0 and range(MODE)=0 and range(COVARIANCE)=0)
);
quit;

Occasional Contributor
Posts: 7

Re: Capture duplicate records based on three different conditon

Thanks keshan. But the value of variables are not fixed. It changes randomly. I can not enter zero or any fixed value. Is there any alternative approach to find sample results. Advance thanks

Super User
Posts: 10,784

Re: Capture duplicate records based on three different conditon

Try my code on your sample data. If there were any problem , tell me .

Super User
Posts: 13,583

Re: Capture duplicate records based on three different conditon

Why aren't CT001 seq 2 and 3 not in the result? They have the same risk_factor.

Super User
Posts: 13,583

Re: Capture duplicate records based on three different conditon

Look up what the RANGE function does. It is not looking at specific values or your data though it does assume they will be numeric.

Occasional Contributor
Posts: 7

Re: Capture duplicate records based on three different conditon

Thanks Keshan for providing the logic. It is working fine.

Suppose the database contain value like below (other than zero). In that case, we can not fix the program with zero. We need to use some other approach.

In oracle, I resolve it through below logic using function.

select * from

(select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance,

count(distinct freq||'#'||SD||'#'||median||'#'||mode||'#'||covariance) over (partition by event) cnt3,

count(*) over(partition by event) cnt

from trial_study)

where cnt > 1

and cnt3=1

 FREQ SD MEDIAN MODE COVARIANCE 11 11 11 11 11 11 11 11 11 11 44 44 44 44 44 44 44 44 44 44 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2

Dear ballarw,

You are correct. yes, those two record should be captured by the program. I keep only few records to display as output. Sorry for the confusion

Solution
‎06-20-2015 12:27 AM
Super User
Posts: 10,784

Re: Capture duplicate records based on three different conditon

SAS Standard SQL also could achieve your Oracle's Logic .

Code: Program

data clinical;
input study \$ seq event \$ gender risk_factor freq  SD  median  mode  covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
select *
from clinical
group by study, event
having count(*) gt 1 and (
count(distinct GENDER)=1 or
count(distinct RISK_FACTOR)=1 or
count(distinct cats(FREQ,SD,MEDIAN,MODE,COVARIANCE))=1
);
quit;

Xia Keshan

Occasional Contributor
Posts: 7

Re: Capture duplicate records based on three different conditon

Thanks Keshan for your great help

🔒 This topic is solved and locked.