Help using Base SAS procedures

Capture duplicate records based on three different conditon

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Capture duplicate records based on three different conditon

Hello Everyone!

As per below sample data, I need to capture all the study with only duplicate event. (e.g.Hemorge for study CT001).

After that I retrieve only those records which contain events with exactly same gender or  exactly same risk_factor or  exactly same values for five variables (freq, SD, median, mode, covariance)

For example for same gender with all duplicate events, data will be like below

STUDYSEQEVENTGENDERRISK_FACTORFREQSDMEDIANMODECOVARIANCE
CT0034Pregnancy31111101
CT0035Pregnancy3201101

For same risk_factor with duplicate events

STUDYSEQEVENTGENDERRISK_FACTORFREQSDMEDIANMODECOVARIANCE
CT0061Oedema0010110
CT0062Oedema1011110
CT0063Oedema1001110

For same five variables (freq, SD, median, mode, covariance) with exactly same value for duplicate events

STUDYSEQEVENTGENDERRISK_FACTORFREQSDMEDIANMODECOVARIANCE
CT0051Skinlson1100000
CT0052Skinlson2100000
CT0053Skinlson1000000

data clinical;

input study $ seq event $ gender risk_factor freq  SD  median  mode  covariance;

cards;

CT001 1 Hemorge 1 1 1 1 1 1 1

CT001 2 Hemorge 1 2 0 3 1 0 11

CT001 3 Hemorge 2 2 0 1 1 0 11

CT001 8 Hemoglobin 2 2 0 1 1 0 11

CT001 4 SBP 2 3 1 2 1 0 11

CT001 5 SBP 1 1 1 2 1 0 11

CT001 6 Hemorge 3 3 1 1 1 1 1

CT002 2 Gastro 0 1 1 1 1 1 1

CT002 1 Gastro 1 0 2 1 1 11 1

CT002 3 Gastro 1 0 3 1 1 0 1

CT003 1 Malignt 1 0 1 1 1 0 1

CT003 2 Malignt 0 0 1 1 1 0 1

CT003 3 Malignt 3 1 11 1 1 0 1

CT003 4 Pregnant 3 1 11 1 1 0 1

CT003 5 Pregnant 3 2 0 1 1 0 1

CT004 4 Pregnant 3 1 11 1 1 0 1

CT005 1 Skinlson  1 1 0 0 0 0 0

CT005 2 Skinlson  2 1 0 0 0 0 0

CT005 3 Skinlson  1 0 0 0 0 0 0

CT006 1 Oedema 0 0 1 0 1 1 0

CT006 2 Oedema 1 0 1 1 1 1 0

CT006 4 STD 1 0 1 1 1 1 0

CT006 3 Oedema 1 0 0 1 1 1 0

;

run;

Output

STUDYSEQEVENTGENDERRISK_FACTORFREQSDMEDIANMODECOVARIANCE
CT0061Oedema0010110
CT0062Oedema1011110
CT0063Oedema1001110
CT0034Pregnant31111101
CT0035Pregnant3201101
CT0051Skinlson1100000
CT0052Skinlson2100000
CT0053Skinlson1000000
CT0014SBP 23121011
CT0015SBP 11121011

Thanks in advance

dataset value change by tiki


Accepted Solutions
Solution
‎06-20-2015 12:27 AM
Super User
Posts: 10,020

Re: Capture duplicate records based on three different conditon

SAS Standard SQL also could achieve your Oracle's Logic .

Code: Program

data clinical;
input study $ seq event $ gender risk_factor freq  SD  median  mode  covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
  select *
   from clinical
   group by study, event
   having count(*) gt 1 and (
   count(distinct GENDER)=1 or
   count(distinct RISK_FACTOR)=1 or
   count(distinct cats(FREQ,SD,MEDIAN,MODE,COVARIANCE))=1
);
quit;

Xia Keshan

View solution in original post


All Replies
Respected Advisor
Posts: 4,919

Re: Capture duplicate records based on three different conditon

Your output doesn't match your input. For example, your output shows two records with negative FREQ values but there aren't any in your input. Please fix.

PG

PG
Occasional Contributor
Posts: 6

Re: Capture duplicate records based on three different conditon

Apology PG.

I made a mistake to copy dataset. I updated the table along with output. Please check once.

The problem is that partition by clause is not supported by SAS. The below SQL approach is taken.

select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance

  from (select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance,

               count(distinct gender) over (partition by event) cnt1,

               count(distinct risk_factor) over (partition by event) cnt2,

               count(distinct freq||'#'||SD||'#'||median||'#'||mode||'#'||covariance) over (partition by event) cnt3,

               count(*) over(partition by event) cnt

          from trial_study)

where cnt > 1

   and 1 in (cnt1,cnt2,cnt3)

Super User
Posts: 10,020

Re: Capture duplicate records based on three different conditon

Code: Program

data clinical;
input study $ seq event $ gender risk_factor freq  SD  median  mode  covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
  select *
   from clinical
   group by study, event
   having count(*) gt 1 and (
   range(GENDER)=0 or
   range(RISK_FACTOR)=0 or
(range(FREQ)=0 and range(SD)=0 and range(MEDIAN)=0 and range(MODE)=0 and range(COVARIANCE)=0)
);
quit;

Occasional Contributor
Posts: 6

Re: Capture duplicate records based on three different conditon

Thanks keshan. But the value of variables are not fixed. It changes randomly. I can not enter zero or any fixed value. Is there any alternative approach to find sample results. Advance thanks

Super User
Posts: 10,020

Re: Capture duplicate records based on three different conditon

Try my code on your sample data. If there were any problem , tell me .

Super User
Posts: 11,343

Re: Capture duplicate records based on three different conditon

Why aren't CT001 seq 2 and 3 not in the result? They have the same risk_factor.

Super User
Posts: 11,343

Re: Capture duplicate records based on three different conditon

Look up what the RANGE function does. It is not looking at specific values or your data though it does assume they will be numeric.

Occasional Contributor
Posts: 6

Re: Capture duplicate records based on three different conditon

Thanks Keshan for providing the logic. It is working fine.

Suppose the database contain value like below (other than zero). In that case, we can not fix the program with zero. We need to use some other approach.

In oracle, I resolve it through below logic using function.

select * from

(select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance,

               count(distinct freq||'#'||SD||'#'||median||'#'||mode||'#'||covariance) over (partition by event) cnt3,

               count(*) over(partition by event) cnt

          from trial_study)

where cnt > 1

   and cnt3=1

FREQSDMEDIANMODECOVARIANCE
1111111111
1111111111
4444444444
4444444444
-2-2-2-2-2
-2-2-2-2-2

Dear ballarw,

You are correct. yes, those two record should be captured by the program. I keep only few records to display as output. Sorry for the confusion

Solution
‎06-20-2015 12:27 AM
Super User
Posts: 10,020

Re: Capture duplicate records based on three different conditon

SAS Standard SQL also could achieve your Oracle's Logic .

Code: Program

data clinical;
input study $ seq event $ gender risk_factor freq  SD  median  mode  covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
  select *
   from clinical
   group by study, event
   having count(*) gt 1 and (
   count(distinct GENDER)=1 or
   count(distinct RISK_FACTOR)=1 or
   count(distinct cats(FREQ,SD,MEDIAN,MODE,COVARIANCE))=1
);
quit;

Xia Keshan

Occasional Contributor
Posts: 6

Re: Capture duplicate records based on three different conditon

Thanks Keshan for your great help

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 452 views
  • 0 likes
  • 4 in conversation