BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tanja
Calcite | Level 5

Hello Everyone!

As per below sample data, I need to capture all the study with only duplicate event. (e.g.Hemorge for study CT001).

After that I retrieve only those records which contain events with exactly same gender or  exactly same risk_factor or  exactly same values for five variables (freq, SD, median, mode, covariance)

For example for same gender with all duplicate events, data will be like below

STUDYSEQEVENTGENDERRISK_FACTORFREQSDMEDIANMODECOVARIANCE
CT0034Pregnancy31111101
CT0035Pregnancy3201101

For same risk_factor with duplicate events

STUDYSEQEVENTGENDERRISK_FACTORFREQSDMEDIANMODECOVARIANCE
CT0061Oedema0010110
CT0062Oedema1011110
CT0063Oedema1001110

For same five variables (freq, SD, median, mode, covariance) with exactly same value for duplicate events

STUDYSEQEVENTGENDERRISK_FACTORFREQSDMEDIANMODECOVARIANCE
CT0051Skinlson1100000
CT0052Skinlson2100000
CT0053Skinlson1000000

data clinical;

input study $ seq event $ gender risk_factor freq  SD  median  mode  covariance;

cards;

CT001 1 Hemorge 1 1 1 1 1 1 1

CT001 2 Hemorge 1 2 0 3 1 0 11

CT001 3 Hemorge 2 2 0 1 1 0 11

CT001 8 Hemoglobin 2 2 0 1 1 0 11

CT001 4 SBP 2 3 1 2 1 0 11

CT001 5 SBP 1 1 1 2 1 0 11

CT001 6 Hemorge 3 3 1 1 1 1 1

CT002 2 Gastro 0 1 1 1 1 1 1

CT002 1 Gastro 1 0 2 1 1 11 1

CT002 3 Gastro 1 0 3 1 1 0 1

CT003 1 Malignt 1 0 1 1 1 0 1

CT003 2 Malignt 0 0 1 1 1 0 1

CT003 3 Malignt 3 1 11 1 1 0 1

CT003 4 Pregnant 3 1 11 1 1 0 1

CT003 5 Pregnant 3 2 0 1 1 0 1

CT004 4 Pregnant 3 1 11 1 1 0 1

CT005 1 Skinlson  1 1 0 0 0 0 0

CT005 2 Skinlson  2 1 0 0 0 0 0

CT005 3 Skinlson  1 0 0 0 0 0 0

CT006 1 Oedema 0 0 1 0 1 1 0

CT006 2 Oedema 1 0 1 1 1 1 0

CT006 4 STD 1 0 1 1 1 1 0

CT006 3 Oedema 1 0 0 1 1 1 0

;

run;

Output

STUDYSEQEVENTGENDERRISK_FACTORFREQSDMEDIANMODECOVARIANCE
CT0061Oedema0010110
CT0062Oedema1011110
CT0063Oedema1001110
CT0034Pregnant31111101
CT0035Pregnant3201101
CT0051Skinlson1100000
CT0052Skinlson2100000
CT0053Skinlson1000000
CT0014SBP 23121011
CT0015SBP 11121011

Thanks in advance

dataset value change by tiki

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

SAS Standard SQL also could achieve your Oracle's Logic .

Code: Program

data clinical;
input study $ seq event $ gender risk_factor freq  SD  median  mode  covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
  select *
   from clinical
   group by study, event
   having count(*) gt 1 and (
   count(distinct GENDER)=1 or
   count(distinct RISK_FACTOR)=1 or
   count(distinct cats(FREQ,SD,MEDIAN,MODE,COVARIANCE))=1
);
quit;

Xia Keshan

View solution in original post

10 REPLIES 10
PGStats
Opal | Level 21

Your output doesn't match your input. For example, your output shows two records with negative FREQ values but there aren't any in your input. Please fix.

PG

PG
Tanja
Calcite | Level 5

Apology PG.

I made a mistake to copy dataset. I updated the table along with output. Please check once.

The problem is that partition by clause is not supported by SAS. The below SQL approach is taken.

select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance

  from (select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance,

               count(distinct gender) over (partition by event) cnt1,

               count(distinct risk_factor) over (partition by event) cnt2,

               count(distinct freq||'#'||SD||'#'||median||'#'||mode||'#'||covariance) over (partition by event) cnt3,

               count(*) over(partition by event) cnt

          from trial_study)

where cnt > 1

   and 1 in (cnt1,cnt2,cnt3)

Ksharp
Super User

Code: Program

data clinical;
input study $ seq event $ gender risk_factor freq  SD  median  mode  covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
  select *
   from clinical
   group by study, event
   having count(*) gt 1 and (
   range(GENDER)=0 or
   range(RISK_FACTOR)=0 or
(range(FREQ)=0 and range(SD)=0 and range(MEDIAN)=0 and range(MODE)=0 and range(COVARIANCE)=0)
);
quit;

Tanja
Calcite | Level 5

Thanks keshan. But the value of variables are not fixed. It changes randomly. I can not enter zero or any fixed value. Is there any alternative approach to find sample results. Advance thanks

Ksharp
Super User

Try my code on your sample data. If there were any problem , tell me .

ballardw
Super User

Why aren't CT001 seq 2 and 3 not in the result? They have the same risk_factor.

ballardw
Super User

Look up what the RANGE function does. It is not looking at specific values or your data though it does assume they will be numeric.

Tanja
Calcite | Level 5

Thanks Keshan for providing the logic. It is working fine.

Suppose the database contain value like below (other than zero). In that case, we can not fix the program with zero. We need to use some other approach.

In oracle, I resolve it through below logic using function.

select * from

(select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance,

               count(distinct freq||'#'||SD||'#'||median||'#'||mode||'#'||covariance) over (partition by event) cnt3,

               count(*) over(partition by event) cnt

          from trial_study)

where cnt > 1

   and cnt3=1

FREQSDMEDIANMODECOVARIANCE
1111111111
1111111111
4444444444
4444444444
-2-2-2-2-2
-2-2-2-2-2

Dear ballarw,

You are correct. yes, those two record should be captured by the program. I keep only few records to display as output. Sorry for the confusion

Ksharp
Super User

SAS Standard SQL also could achieve your Oracle's Logic .

Code: Program

data clinical;
input study $ seq event $ gender risk_factor freq  SD  median  mode  covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
  select *
   from clinical
   group by study, event
   having count(*) gt 1 and (
   count(distinct GENDER)=1 or
   count(distinct RISK_FACTOR)=1 or
   count(distinct cats(FREQ,SD,MEDIAN,MODE,COVARIANCE))=1
);
quit;

Xia Keshan

Tanja
Calcite | Level 5

Thanks Keshan for your great help

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2672 views
  • 0 likes
  • 4 in conversation