Hello Everyone!
As per below sample data, I need to capture all the study with only duplicate event. (e.g.Hemorge for study CT001).
After that I retrieve only those records which contain events with exactly same gender or exactly same risk_factor or exactly same values for five variables (freq, SD, median, mode, covariance)
For example for same gender with all duplicate events, data will be like below
STUDY | SEQ | EVENT | GENDER | RISK_FACTOR | FREQ | SD | MEDIAN | MODE | COVARIANCE |
CT003 | 4 | Pregnancy | 3 | 1 | 11 | 1 | 1 | 0 | 1 |
CT003 | 5 | Pregnancy | 3 | 2 | 0 | 1 | 1 | 0 | 1 |
For same risk_factor with duplicate events
STUDY | SEQ | EVENT | GENDER | RISK_FACTOR | FREQ | SD | MEDIAN | MODE | COVARIANCE |
CT006 | 1 | Oedema | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
CT006 | 2 | Oedema | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
CT006 | 3 | Oedema | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
For same five variables (freq, SD, median, mode, covariance) with exactly same value for duplicate events
STUDY | SEQ | EVENT | GENDER | RISK_FACTOR | FREQ | SD | MEDIAN | MODE | COVARIANCE |
CT005 | 1 | Skinlson | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
CT005 | 2 | Skinlson | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
CT005 | 3 | Skinlson | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
data clinical;
input study $ seq event $ gender risk_factor freq SD median mode covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
Output
STUDY | SEQ | EVENT | GENDER | RISK_FACTOR | FREQ | SD | MEDIAN | MODE | COVARIANCE |
CT006 | 1 | Oedema | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
CT006 | 2 | Oedema | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
CT006 | 3 | Oedema | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
CT003 | 4 | Pregnant | 3 | 1 | 11 | 1 | 1 | 0 | 1 |
CT003 | 5 | Pregnant | 3 | 2 | 0 | 1 | 1 | 0 | 1 |
CT005 | 1 | Skinlson | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
CT005 | 2 | Skinlson | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
CT005 | 3 | Skinlson | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
CT001 | 4 | SBP | 2 | 3 | 1 | 2 | 1 | 0 | 11 |
CT001 | 5 | SBP | 1 | 1 | 1 | 2 | 1 | 0 | 11 |
Thanks in advance
dataset value change by tiki
SAS Standard SQL also could achieve your Oracle's Logic .
data clinical;
input study $ seq event $ gender risk_factor freq SD median mode covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
select *
from clinical
group by study, event
having count(*) gt 1 and (
count(distinct GENDER)=1 or
count(distinct RISK_FACTOR)=1 or
count(distinct cats(FREQ,SD,MEDIAN,MODE,COVARIANCE))=1
);
quit;
Xia Keshan
Your output doesn't match your input. For example, your output shows two records with negative FREQ values but there aren't any in your input. Please fix.
PG
Apology PG.
I made a mistake to copy dataset. I updated the table along with output. Please check once.
The problem is that partition by clause is not supported by SAS. The below SQL approach is taken.
select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance
from (select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance,
count(distinct gender) over (partition by event) cnt1,
count(distinct risk_factor) over (partition by event) cnt2,
count(distinct freq||'#'||SD||'#'||median||'#'||mode||'#'||covariance) over (partition by event) cnt3,
count(*) over(partition by event) cnt
from trial_study)
where cnt > 1
and 1 in (cnt1,cnt2,cnt3)
data clinical;
input study $ seq event $ gender risk_factor freq SD median mode covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
select *
from clinical
group by study, event
having count(*) gt 1 and (
range(GENDER)=0 or
range(RISK_FACTOR)=0 or
(range(FREQ)=0 and range(SD)=0 and range(MEDIAN)=0 and range(MODE)=0 and range(COVARIANCE)=0)
);
quit;
Thanks keshan. But the value of variables are not fixed. It changes randomly. I can not enter zero or any fixed value. Is there any alternative approach to find sample results. Advance thanks
Try my code on your sample data. If there were any problem , tell me .
Why aren't CT001 seq 2 and 3 not in the result? They have the same risk_factor.
Look up what the RANGE function does. It is not looking at specific values or your data though it does assume they will be numeric.
Thanks Keshan for providing the logic. It is working fine.
Suppose the database contain value like below (other than zero). In that case, we can not fix the program with zero. We need to use some other approach.
In oracle, I resolve it through below logic using function.
select * from
(select study,seq,event,gender,risk_factor,freq,SD,median,mode,covariance,
count(distinct freq||'#'||SD||'#'||median||'#'||mode||'#'||covariance) over (partition by event) cnt3,
count(*) over(partition by event) cnt
from trial_study)
where cnt > 1
and cnt3=1
|
Dear ballarw,
You are correct. yes, those two record should be captured by the program. I keep only few records to display as output. Sorry for the confusion
SAS Standard SQL also could achieve your Oracle's Logic .
data clinical;
input study $ seq event $ gender risk_factor freq SD median mode covariance;
cards;
CT001 1 Hemorge 1 1 1 1 1 1 1
CT001 2 Hemorge 1 2 0 3 1 0 11
CT001 3 Hemorge 2 2 0 1 1 0 11
CT001 8 Hemoglobin 2 2 0 1 1 0 11
CT001 4 SBP 2 3 1 2 1 0 11
CT001 5 SBP 1 1 1 2 1 0 11
CT001 6 Hemorge 3 3 1 1 1 1 1
CT002 2 Gastro 0 1 1 1 1 1 1
CT002 1 Gastro 1 0 2 1 1 11 1
CT002 3 Gastro 1 0 3 1 1 0 1
CT003 1 Malignt 1 0 1 1 1 0 1
CT003 2 Malignt 0 0 1 1 1 0 1
CT003 3 Malignt 3 1 11 1 1 0 1
CT003 4 Pregnant 3 1 11 1 1 0 1
CT003 5 Pregnant 3 2 0 1 1 0 1
CT004 4 Pregnant 3 1 11 1 1 0 1
CT005 1 Skinlson 1 1 0 0 0 0 0
CT005 2 Skinlson 2 1 0 0 0 0 0
CT005 3 Skinlson 1 0 0 0 0 0 0
CT006 1 Oedema 0 0 1 0 1 1 0
CT006 2 Oedema 1 0 1 1 1 1 0
CT006 4 STD 1 0 1 1 1 1 0
CT006 3 Oedema 1 0 0 1 1 1 0
;
run;
proc sql;
create table want as
select *
from clinical
group by study, event
having count(*) gt 1 and (
count(distinct GENDER)=1 or
count(distinct RISK_FACTOR)=1 or
count(distinct cats(FREQ,SD,MEDIAN,MODE,COVARIANCE))=1
);
quit;
Xia Keshan
Thanks Keshan for your great help
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.