Hi guys,
I need some help. Below I prepared example dataset with few records of clinical trial data. I need to to write smart code to select correct records. Every record with number in brackets [ ] are called cluster terms, records without [ ] are called preffered terms. For example Diarhoea [1] will be cluster term while Diarhoea will be preffered term. If both recrods are presented then only cluster term must be displayed. If only preffered term is presented then I want to keep this record in dataset. So for example in below data we can see Diarhoea [1] cluster and Diarhoea preffered term so I want to keep only Diarhoea [1] cluster and remove Diarhoea preffered term from dataset.
data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
Data after processing should look like :
Diarhoea [1] 170
Rash [2] 174
Nausea [3] 123
Anemia [4] 156
Cardiac 167
Fatigue 174
Thank you in advance for help !
simplified further:
data test;
input adverse & $20. count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100
Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
data grp;
set test;
grp=scan(adverse,1);
run;
data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp) and countw(adverse)=1 or countw(adverse)>1;
drop grp;
run;
Is your sample an exact representative of your real?
data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1);
drop grp;
run;
Hi Novinosrin,
Your code is working fine, but sometimes there are cases when I have few preffered terms starting with same word like :
Updated data :
data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100
Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
So in this scenario "Rash macular" and "Rash papular" should stay in dataset. Only "Rash" should be removed.
data test;
input adverse & $20. count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100
Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1) or (not first.grp and countw(adverse)>1);
drop grp;
run;
simplified further:
data test;
input adverse & $20. count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100
Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
data grp;
set test;
grp=scan(adverse,1);
run;
data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp) and countw(adverse)=1 or countw(adverse)>1;
drop grp;
run;
Another approach,
data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
data want;
set test;
by adverse notsorted;
adverse1=scan(adverse,1,'[ ');
run;
data want2;
length adverse2 $200;
set want;
retain adverse2;
by adverse1 notsorted;
if first.adverse1 then adverse2=adverse;
run;
data want3(rename=(adverse2=adverse));
set want2;
by adverse2 notsorted;
if last.adverse2;
drop adverse1 adverse;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.