Hi guys,
I need some help. Below I prepared example dataset with few records of clinical trial data. I need to to write smart code to select correct records. Every record with number in brackets [ ] are called cluster terms, records without [ ] are called preffered terms. For example Diarhoea [1] will be cluster term while Diarhoea will be preffered term. If both recrods are presented then only cluster term must be displayed. If only preffered term is presented then I want to keep this record in dataset. So for example in below data we can see Diarhoea [1] cluster and Diarhoea preffered term so I want to keep only Diarhoea [1] cluster and remove Diarhoea preffered term from dataset.
data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
Data after processing should look like :
Diarhoea [1] 170
Rash [2] 174
Nausea [3] 123
Anemia [4] 156
Cardiac 167
Fatigue 174
Thank you in advance for help !
simplified further:
data test;
input adverse & $20. count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100
Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
data grp;
set test;
grp=scan(adverse,1);
run;
data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp) and countw(adverse)=1 or countw(adverse)>1;
drop grp;
run;
Is your sample an exact representative of your real?
data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1);
drop grp;
run;
Hi Novinosrin,
Your code is working fine, but sometimes there are cases when I have few preffered terms starting with same word like :
Updated data :
data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100
Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
So in this scenario "Rash macular" and "Rash papular" should stay in dataset. Only "Rash" should be removed.
data test;
input adverse & $20. count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100
Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1) or (not first.grp and countw(adverse)>1);
drop grp;
run;
simplified further:
data test;
input adverse & $20. count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100
Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
data grp;
set test;
grp=scan(adverse,1);
run;
data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp) and countw(adverse)=1 or countw(adverse)>1;
drop grp;
run;
Another approach,
data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;
data want;
set test;
by adverse notsorted;
adverse1=scan(adverse,1,'[ ');
run;
data want2;
length adverse2 $200;
set want;
retain adverse2;
by adverse1 notsorted;
if first.adverse1 then adverse2=adverse;
run;
data want3(rename=(adverse2=adverse));
set want2;
by adverse2 notsorted;
if last.adverse2;
drop adverse1 adverse;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.