BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Pysiek
Obsidian | Level 7

Hi guys, 

 

I need some help. Below I prepared example dataset with few records of clinical trial data. I need to to write smart code to select correct records. Every record with number in brackets [ ] are called cluster terms, records without [ ] are called preffered terms.  For example Diarhoea [1] will be cluster term while Diarhoea will be preffered term. If both recrods are presented then only cluster term must be displayed. If only preffered term is presented then I want to keep this record in dataset. So for example in below data we can see Diarhoea [1] cluster and Diarhoea preffered term so I want to keep only Diarhoea [1] cluster and remove Diarhoea preffered term from dataset.  

 

 

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

 

Data after processing should look like :

 

Diarhoea [1] 170
Rash [2]       174 
Nausea [3]   123 
Anemia [4]    156 
Cardiac         167 
Fatigue         174 

 

Thank you in advance for help !

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

simplified further:

data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;

data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp)  and countw(adverse)=1 or countw(adverse)>1;
drop grp; run;

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20

Is your sample an exact representative of your real?

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1);
drop grp;
run;
Pysiek
Obsidian | Level 7

Hi Novinosrin,

 

Your code is working fine, but sometimes there are cases when I have few preffered terms starting with same word like :

Rash macular

Rash maculo-papular

Rash maculovesicular

Rash morbilliform

Rash neonatal

Rash papular

Rash rubelliform

Rash scarlatiniform

Rash vesicular

 

Updated data :

 

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100

Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;

 

So in this scenario "Rash macular" and "Rash papular" should stay in dataset. Only "Rash" should be removed. 

 

novinosrin
Tourmaline | Level 20
data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1) or (not first.grp and countw(adverse)>1);
drop grp;
run;
novinosrin
Tourmaline | Level 20

simplified further:

data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;

data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp)  and countw(adverse)=1 or countw(adverse)>1;
drop grp; run;
Jagadishkatam
Amethyst | Level 16

Another approach,

 

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

data want; set test; by adverse notsorted; adverse1=scan(adverse,1,'[ '); run; data want2; length adverse2 $200; set want; retain adverse2; by adverse1 notsorted; if first.adverse1 then adverse2=adverse; run; data want3(rename=(adverse2=adverse)); set want2; by adverse2 notsorted; if last.adverse2; drop adverse1 adverse; run;
Thanks,
Jag

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1243 views
  • 1 like
  • 3 in conversation