BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Pysiek
Obsidian | Level 7

Hi guys, 

 

I need some help. Below I prepared example dataset with few records of clinical trial data. I need to to write smart code to select correct records. Every record with number in brackets [ ] are called cluster terms, records without [ ] are called preffered terms.  For example Diarhoea [1] will be cluster term while Diarhoea will be preffered term. If both recrods are presented then only cluster term must be displayed. If only preffered term is presented then I want to keep this record in dataset. So for example in below data we can see Diarhoea [1] cluster and Diarhoea preffered term so I want to keep only Diarhoea [1] cluster and remove Diarhoea preffered term from dataset.  

 

 

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

 

Data after processing should look like :

 

Diarhoea [1] 170
Rash [2]       174 
Nausea [3]   123 
Anemia [4]    156 
Cardiac         167 
Fatigue         174 

 

Thank you in advance for help !

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

simplified further:

data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;

data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp)  and countw(adverse)=1 or countw(adverse)>1;
drop grp; run;

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20

Is your sample an exact representative of your real?

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1);
drop grp;
run;
Pysiek
Obsidian | Level 7

Hi Novinosrin,

 

Your code is working fine, but sometimes there are cases when I have few preffered terms starting with same word like :

Rash macular

Rash maculo-papular

Rash maculovesicular

Rash morbilliform

Rash neonatal

Rash papular

Rash rubelliform

Rash scarlatiniform

Rash vesicular

 

Updated data :

 

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100

Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;

 

So in this scenario "Rash macular" and "Rash papular" should stay in dataset. Only "Rash" should be removed. 

 

novinosrin
Tourmaline | Level 20
data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1) or (not first.grp and countw(adverse)>1);
drop grp;
run;
novinosrin
Tourmaline | Level 20

simplified further:

data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;

data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp)  and countw(adverse)=1 or countw(adverse)>1;
drop grp; run;
Jagadishkatam
Amethyst | Level 16

Another approach,

 

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

data want; set test; by adverse notsorted; adverse1=scan(adverse,1,'[ '); run; data want2; length adverse2 $200; set want; retain adverse2; by adverse1 notsorted; if first.adverse1 then adverse2=adverse; run; data want3(rename=(adverse2=adverse)); set want2; by adverse2 notsorted; if last.adverse2; drop adverse1 adverse; run;
Thanks,
Jag

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1174 views
  • 1 like
  • 3 in conversation