DATA Step, Macro, Functions and more

How to select correct records from dataset?

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

How to select correct records from dataset?

[ Edited ]

Hi guys, 

 

I need some help. Below I prepared example dataset with few records of clinical trial data. I need to to write smart code to select correct records. Every record with number in brackets [ ] are called cluster terms, records without [ ] are called preffered terms.  For example Diarhoea [1] will be cluster term while Diarhoea will be preffered term. If both recrods are presented then only cluster term must be displayed. If only preffered term is presented then I want to keep this record in dataset. So for example in below data we can see Diarhoea [1] cluster and Diarhoea preffered term so I want to keep only Diarhoea [1] cluster and remove Diarhoea preffered term from dataset.  

 

 

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

 

Data after processing should look like :

 

Diarhoea [1] 170
Rash [2]       174 
Nausea [3]   123 
Anemia [4]    156 
Cardiac         167 
Fatigue         174 

 

Thank you in advance for help !


Accepted Solutions
Solution
‎03-19-2018 03:53 PM
PROC Star
Posts: 1,357

Re: How to select correct records from dataset?

[ Edited ]
Posted in reply to novinosrin

simplified further:

data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;

data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp)  and countw(adverse)=1 or countw(adverse)>1;
drop grp; run;

View solution in original post


All Replies
PROC Star
Posts: 1,357

Re: How to select correct records from dataset?

Is your sample an exact representative of your real?

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1);
drop grp;
run;
New Contributor
Posts: 4

Re: How to select correct records from dataset?

Posted in reply to novinosrin

Hi Novinosrin,

 

Your code is working fine, but sometimes there are cases when I have few preffered terms starting with same word like :

Rash macular

Rash maculo-papular

Rash maculovesicular

Rash morbilliform

Rash neonatal

Rash papular

Rash rubelliform

Rash scarlatiniform

Rash vesicular

 

Updated data :

 

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100

Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;

 

So in this scenario "Rash macular" and "Rash papular" should stay in dataset. Only "Rash" should be removed. 

 

PROC Star
Posts: 1,357

Re: How to select correct records from dataset?

data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1) or (not first.grp and countw(adverse)>1);
drop grp;
run;
Solution
‎03-19-2018 03:53 PM
PROC Star
Posts: 1,357

Re: How to select correct records from dataset?

[ Edited ]
Posted in reply to novinosrin

simplified further:

data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;

data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp)  and countw(adverse)=1 or countw(adverse)>1;
drop grp; run;
Trusted Advisor
Posts: 1,146

Re: How to select correct records from dataset?

Another approach,

 

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

data want; set test; by adverse notsorted; adverse1=scan(adverse,1,'[ '); run; data want2; length adverse2 $200; set want; retain adverse2; by adverse1 notsorted; if first.adverse1 then adverse2=adverse; run; data want3(rename=(adverse2=adverse)); set want2; by adverse2 notsorted; if last.adverse2; drop adverse1 adverse; run;
Thanks,
Jag
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 157 views
  • 1 like
  • 3 in conversation