Solved: Re: How to select correct records from dataset?

Pysiek · Posted 03-18-2018 04:27 PM

Hi guys,

I need some help. Below I prepared example dataset with few records of clinical trial data. I need to to write smart code to select correct records. Every record with number in brackets [ ] are called cluster terms, records without [ ] are called preffered terms. For example Diarhoea [1] will be cluster term while Diarhoea will be preffered term. If both recrods are presented then only cluster term must be displayed. If only preffered term is presented then I want to keep this record in dataset. So for example in below data we can see Diarhoea [1] cluster and Diarhoea preffered term so I want to keep only Diarhoea [1] cluster and remove Diarhoea preffered term from dataset.

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;

Data after processing should look like :

Diarhoea [1] 170
Rash [2] 174
Nausea [3] 123
Anemia [4] 156
Cardiac 167
Fatigue 174

Thank you in advance for help !

novinosrin · Posted 03-18-2018 06:40 PM

simplified further:

data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;

data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp)  and countw(adverse)=1 or countw(adverse)>1;
drop grp;
run;

View solution in original post

novinosrin · Posted 03-18-2018 05:25 PM

Is your sample an exact representative of your real?

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1);
drop grp;
run;

Pysiek · Posted 03-18-2018 05:43 PM

Hi Novinosrin,

Your code is working fine, but sometimes there are cases when I have few preffered terms starting with same word like :

Updated data :

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea 100
Rash [2] 174
Rash 145
Rash macular 100

Rash papular 89
Nausea [3] 123
Nausea 67
Anemia [4] 156
Cardiac 167
Fatigue 174
;
run;

So in this scenario "Rash macular" and "Rash papular" should stay in dataset. Only "Rash" should be removed.

novinosrin · Posted 03-18-2018 06:30 PM

data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;
data want;
set grp;
by grp notsorted;
if (first.grp and last.grp) or (first.grp and countw(adverse)>1) or (not first.grp and countw(adverse)>1);
drop grp;
run;

novinosrin · Posted 03-18-2018 06:40 PM

simplified further:

data test;
input adverse  & $20. count1;
cards;
Diarhoea [1]  170
Diarhoea  100
Rash [2]  174 
Rash      145 
Rash macular  100
Rash papular  89
Nausea [3]  123 
Nausea   67
Anemia [4]   156 
Cardiac   167 
Fatigue   174 
;
run;

data grp;
set test;
grp=scan(adverse,1);
run;

data want1;
set grp;
by grp notsorted;
if (first.grp and last.grp)  and countw(adverse)=1 or countw(adverse)>1;
drop grp;
run;

Jagadishkatam · Posted 03-18-2018 09:05 PM

Another approach,

data test;
input adverse $ 1-12 count1;
cards;
Diarhoea [1] 170
Diarhoea      100
Rash [2]       174 
Rash            145 
Nausea [3]   123 
Nausea         67
Anemia [4]    156 
Cardiac         167 
Fatigue          174 
;
run;

data want;
set test;
by adverse notsorted;
adverse1=scan(adverse,1,'[ ');
run;

data want2;
length adverse2 $200;
set want;
retain adverse2;
by adverse1 notsorted;
if first.adverse1 then adverse2=adverse;
run;

data want3(rename=(adverse2=adverse));
set want2;
by adverse2 notsorted;
if last.adverse2;
drop adverse1 adverse;
run;

Thanks,
Jag

Classroom Training Available!