I have a dataset in which each row is a long block of text, and I need to extract specific information from each row.
Below is an example:
id Text
1 Physical Exam. Vital Signs. BP: 134/93
2 Patient's physical exam was notable for BP of 142/100
3 Physical Exam. BP: 100/80
Below is what I need:
id Text
1 BP: 134/93
2 BP of 142/100
3 BP: 100/80
Any advice?
Do something like this:
data have;
/* INFILE comes before INPUT; DATALINES (not DATALINES4) matches the datalines statement below */
infile datalines dlm=',';
input id Text :$200.;
datalines;
1,Physical Exam. Vital Signs. BP: 134/93
2,Patient's physical exam was notable for BP of 142/100
3,Physical Exam. BP: 100/80
;
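data want;
set have;
/* index returns 0 when 'BP' is absent, so only call substr when a position was found */
pos=index(Text, 'BP');
if pos then NewString=substr(Text, pos);
drop pos;
run;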
Alternatively, use the prxchange function:
data have;
input id Text & $100.;
/* keep everything from the last "bp" (case-insensitive) to the end of the text */
text=prxchange('s/(.*)(bp.*)/$2/i',-1,text);
datalines4;
1 Physical Exam. Vital Signs. BP: 134/93
2 Patient's physical exam was notable for BP of 142/100
3 Physical Exam. BP: 100/80
;;;;
1 | 1 | Physical Exam. Vital Signs. BP: 134/93 | BP: 134/93 |
2 | 2 | Patient's physical exam was notable for BP of 142/100 | BP of 142/100 |
3 | 3 | Physical Exam. BP: 100/80 | BP: 100/80 |
data have;
/* INFILE comes before INPUT; DATALINES (not DATALINES4) matches the datalines statement below */
infile datalines dlm=',';
input id Text :$200.;
datalines;
1,Physical Exam. Vital Signs. BP: 134/93
2,Patient's physical exam was notable for BP of 142/100
3,Physical Exam. BP: 100/80
;
data want;
set have;
p=prxmatch('/\bBP\b/i',text);
if p then NewString=substr(Text, p);
run;
proc print;run;
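The \b word boundaries in the pattern above matter when the letters "BP" can also appear inside a longer word. A minimal sketch of the difference, using a made-up row (the text and the variable names p_index and p_word are hypothetical, not from the original data):

```sas
data compare;
  length Text $100;
  /* Hypothetical row: "BPH" contains the letters BP but is not a blood pressure reading */
  Text = 'History of BPH. Physical Exam. BP: 128/84';
  /* index finds the first "BP" anywhere, even inside another word */
  p_index = index(Text, 'BP');
  /* prxmatch with \b only matches "BP" standing as its own word */
  p_word = prxmatch('/\bBP\b/i', Text);
  /* write both positions to the log for comparison */
  put p_index= p_word=;
run;
```

In this sketch index should stop at "BPH", while the prxmatch position should point at the actual reading, so substr(Text, p_word) would start at "BP:".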
Thanks Ksharp, that is another way to extract it.
Is BP always the last item entered? Is it always recorded with a / separating the two measurements?
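If the answer to either question is no, taking everything from "BP" to the end of the line would pick up trailing text. A rough sketch of one alternative, assuming the reading always looks like two 2-3 digit numbers separated by a slash and sits within a few characters of the word BP (the pattern and the variable names BP_Text, Systolic and Diastolic are my assumptions, not something from the original post):

```sas
data want;
  set have;
  length BP_Text $40;
  retain rx;
  /* compile the pattern once and reuse it for every row */
  if _n_ = 1 then
    rx = prxparse('/\bBP\b\D{0,10}?(\d{2,3})\s*\/\s*(\d{2,3})/i');
  if prxmatch(rx, Text) then do;
    /* buffer 0 is the whole match, e.g. "BP: 134/93" */
    BP_Text = prxposn(rx, 0, Text);
    /* buffers 1 and 2 are the two numbers around the slash */
    Systolic = input(prxposn(rx, 1, Text), 8.);
    Diastolic = input(prxposn(rx, 2, Text), 8.);
  end;
  drop rx;
run;
```

Rows with no matching reading keep BP_Text blank and the two numeric variables missing, which makes them easy to review by hand.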