I have a dataset in which each row is a long block of text, and I need to extract specific information from each row.
Below is an example:
id Text
1 Physical Exam. Vital Signs. BP: 134/93
2 Patient's physical exam was notable for BP of 142/100
3 Physical Exam. BP: 100/80
Below is what I need:
id Text
1 BP: 134/93
2 BP of 142/100
3 BP: 100/80
Any advice?
Do something like this:
data have;
infile datalines dlm=',';
input id Text :$200.;
datalines;
1,Physical Exam. Vital Signs. BP: 134/93
2,Patient's physical exam was notable for BP of 142/100
3,Physical Exam. BP: 100/80
;
data want;
set have;
/* INDEX returns 0 when 'BP' is absent, and SUBSTR needs a position of 1 or more */
p=index(Text, 'BP');
if p then NewString=substr(Text, p);
run;
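INDEX is case-sensitive, so a note that says "bp" instead of "BP" would be skipped. Below is a minimal sketch, assuming the same comma-delimited layout, that uses FIND with the 'i' modifier to ignore case and only calls SUBSTR when a match was found. Rows 4 and 5 are made-up records added only to show those two cases, and the data set names HAVE_CI and WANT_CI are illustrative.

```sas
/* Sample rows from the question plus two hypothetical rows (4 and 5) */
data have_ci;
   infile datalines dlm=',' truncover;
   input id Text :$200.;
   datalines;
1,Physical Exam. Vital Signs. BP: 134/93
2,Patient's physical exam was notable for BP of 142/100
3,Physical Exam. BP: 100/80
4,Exam notable for bp of 120/80
5,No vital signs recorded
;

data want_ci;
   set have_ci;
   length NewString $200;
   /* FIND with the 'i' modifier matches BP, bp, Bp, ... */
   pos = find(Text, 'BP', 'i');
   /* Row 5 has no match (pos = 0), so NewString stays blank */
   if pos > 0 then NewString = substr(Text, pos);
   drop pos;
run;
```

Note that a plain substring search also matches "bp" inside a longer word such as "bpm"; the regular-expression answers in this thread can use \b word boundaries to avoid that.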
Alternatively, use the PRXCHANGE function:
data have;
input id Text &$100.;
/* (.*) is greedy, so $2 keeps everything from the last "bp" (any case) to the end */
text=prxchange('s/(.*)(bp.*)/$2/i',-1,text);
datalines4;
1 Physical Exam. Vital Signs. BP: 134/93
2 Patient's physical exam was notable for BP of 142/100
3 Physical Exam. BP: 100/80
;;;;
Obs | id | Text | NewString |
1 | 1 | Physical Exam. Vital Signs. BP: 134/93 | BP: 134/93 |
2 | 2 | Patient's physical exam was notable for BP of 142/100 | BP of 142/100 |
3 | 3 | Physical Exam. BP: 100/80 | BP: 100/80 |
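One caveat with the PRXCHANGE pattern: because (.*) is greedy, it jumps to the last "bp" in the text, and without word boundaries that can be the "bpm" of a heart-rate entry. Here is a sketch of a variant that keeps everything from the first whole word BP instead. Row 4 is a hypothetical record added to show the difference (the original pattern would reduce it to "bpm"), and HAVE_RE is an illustrative name.

```sas
data have_re;
   input id Text & $100.;
   /* ^.*?  lazily skips characters up to the FIRST whole word "BP";
      \b on both sides keeps "bpm" from counting as a match;
      the i modifier makes the match case-insensitive */
   Text = prxchange('s/^.*?(\bbp\b.*)$/$1/i', 1, Text);
   datalines4;
1 Physical Exam. Vital Signs. BP: 134/93
2 Patient's physical exam was notable for BP of 142/100
3 Physical Exam. BP: 100/80
4 Vital Signs. BP: 120/80, HR 72 bpm
;;;;
```

Rows without a whole-word BP are left unchanged, the same as with the original pattern.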
data have;
infile datalines dlm=',';
input id Text :$200.;
datalines;
1,Physical Exam. Vital Signs. BP: 134/93
2,Patient's physical exam was notable for BP of 142/100
3,Physical Exam. BP: 100/80
;
data want;
set have;
p=prxmatch('/\bBP\b/i',text);
if p then NewString=substr(Text, p);
run;
proc print;run;
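Both answers keep everything from BP to the end of the text. That is fine for the short sample, but in a long note it also keeps whatever follows the reading. Below is a sketch that returns only the reading itself, assuming it is written as two 2- or 3-digit numbers separated by /, with at most 10 non-digit characters (such as ": " or " of ") between BP and the first number. The pattern, the variable BPText, and the data set names are assumptions for illustration, not something from the original data.

```sas
data have_bp;
   infile datalines dlm=',' truncover;
   input id Text :$200.;
   datalines;
1,Physical Exam. Vital Signs. BP: 134/93
2,Patient's physical exam was notable for BP of 142/100
3,Physical Exam. BP: 100/80
;

data want_bp;
   set have_bp;
   length BPText $40;
   retain re;
   /* Compile the pattern once; \D{0,10} allows ": " or " of " before the numbers */
   if _n_ = 1 then re = prxparse('/\bBP\b\D{0,10}\d{2,3}\s*\/\s*\d{2,3}/i');
   /* CALL PRXSUBSTR sets the start and length of the first match (0 if none) */
   call prxsubstr(re, Text, pos, len);
   if pos > 0 then BPText = substr(Text, pos, len);
   drop re pos len;
run;
```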
Thanks, Ksharp, that's another way to extract it.
Is BP always the last item entered? Is it always recorded with a / separating the two measurements?
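If the reading is always two numbers separated by /, it can be split into numeric systolic and diastolic values, which is usually more useful for analysis than a text snippet. Because PRXMATCH finds the first match anywhere in the string, this sketch does not depend on BP being the last item. The pattern's assumptions (2- or 3-digit values, up to 10 non-digit characters between BP and the first number) and the names HAVE_V, VITALS, Systolic and Diastolic are illustrative, not taken from the thread.

```sas
data have_v;
   infile datalines dlm=',' truncover;
   input id Text :$200.;
   datalines;
1,Physical Exam. Vital Signs. BP: 134/93
2,Patient's physical exam was notable for BP of 142/100
3,Physical Exam. BP: 100/80
;

data vitals;
   set have_v;
   retain re;
   /* Capture group 1 = systolic, capture group 2 = diastolic */
   if _n_ = 1 then re = prxparse('/\bBP\b\D{0,10}(\d{2,3})\s*\/\s*(\d{2,3})/i');
   /* PRXPOSN reads the capture groups from the most recent successful PRXMATCH */
   if prxmatch(re, Text) then do;
      Systolic  = input(prxposn(re, 1, Text), 8.);
      Diastolic = input(prxposn(re, 2, Text), 8.);
   end;
   drop re;
run;
```

Rows with no matching reading get missing values for both variables.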