BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ariel_3
Fluorite | Level 6

I am trying to separate Lab Information that is in one variable into separate variables and separate rows for each instance. Below is an example of what the lab information looks like for one row. This example has 3 labs lumped together, but I have examples with up to 9 labs lumped together. I would like to have the <b>, </b> etc. removed, and the individual column names renamed so they don’t have any spaces. *Also, this example is test data.

 

Example of Variable is in the attachment.

 

Desired Result for each row (some fields are NULL and others have results)

 
 LocalID:DateRcdByPH:SpecCollDt:ELR_IND:ResultedTest:CodedResult:NumericResult:TextResults:Comments:
Row 1OBS456571123N011/15/20201/6/2020YRPR Qualitative Stat  Reactive * 
Row 2OBS45657123IN011/15/20201/7/2020YRPR Quantitative Stat  0.0861111 
Row 3OB456874123N013/13/20201/5/2020YFTA  Reactive * 

 

I've tried playing around with PRXmatch, but I'm not very familiar with this function to get the desired result I need. I’m also unsure how to rename multiple values within the same variable, so the new column names don’t have any spaces (i.e., changing ‘Local ID’ to ‘LocalID’. Any help is greatly appreciated! Below is the code I currently have. 

 

data lab2;
set lab;

/* Extract the required information from the LABORATORY_INFORMATION column */
LocalID=prxmatch("/(?<=LocalID:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
DateRcvdByPH = prxmatch("/(?<=DateRcvdByPH:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
SpecCollDt = prxmatch("/(?<=SpecCollDt:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
ELR_IND = prxmatch("/(?<=ELR_IND:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
ResultedTest = prxmatch("/(?<=ResultedTest:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
CodedResult = prxmatch("/(?<=CodedResult:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
NumericResult = prxmatch("/(?<=NumericResult:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
TextResult = prxmatch("/(?<=TextResult:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
Comments = prxmatch("/(?<=Comments:)(.*?)$/", LABORATORY_INFORMATION);

/* Trim and squeeze the extracted values */
LocalID = trim(compbl(LocalID));
DateRcvdByPH = trim(compbl(DateRcvdByPH));
SpecCollDt = trim(compbl(SpecCollDt));
ELR_IND = trim(compbl(ELR_IND));
ResultedTest = trim(compbl(ResultedTest));
CodedResult = trim(compbl(CodedResult));
NumericResult = trim(compbl(NumericResult));
TextResult = trim(compbl(TextResult));
Comments = trim(compbl(Comments));

/* Drop the original LABORATORY_INFORMATION column */
drop LABORATORY_INFORMATION;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
Mazi
Quartz | Level 8
data lab;
infile datalines truncover ;
input LABORATORY_INFORMATION $32767.;
datalines;
<b>Local ID:</b> OBS456571123N01<b><b>Date Received by PH:</b> 01/15/2020<b><b>Specimen Collection Date:</b> 01/06/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> RPR Qualitative Stat<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> Reactive *<b><b>Comments:</b> <b><b><b>Local ID:</b> OBS45657123IN01<b><b>Date Received by PH:</b> 01/15/2020<b><b>Specimen Collection Date:</b> 01/07/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> RPR Quantitative Stat<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> 1:64<b><b>Comments:</b> <b><b><b>Local ID:</b> OB456874123N01<b><b>Date Received by PH:</b> 03/13/2020<b><b>Specimen Collection Date:</b> 01/05/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> FTA<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> Reactive *<b><b>Comments:</b> <b><b>
;
run;

data lab2;
	set lab;
	array columns [9] $200 localid datercdbyph speccolldt elr_ind resultedtest codedresult numericresult textresults comments;
	loop:
	start=1;
	do index=1 to 9;
		temp1 = find(laboratory_information, "</b>", start);
		temp2 = find(laboratory_information, "<b>", temp1);
		columns[index] = substr(laboratory_information, temp1 + 4, temp2 - temp1 - 4);
		start=temp2;
	end;
	output;
	call missing(of columns[*]);
	if length(laboratory_information) > (start + 6) then do;
		laboratory_information = substr(laboratory_information, start + 6);
		goto loop;
	end;
run;

Appologies, I didnt realise it was one row. How about this?

View solution in original post

7 REPLIES 7
Mazi
Quartz | Level 8

Hi, if you set options validvarname=v7; SAS will automatically replace spaces in variable names with underscores.

Additionally, its best to create sample data using a datalines statement showing the data you have, and another dataset, showing what you expect to get help quickly. 
Most users are reluctant to download excel files due to viruses and the like.

Ariel_3
Fluorite | Level 6

Thanks for the suggestion, and the tip on using datalines instead of attachments! Please see below.  

 

data lab;

input LABORATORY_INFORMATION $;

datalines;

<b>Local ID:</b> OBS456571123N01<b><b>Date Received by PH:</b> 01/15/2020<b><b>Specimen Collection Date:</b> 01/06/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> RPR Qualitative Stat<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> Reactive *<b><b>Comments:</b> <b><b><b>Local ID:</b> OBS45657123IN01<b><b>Date Received by PH:</b> 01/15/2020<b><b>Specimen Collection Date:</b> 01/07/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> RPR Quantitative Stat<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> 1:64<b><b>Comments:</b> <b><b><b>Local ID:</b> OB456874123N01<b><b>Date Received by PH:</b> 03/13/2020<b><b>Specimen Collection Date:</b> 01/05/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> FTA<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> Reactive *<b><b>Comments:</b> <b><b>

;

 

data lab2;
set lab;
/* Extract the required information from the LABORATORY_INFORMATION column */
LocalID=prxmatch("/(?<=LocalID:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
DateRcvdByPH = prxmatch("/(?<=DateRcvdByPH:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
SpecCollDt = prxmatch("/(?<=SpecCollDt:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
ELR_IND = prxmatch("/(?<=ELR_IND:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
ResultedTest = prxmatch("/(?<=ResultedTest:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
CodedResult = prxmatch("/(?<=CodedResult:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
NumericResult = prxmatch("/(?<=NumericResult:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
TextResult = prxmatch("/(?<=TextResult:)(.*?)(?=\r\n)/", LABORATORY_INFORMATION);
Comments = prxmatch("/(?<=Comments:)(.*?)$/", LABORATORY_INFORMATION);
/* Trim and squeeze the extracted values */
LocalID = trim(compbl(LocalID));
DateRcvdByPH = trim(compbl(DateRcvdByPH));
SpecCollDt = trim(compbl(SpecCollDt));
ELR_IND = trim(compbl(ELR_IND));
ResultedTest = trim(compbl(ResultedTest));
CodedResult = trim(compbl(CodedResult));
NumericResult = trim(compbl(NumericResult));
TextResult = trim(compbl(TextResult));
Comments = trim(compbl(Comments));
/* Drop the original LABORATORY_INFORMATION column */
drop LABORATORY_INFORMATION;
run;

 

Desired Result

Row 1

LocalID:

DateRcdByPH:

SpecCollDt:

ELR_IND:

ResultedTest:

CodedResult:

NumericResult:

TextResults:

Comments:

 

OBS456571123N01

1/15/2020

1/6/2020

Y

RPR Qualitative Stat

  

Reactive *

 

Row 2

LocalID:

DateRcdByPH:

SpecCollDt:

ELR_IND:

ResultedTest:

CodedResult:

NumericResult:

TextResults:

Comments:

 

OBS45657123IN01

1/15/2020

1/7/2020

Y

RPR Quantitative Stat

  

0.0861111

 

Row 3

LocalID:

DateRcdByPH:

SpecCollDt:

ELR_IND:

ResultedTest:

CodedResult:

NumericResult:

TextResults:

Comments:

 

OB456874123N01

3/13/2020

1/5/2020

Y

FTA

  

Reactive *

 
Mazi
Quartz | Level 8
data lab;
infile datalines truncover ;
input LABORATORY_INFORMATION $500.;
datalines;
<b>Local ID:</b> OBS456571123N01<b><b>Date Received by PH:</b> 01/15/2020<b><b>Specimen Collection Date:</b> 01/06/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> RPR Qualitative Stat<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> Reactive *<b><b>Comments:</b> <b><b>
<b>Local ID:</b> OBS45657123IN01<b><b>Date Received by PH:</b> 01/15/2020<b><b>Specimen Collection Date:</b> 01/07/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> RPR Quantitative Stat<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> 1:64<b><b>Comments:</b> <b><b>
<b>Local ID:</b> OB456874123N01<b><b>Date Received by PH:</b> 03/13/2020<b><b>Specimen Collection Date:</b> 01/05/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> FTA<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> Reactive *<b><b>Comments:</b> <b><b>
;

run;

data lab2;
	set lab;
	start=1;
	array columns [9] $200 localid datercdbyph speccolldt elr_ind resultedtest  codedresult  numericresult  textresults  comments;
	do index=1 to 9;
		temp1 = find(laboratory_information, "</b>", start);
		temp2 = find(laboratory_information, "<b>", temp1);
		columns[index] = substr(laboratory_information, temp1 + 4, temp2 - temp1 - 4);
		start=temp2;
	end;
run;

Can you give this a try?

Ariel_3
Fluorite | Level 6

Thank you! That worked pulling the first lab result out from the variable. Is there a way to get to get all the lab results to do the same? The example I provided has 3 lab results lumped together into one variable/row; they are not 3 separate values/rows. Please let me know.

Mazi
Quartz | Level 8
data lab;
infile datalines truncover ;
input LABORATORY_INFORMATION $32767.;
datalines;
<b>Local ID:</b> OBS456571123N01<b><b>Date Received by PH:</b> 01/15/2020<b><b>Specimen Collection Date:</b> 01/06/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> RPR Qualitative Stat<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> Reactive *<b><b>Comments:</b> <b><b><b>Local ID:</b> OBS45657123IN01<b><b>Date Received by PH:</b> 01/15/2020<b><b>Specimen Collection Date:</b> 01/07/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> RPR Quantitative Stat<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> 1:64<b><b>Comments:</b> <b><b><b>Local ID:</b> OB456874123N01<b><b>Date Received by PH:</b> 03/13/2020<b><b>Specimen Collection Date:</b> 01/05/2020<b><b>ELR Indicator:</b>Y<b><b>Resulted Test:</b> FTA<b><b>Coded Result:</b> <b><b>Numeric Result:</b> <b><b>Text Result:</b> Reactive *<b><b>Comments:</b> <b><b>
;
run;

data lab2;
	set lab;
	array columns [9] $200 localid datercdbyph speccolldt elr_ind resultedtest codedresult numericresult textresults comments;
	loop:
	start=1;
	do index=1 to 9;
		temp1 = find(laboratory_information, "</b>", start);
		temp2 = find(laboratory_information, "<b>", temp1);
		columns[index] = substr(laboratory_information, temp1 + 4, temp2 - temp1 - 4);
		start=temp2;
	end;
	output;
	call missing(of columns[*]);
	if length(laboratory_information) > (start + 6) then do;
		laboratory_information = substr(laboratory_information, start + 6);
		goto loop;
	end;
run;

Appologies, I didnt realise it was one row. How about this?

Ariel_3
Fluorite | Level 6

That worked perfectly! Thank you so much!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 486 views
  • 2 likes
  • 2 in conversation