BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sas21
Calcite | Level 5

I need to find the most efficient way possible to extract 6 test results that are populated in one text field.  The variable is called RESULT_NOTES and the general format of the variable looks something like this (patterns vary a little for each record- just an example of 1, also all of the numbers are made up in this example):


  RESULT COMMENT(S): ---------------------------------------------------------------------~ TEST NAME RESULT ALERT UOM PERFORMED AT~---------------------------------------------------------------------~Perfluorobutanesulfonic Acid 10 ng/mL 01~Synonym(s): PFBS~Population reference interval derived from BBC Laboratories~data (n=300) is usually less than 20 ng/mL~(90% CI, <0.0.050 - 0.23 ng/mL) (97.5th percentile)~General U.S. population from CDC-NHANES (2013-2014)~(n=3000) (isomers not described) is typically~below 0.1 ng/mL (95th percentile)~ -----~Perfluoroheptanoic Acid 8 ng/mL 01~Synonym(s): PFHpA~Population reference interval derived from BBC Laboratories~data (n=300) is usually less than 97 ng/mL~(90% CI, 0.25 - 0.73 ng/mL) (97.5th percentile)~General U.S. population from CDC-NHANES (2013-2014)~(n=2168) (isomers not described) is typically~below 0.20 ng/mL (95% CI, 0.10 - 0.20 ng/mL)~(95th percentile)~ -----~Perfluorohexanesulfonic Acid 16 ng/mL 01~Synonym(s): PFHxS~Population reference interval derived from BBC Laboratories~data (n=300) is usually less than 30  ng/mL~(90% CI, 4.1 - 17 ng/mL) (97.5th percentile)~General U.S. population from CDC-NHANES (2015-2016)~(n=1993) (isomers not described) is typically~below 4.9 ng/mL (95% CI, 4.1 - 5.8 ng/mL)~(95th percentile)~ -----~Perfluorooctanoic Acid 20 ng/mL 01~Synonym(s): PFOA; FC-143 Component~Population reference interval derived from~BBC Laboratories data (n=300) is usually less than 4.1 ng/mL~(90% CI, 3.3 - 8.0 ng/mL) (97.5th percentile)~General U.S. population from CDC-NHANES (2015-2016)~(n=1993) for the linear isomer is typically below~8 ng/mL (95% CI,7.6-10.3 ng/mL) (95th percentile)~below 1.9 ng/mL (95% CI, 1.5 - 2.2 ng/mL)~(95th percentile)~ -----~Perfluorononanoic Acid 5 ng/mL 01~Synonym(s): PFNA~Population reference interval derived from BBC Laboratories~data (n=300) is usually less than 1.4 ng/mL~(90% CI, 1.2 - 2.3 ng/mL) (97.5th percentile)~General U.S. population from CDC-NHANES (2015-2016)~(n=1993) (isomers not described) is typically~below 1.9 ng/mL (95% CI, 1.5 - 2.2 ng/mL)~(95th percentile)~ -----~Perfluorooctanesulfonic Acid 12 ng/mL 01~Please be advised that this report contains fictitious~results to be used for interface program validation.~This report does not contain real patient results.~Synonym(s): PFOS~Population reference interval derived from BBC Laboratories~data (n=300) is usually less than 12 ng/mL~(90% CI, 7.7 - 15 ng/mL) (97.5th percentile)~General U.S. population from CDC-NHANES (2015-2016)~(n=1993) for the linear isomer is typically~below 13 ng/mL (95% CI, 10 - 18 ng/mL)~(95th percentile)~This test was developed and its performance characteristics~determined by BBC Laboratories. It has not been cleared or approved~by the US Food and Drug Administration.~ -----~---------------------------------------------------------------------~ ALERT/ABNORMAL FLAG LEGEND:~ L= Below Low Normal, H= Above High Normal, LL= Alert Low~ HH= Alert High, <= Panic Low, >= Panic High, A= Abnormal~---------------------------------------------------------------------

 

Ideally, I would need to create a variable for each compound, with the numeric value and then a separate variable for the units an example below:

Perfluorobutanesulfonic_Value     Perfluorobutanesulfonic_Units     etc...…(for each compound bolded above)

       10                                                                    ng/ml

 

Thank you for any help or suggestions!

%MCEPASTEBIN%
%MCEPASTEBIN
1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

A typical strategy:

data wantList;
set have;
stringId = _n_;
do compound = 
    "Perfluorobutanesulfonic Acid", 
    "Perfluoroheptanoic Acid",
    "Perfluorohexanesulfonic Acid";
    pos = find(text, compound, "it");
    if pos > 0 then do;
        pos = pos + length(compound);
        length var $26; /* 32 - length("_units") */
        var = scan(compound, 1);
        value = input(scan(substr(text, pos), 1, " "), best.);
        units = scan(substr(text, pos), 2, " ");
        output;
        end;
    end;
drop text pos;
run;

proc transpose data=wantList out=wantvalues(drop=_name_) suffix=_value;
by stringId;
var value;
id var;
run;

proc transpose data=wantList out=wantunits(drop=_name_) suffix=_units;
by stringId;
var units;
id var;
run;

data want;
merge wantValues wantUnits;
by stringId;
drop stringId;
run;

You need two transpose steps because values are numeric and units are character.

 

PG

View solution in original post

3 REPLIES 3
Shmuel
Garnet | Level 18

May I assume that the prefix Perfluoro appears only at start of the text to be extracted and not in between extracted texts ?

Are All acids and is the extracted text always in a format of "Perfluoro .... Acid <amount> ng/ml" ?

 

If positive then next code (not tested) may help:


data want;
 set have;
     length text1 $32000 
	        searc_for $9
			out_var1 $30
			out_value  8
			out_unit  $5
		;
	flag=1;
	search_for = 'Perfluoro';
	text1 = substr(result_notes,index(result_notes,search_for));
	do until falg=0;
	   out_var1 = scan(text1,1)|| 'Acid';
	   out_value = input(scan(text1,3),best4.);
	   out_unit = scan(text1,4,' .');
	   output; 
	   keep out_var1 out_value out_unit;
	   
	   ix = index(text1,out_unit) + length(out_unit) +1;
	   text1 = substr(text1,index(result_notes,search_for));
	   if text1 = '' then flag=0;
	end;
run;
	   
	 

 

 

SASJedi
Ammonite | Level 13

Here's a start, anyway. This code assumes your data is in a text file named "remarks.txt" located in the SAS default directory.

 

data intermediate;
   drop _:;
   retain __rxID 0;
   length _Found $ 32767 Name $50 Units $ 15 Value 8;
   if _n_=1 then do;
      _rxID=prxparse('/-~(\w+\s?\w*\/?\w*\s)+01~/');
      put 'NOTE: ' _rxID=;
   end;
   infile "remarks.txt";
   input;
   _start = 1;
   _stop = length(_infile_);
   call prxnext(_rxID, _start, _stop, _infile_, _position, _length);
      do while (_position > 0);
         _found = substr(_infile_, _position+2, _length-3);
         Units=scan(_found,-2,' ');
         Value=input(scan(_found,-3,' '),10.);
         Name=substr(_found,1,anydigit(_found)-2);
         output;
         call prxnext(_rxID, _start, _stop, _infile_, _position, _length);
      end;
run;

%macro Fixnames;
proc sql noprint;
select distinct
       Name
      ,cats(scan(Name,1),'_Value')
      ,cats(scan(Name,1),'_Units')
  into :Name1- , :val1- , :unit1-
  from intermediate
  order by 1
;
quit;

data want;
  retain 
     %do i=1 %to &sqlobs;
     &&val&i 0 
     &&unit&i '          '
     %end;
  ;
  set intermediate end=last;
  %do i=1 %to &sqlobs;
  if Name="&&name&i" then do;
     &&val&i=value;
     &&unit&i=units;
  end;
  %end;
  if last then output;
run;
%mend;

%fixnames

The program produces this result:

 

Obs Perfluorobutanesulfonic_Value Perfluorobutanesulfonic_Units Perfluoroheptanoic_Value Perfluoroheptanoic_Units Perfluorohexanesulfonic_Value Perfluorohexanesulfonic_Units Perfluorononanoic_Value Perfluorononanoic_Units Perfluorooctanesulfonic_Value Perfluorooctanesulfonic_Units Perfluorooctanoic_Value Perfluorooctanoic_Units
1 10 ng/mL 8 ng/mL 16 ng/mL 5 ng/mL 12 ng/mL 20 ng/mL


 

Check out my Jedi SAS Tricks for SAS Users
PGStats
Opal | Level 21

A typical strategy:

data wantList;
set have;
stringId = _n_;
do compound = 
    "Perfluorobutanesulfonic Acid", 
    "Perfluoroheptanoic Acid",
    "Perfluorohexanesulfonic Acid";
    pos = find(text, compound, "it");
    if pos > 0 then do;
        pos = pos + length(compound);
        length var $26; /* 32 - length("_units") */
        var = scan(compound, 1);
        value = input(scan(substr(text, pos), 1, " "), best.);
        units = scan(substr(text, pos), 2, " ");
        output;
        end;
    end;
drop text pos;
run;

proc transpose data=wantList out=wantvalues(drop=_name_) suffix=_value;
by stringId;
var value;
id var;
run;

proc transpose data=wantList out=wantunits(drop=_name_) suffix=_units;
by stringId;
var units;
id var;
run;

data want;
merge wantValues wantUnits;
by stringId;
drop stringId;
run;

You need two transpose steps because values are numeric and units are character.

 

PG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1112 views
  • 2 likes
  • 4 in conversation