<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Juletip #16: Using SAS Functions and Perl Regular Expression on a text string in SAS Community Nordic</title>
    <link>https://communities.sas.com/t5/SAS-Community-Nordic/Juletip-16-Using-SAS-Functions-and-Perl-Regular-Expression-on-a/m-p/850035#M402</link>
    <description>&lt;P&gt;SAS Functions can be very useful when working on a text string. This Juletip will show how you can extract digits out of a text string that contains both letters and digits.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this example, I will use a variable (text string) with item description that often can appear on scanner data that comes from grocery stores. Usually, this kind of data doesn’t contain separate variables with information about unit or quantity of the different items, but I know that I might find them in the description variable connected to each item.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In step 1, I am using&amp;nbsp;&lt;SPAN&gt;DATALINES/INPUT statement to read a&amp;nbsp;&lt;/SPAN&gt;variable with item description:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 1: The data */

data food0;
  infile datalines dsd truncover;
  input Text:$60.;
  datalines4;
_COCA COLA 6 LITER RINGNES 
SOLO RINGNES 6L. 4BOTTLEDS
500 GRAM FLOUR MØLLERENS 0.5 KG
4.5 KG FLOUR MOLLERENS 4500 GRAM
200 GRAM MOLLERENS 200 GRAM SUGAR
FARIS 1.5DL RINGNES 150 CL
CHEESE WHITE 200G PKN
0.5 KG CHEESE 500 GRAM TINE MEIERIER 500GR.
;;;;

proc print data=food0;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The output looks like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde1.PNG" style="width: 320px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78612iDAF1FF2FC8139649/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde1.PNG" alt="Bilde1.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;In step 2, I show different functions that could be useful when working on the text string:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 2: Useful functions to be aware of when working with text variables */

Data food1;
	set food0;

	*Detecting the presence of alphabetic characters;
	t1=anyalpha(Text);

	*Detecting the presence of digital values;
	t2=anydigit(Text);

	*Return the length of a string;
	t3=length(Text);

	*Extract a substring from a string;
	t4=substr(Text,5,4);

	*Take a part of a string;
	t5=scan(Text,2,'');

	*Replace a word in a string;
	t6=tranwrd(Text,'RINGNES','MACK');
	
run;

proc print data=food1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;This gives the following output:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde2.PNG" style="width: 734px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78613iB145E691DA7F5B52/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde2.PNG" alt="Bilde2.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;In step 3, I try to extract the different digits in the text string using combinations of various SAS functions. The function &lt;EM&gt;substr&lt;/EM&gt; helps to extract parts of the text string into a new string, hence; I read from the first digit by using the function &lt;EM&gt;anydigit &lt;/EM&gt;and then defining the length using the function length minus&amp;nbsp;&lt;EM&gt;anydigit &lt;/EM&gt;plus one&lt;EM&gt;. &lt;/EM&gt;After that, I use the function &lt;EM&gt;scan&lt;/EM&gt; to read the first part of the new text string that was made and assuming that’s the quantity. Then I repeat &lt;EM&gt;scan &lt;/EM&gt;so I can read the second part, which I assume is the unit. Since there might be more digits in the text string, I also make a new string that contains the rest of the text after the first quantity and unit. In the second part, I do the same that I did in the first part on remaining text, so I can find more quantities and units.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 3: Abstract Quantity and Unit from the string */

	Data food2;
	set food0;
	
	*First part;
	M1=substr(Text,anydigit(Text), length(Text)-anydigit(Text)+1);
	M3=scan(M1,1,'');
	M4=scan(M1,2,'');
	*NB: SUBSTRN= behave in the same way as SUBSTR, but will not issue a NOTE message if it returns a null result;
	R0=substrn(M1,length(M3)+length(M4)+3,length(M1)-(length(M3)+length(M4)+2));
		
	if anyalpha(M3)&amp;gt;0 then do;
	quantity1=substr(M3,anydigit(M3),anyalpha(M3)-1);
	unit1=substr(M3, anyalpha(M3), length(M3)-anyalpha(M3)+1);
	end;
	else do;
	quantity1=M3;
	unit1=M4;
	end;

	*Second part;
	M7=substrn(R0,anydigit(R0),length(R0)-anydigit(R0)+1);
	M8=scan(M7,1,'');
	M9=scan(M7,2,''); 

	if anyalpha(M8)&amp;gt;0 then do;
	quantity2=substrn(M8,anydigit(M8),anyalpha(M8)-1);
	unit2=substrn(M8, anyalpha(M8), length(M8)-anyalpha(M8)+1);
	end;
	else do;
	quantity2=M8;
	unit2=M9;
	end; 

	run;

proc print data=food2(keep=Text quantity1 unit1 quantity2 unit2 );
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The ‘cleaned’ output looks like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde3.PNG" style="width: 519px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78614i7486641964FAD491/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde3.PNG" alt="Bilde3.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;In step 4, I am using a macro since there is a pattern in step 3. Since I know that there can be at least 3 digits in the text string, I decide to run the macro four times so I can be sure to find all the quantities/units:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 4: Making this with a macro */


%macro part(p0,p1);
	
	Data food&amp;amp;p1.;
	set food&amp;amp;p0.;
	

	if &amp;amp;p1.=1 then do;
	
	*FIRST PART;
	M1_&amp;amp;p1.=substr(Text,anydigit(Text), length(Text)-anydigit(Text)+1);
	M3_&amp;amp;p1.=scan(M1_&amp;amp;p1.,1,'');
	M4_&amp;amp;p1.=scan(M1_&amp;amp;p1.,2,'');
	*SUBSTRN= behave in the same way as SUBSTR, but will not issue a NOTE message if it returns a null result;
	R0_&amp;amp;p1.=substrn(M1_&amp;amp;p1.,length(M3_&amp;amp;p1.)+length(M4_&amp;amp;p1.)+3,length(M1_&amp;amp;p1.)-(length(M3_&amp;amp;p1.)+length(M4_&amp;amp;p1.)+2));

	if anyalpha(M3_&amp;amp;p1.)&amp;gt;0 then do;
		quantity&amp;amp;p1.=substr(M3_&amp;amp;p1.,anydigit(M3_&amp;amp;p1.),anyalpha(M3_&amp;amp;p1.)-1);
		unit&amp;amp;p1.=substr(M3_&amp;amp;p1., anyalpha(M3_&amp;amp;p1.),length(M3_&amp;amp;p1.)-anyalpha(M3_&amp;amp;p1.)+1);
		end;
	else do;
		quantity&amp;amp;p1.=M3_&amp;amp;p1.;
		unit&amp;amp;p1.=M4_&amp;amp;p1.;
	end;
    
	
	end; 

	if ( &amp;amp;p1.&amp;gt;1 ) and ( length(R0_&amp;amp;p0.)&amp;gt;0 ) then do;
	

	*MORE PARTS;
	M1_&amp;amp;p1.=substrn(R0_&amp;amp;p0.,anydigit(R0_&amp;amp;p0.),length(R0_&amp;amp;p0.)-anydigit(R0_&amp;amp;p0.)+1);
	M3_&amp;amp;p1.=scan(M1_&amp;amp;p1.,1,'');
	M4_&amp;amp;p1.=scan(M1_&amp;amp;p1.,2,'');
	R0_&amp;amp;p1.= substrn(M1_&amp;amp;p1.,length(M3_&amp;amp;p1.)+length(M4_&amp;amp;p1.)+2,length(M1_&amp;amp;p1.)-(length(M3_&amp;amp;p1.)+length(M4_&amp;amp;p1.)+1));

	if anyalpha(M3_&amp;amp;p1.)&amp;gt;0 then do;
	quantity&amp;amp;p1.=substrn(M3_&amp;amp;p1.,anydigit(M3_&amp;amp;p1.),anyalpha(M3_&amp;amp;p1.)-1);
	unit&amp;amp;p1.=substrn(M3_&amp;amp;p1., anyalpha(M3_&amp;amp;p1.), length(M3_&amp;amp;p1.)-anyalpha(M3_&amp;amp;p1.)+1);
	end;
	else do;
	quantity&amp;amp;p1.=M3_&amp;amp;p1.;
	unit&amp;amp;p1.=M4_&amp;amp;p1.;
	end;  

	end; 

	run;

%mend;
    %part(0,1);
	%part(1,2);
	%part(2,3);
	%part(3,4);


proc print data=food4(keep=Text quantity1 unit1 quantity2 unit2 quantity3 unit3 quantity4 unit4);
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The result looks like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde4.PNG" style="width: 725px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78615i0A7BBE99F5A97DA0/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde4.PNG" alt="Bilde4.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;In step 5, I try to clean up a bit, so that the items with the same unit is written in the same way. This is also done with a macro:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 5: Looks for a word that has the same meaning and replaces them with the same unit */

Data out0;
set food4(keep=Text quantity1 unit1 quantity2 unit2 quantity3 unit3 quantity4 unit4);
run;

 %macro unit(p0, p1);

    data out&amp;amp;p1.;
    set out&amp;amp;p0.;

	if Unit&amp;amp;p1. in ('K','KG','KG.','KILO') then Unit&amp;amp;p1.='KG';
	if Unit&amp;amp;p1. in ('GRAM','G','G.','GR','GR.','GRA','GRAM.') then Unit&amp;amp;p1.='GR';
	if Unit&amp;amp;p1. in ('MILIGRAM') then Unit&amp;amp;p1.='MGR';

    if Unit&amp;amp;p1. in ('C','CL.','CL') then Unit&amp;amp;p1.='CL';
    if Unit&amp;amp;p1. in ('LTR','L','L.','L..','LT','LIT','LITER','LITRE','LTR.','LITE') then Unit&amp;amp;p1.='LTR';
    if Unit&amp;amp;p1. in ('ML.','MLT','ML..','ML','MLITER','MILITER') then Unit&amp;amp;p1.='MLT';

    run;

    %mend;

    %unit(0,1);
    %unit(1,2);
    %unit(2,3);
	%unit(3,4);

proc print data=out4(keep=Text quantity1 unit1 quantity2 unit2 quantity3 unit3 quantity4 unit4);
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The output looks like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde5.PNG" style="width: 709px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78616i3503A3D3490963FD/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde5.PNG" alt="Bilde5.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;The work might not be finished, but the power of functions might have been proven. In the last step, I want to show that almost the same result could be achieved by Perl Regular Expression’s in the following way: &amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 6: Using Perl regular expression */

data food6;
	set food0;

	*Create Perl regular expression's that reads one digit or more (+), the digit could contain dot(or comma);
	Expression1='/([\d.]+)/';
		

	*The PRXPARSE function returns a pattern identifier number that is used by other PRX functions and call routines;
	PatternId1=prxparse(Expression1);
	
	*Use the CALL PRXSUBSTR routine to create a MyStart and MyLength columns, looks for the expression (PatternID) in the Text column;
	call prxsubstr(PatternID1,Text,MyStart1,MyLength1);
	
	* Use the SUBSTR function to create a column that contain the unit starting on G and the quantity before, based on the MyStart and MyLength columns;
	Unit1=substrn(Text,MyStart1,MyLength1);
	extra=substrn(Text,MyStart1+MyLength1,length(text)+1);
	Quantity1=scan(extra,1,'');
	R0=substr(extra,length(Quantity1)+1,length(Text));
run;

proc print data=food6 (drop=extra);run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This is the results:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde6.PNG" style="width: 671px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78617iA7C8EDEAC2C9B864/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde6.PNG" alt="Bilde6.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Of course, you could also make a macro with the suggestion in step 6, so you are able to fetch several digits from the text string.&lt;/P&gt;
&lt;P&gt;I will not recommend which methods that you should use, but please give comments of which method that you prefer.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I wish you all a Merry Christmas and Happy New Year!&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Pia74_0-1671175654253.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78618i0966B21E98911196/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Pia74_0-1671175654253.png" alt="Pia74_0-1671175654253.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Pia74_1-1671175671395.png" style="width: 0px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78619iC495E22EFB24C25B/image-size/small?v=v2&amp;amp;px=200" width="0" height="0" role="button" title="Pia74_1-1671175671395.png" alt="Pia74_1-1671175671395.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Dec 2022 14:10:33 GMT</pubDate>
    <dc:creator>PiaRønnevik</dc:creator>
    <dc:date>2022-12-16T14:10:33Z</dc:date>
    <item>
      <title>Juletip #16: Using SAS Functions and Perl Regular Expression on a text string</title>
      <link>https://communities.sas.com/t5/SAS-Community-Nordic/Juletip-16-Using-SAS-Functions-and-Perl-Regular-Expression-on-a/m-p/850035#M402</link>
      <description>&lt;P&gt;SAS Functions can be very useful when working on a text string. This Juletip will show how you can extract digits out of a text string that contains both letters and digits.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this example, I will use a variable (text string) with item description that often can appear on scanner data that comes from grocery stores. Usually, this kind of data doesn’t contain separate variables with information about unit or quantity of the different items, but I know that I might find them in the description variable connected to each item.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In step 1, I am using&amp;nbsp;&lt;SPAN&gt;DATALINES/INPUT statement to read a&amp;nbsp;&lt;/SPAN&gt;variable with item description:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 1: The data */

data food0;
  infile datalines dsd truncover;
  input Text:$60.;
  datalines4;
_COCA COLA 6 LITER RINGNES 
SOLO RINGNES 6L. 4BOTTLEDS
500 GRAM FLOUR MØLLERENS 0.5 KG
4.5 KG FLOUR MOLLERENS 4500 GRAM
200 GRAM MOLLERENS 200 GRAM SUGAR
FARIS 1.5DL RINGNES 150 CL
CHEESE WHITE 200G PKN
0.5 KG CHEESE 500 GRAM TINE MEIERIER 500GR.
;;;;

proc print data=food0;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The output looks like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde1.PNG" style="width: 320px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78612iDAF1FF2FC8139649/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde1.PNG" alt="Bilde1.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;In step 2, I show different functions that could be useful when working on the text string:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 2: Useful functions to be aware of when working with text variables */

Data food1;
	set food0;

	*Detecting the presence of alphabetic characters;
	t1=anyalpha(Text);

	*Detecting the presence of digital values;
	t2=anydigit(Text);

	*Return the length of a string;
	t3=length(Text);

	*Extract a substring from a string;
	t4=substr(Text,5,4);

	*Take a part of a string;
	t5=scan(Text,2,'');

	*Replace a word in a string;
	t6=tranwrd(Text,'RINGNES','MACK');
	
run;

proc print data=food1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;This gives the following output:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde2.PNG" style="width: 734px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78613iB145E691DA7F5B52/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde2.PNG" alt="Bilde2.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;In step 3, I try to extract the different digits in the text string using combinations of various SAS functions. The function &lt;EM&gt;substr&lt;/EM&gt; helps to extract parts of the text string into a new string, hence; I read from the first digit by using the function &lt;EM&gt;anydigit &lt;/EM&gt;and then defining the length using the function length minus&amp;nbsp;&lt;EM&gt;anydigit &lt;/EM&gt;plus one&lt;EM&gt;. &lt;/EM&gt;After that, I use the function &lt;EM&gt;scan&lt;/EM&gt; to read the first part of the new text string that was made and assuming that’s the quantity. Then I repeat &lt;EM&gt;scan &lt;/EM&gt;so I can read the second part, which I assume is the unit. Since there might be more digits in the text string, I also make a new string that contains the rest of the text after the first quantity and unit. In the second part, I do the same that I did in the first part on remaining text, so I can find more quantities and units.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 3: Abstract Quantity and Unit from the string */

	Data food2;
	set food0;
	
	*First part;
	M1=substr(Text,anydigit(Text), length(Text)-anydigit(Text)+1);
	M3=scan(M1,1,'');
	M4=scan(M1,2,'');
	*NB: SUBSTRN= behave in the same way as SUBSTR, but will not issue a NOTE message if it returns a null result;
	R0=substrn(M1,length(M3)+length(M4)+3,length(M1)-(length(M3)+length(M4)+2));
		
	if anyalpha(M3)&amp;gt;0 then do;
	quantity1=substr(M3,anydigit(M3),anyalpha(M3)-1);
	unit1=substr(M3, anyalpha(M3), length(M3)-anyalpha(M3)+1);
	end;
	else do;
	quantity1=M3;
	unit1=M4;
	end;

	*Second part;
	M7=substrn(R0,anydigit(R0),length(R0)-anydigit(R0)+1);
	M8=scan(M7,1,'');
	M9=scan(M7,2,''); 

	if anyalpha(M8)&amp;gt;0 then do;
	quantity2=substrn(M8,anydigit(M8),anyalpha(M8)-1);
	unit2=substrn(M8, anyalpha(M8), length(M8)-anyalpha(M8)+1);
	end;
	else do;
	quantity2=M8;
	unit2=M9;
	end; 

	run;

proc print data=food2(keep=Text quantity1 unit1 quantity2 unit2 );
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The ‘cleaned’ output looks like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde3.PNG" style="width: 519px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78614i7486641964FAD491/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde3.PNG" alt="Bilde3.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;In step 4, I am using a macro since there is a pattern in step 3. Since I know that there can be at least 3 digits in the text string, I decide to run the macro four times so I can be sure to find all the quantities/units:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 4: Making this with a macro */


%macro part(p0,p1);
	
	Data food&amp;amp;p1.;
	set food&amp;amp;p0.;
	

	if &amp;amp;p1.=1 then do;
	
	*FIRST PART;
	M1_&amp;amp;p1.=substr(Text,anydigit(Text), length(Text)-anydigit(Text)+1);
	M3_&amp;amp;p1.=scan(M1_&amp;amp;p1.,1,'');
	M4_&amp;amp;p1.=scan(M1_&amp;amp;p1.,2,'');
	*SUBSTRN= behave in the same way as SUBSTR, but will not issue a NOTE message if it returns a null result;
	R0_&amp;amp;p1.=substrn(M1_&amp;amp;p1.,length(M3_&amp;amp;p1.)+length(M4_&amp;amp;p1.)+3,length(M1_&amp;amp;p1.)-(length(M3_&amp;amp;p1.)+length(M4_&amp;amp;p1.)+2));

	if anyalpha(M3_&amp;amp;p1.)&amp;gt;0 then do;
		quantity&amp;amp;p1.=substr(M3_&amp;amp;p1.,anydigit(M3_&amp;amp;p1.),anyalpha(M3_&amp;amp;p1.)-1);
		unit&amp;amp;p1.=substr(M3_&amp;amp;p1., anyalpha(M3_&amp;amp;p1.),length(M3_&amp;amp;p1.)-anyalpha(M3_&amp;amp;p1.)+1);
		end;
	else do;
		quantity&amp;amp;p1.=M3_&amp;amp;p1.;
		unit&amp;amp;p1.=M4_&amp;amp;p1.;
	end;
    
	
	end; 

	if ( &amp;amp;p1.&amp;gt;1 ) and ( length(R0_&amp;amp;p0.)&amp;gt;0 ) then do;
	

	*MORE PARTS;
	M1_&amp;amp;p1.=substrn(R0_&amp;amp;p0.,anydigit(R0_&amp;amp;p0.),length(R0_&amp;amp;p0.)-anydigit(R0_&amp;amp;p0.)+1);
	M3_&amp;amp;p1.=scan(M1_&amp;amp;p1.,1,'');
	M4_&amp;amp;p1.=scan(M1_&amp;amp;p1.,2,'');
	R0_&amp;amp;p1.= substrn(M1_&amp;amp;p1.,length(M3_&amp;amp;p1.)+length(M4_&amp;amp;p1.)+2,length(M1_&amp;amp;p1.)-(length(M3_&amp;amp;p1.)+length(M4_&amp;amp;p1.)+1));

	if anyalpha(M3_&amp;amp;p1.)&amp;gt;0 then do;
	quantity&amp;amp;p1.=substrn(M3_&amp;amp;p1.,anydigit(M3_&amp;amp;p1.),anyalpha(M3_&amp;amp;p1.)-1);
	unit&amp;amp;p1.=substrn(M3_&amp;amp;p1., anyalpha(M3_&amp;amp;p1.), length(M3_&amp;amp;p1.)-anyalpha(M3_&amp;amp;p1.)+1);
	end;
	else do;
	quantity&amp;amp;p1.=M3_&amp;amp;p1.;
	unit&amp;amp;p1.=M4_&amp;amp;p1.;
	end;  

	end; 

	run;

%mend;
    %part(0,1);
	%part(1,2);
	%part(2,3);
	%part(3,4);


proc print data=food4(keep=Text quantity1 unit1 quantity2 unit2 quantity3 unit3 quantity4 unit4);
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The result looks like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde4.PNG" style="width: 725px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78615i0A7BBE99F5A97DA0/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde4.PNG" alt="Bilde4.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;In step 5, I try to clean up a bit, so that the items with the same unit is written in the same way. This is also done with a macro:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 5: Looks for a word that has the same meaning and replaces them with the same unit */

Data out0;
set food4(keep=Text quantity1 unit1 quantity2 unit2 quantity3 unit3 quantity4 unit4);
run;

 %macro unit(p0, p1);

    data out&amp;amp;p1.;
    set out&amp;amp;p0.;

	if Unit&amp;amp;p1. in ('K','KG','KG.','KILO') then Unit&amp;amp;p1.='KG';
	if Unit&amp;amp;p1. in ('GRAM','G','G.','GR','GR.','GRA','GRAM.') then Unit&amp;amp;p1.='GR';
	if Unit&amp;amp;p1. in ('MILIGRAM') then Unit&amp;amp;p1.='MGR';

    if Unit&amp;amp;p1. in ('C','CL.','CL') then Unit&amp;amp;p1.='CL';
    if Unit&amp;amp;p1. in ('LTR','L','L.','L..','LT','LIT','LITER','LITRE','LTR.','LITE') then Unit&amp;amp;p1.='LTR';
    if Unit&amp;amp;p1. in ('ML.','MLT','ML..','ML','MLITER','MILITER') then Unit&amp;amp;p1.='MLT';

    run;

    %mend;

    %unit(0,1);
    %unit(1,2);
    %unit(2,3);
	%unit(3,4);

proc print data=out4(keep=Text quantity1 unit1 quantity2 unit2 quantity3 unit3 quantity4 unit4);
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The output looks like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde5.PNG" style="width: 709px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78616i3503A3D3490963FD/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde5.PNG" alt="Bilde5.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;The work might not be finished, but the power of functions might have been proven. In the last step, I want to show that almost the same result could be achieved by Perl Regular Expression’s in the following way: &amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 6: Using Perl regular expression */

data food6;
	set food0;

	*Create Perl regular expression's that reads one digit or more (+), the digit could contain dot(or comma);
	Expression1='/([\d.]+)/';
		

	*The PRXPARSE function returns a pattern identifier number that is used by other PRX functions and call routines;
	PatternId1=prxparse(Expression1);
	
	*Use the CALL PRXSUBSTR routine to create a MyStart and MyLength columns, looks for the expression (PatternID) in the Text column;
	call prxsubstr(PatternID1,Text,MyStart1,MyLength1);
	
	* Use the SUBSTR function to create a column that contain the unit starting on G and the quantity before, based on the MyStart and MyLength columns;
	Unit1=substrn(Text,MyStart1,MyLength1);
	extra=substrn(Text,MyStart1+MyLength1,length(text)+1);
	Quantity1=scan(extra,1,'');
	R0=substr(extra,length(Quantity1)+1,length(Text));
run;

proc print data=food6 (drop=extra);run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This is the results:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bilde6.PNG" style="width: 671px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78617iA7C8EDEAC2C9B864/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bilde6.PNG" alt="Bilde6.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Of course, you could also make a macro with the suggestion in step 6, so you are able to fetch several digits from the text string.&lt;/P&gt;
&lt;P&gt;I will not recommend which methods that you should use, but please give comments of which method that you prefer.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I wish you all a Merry Christmas and Happy New Year!&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Pia74_0-1671175654253.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78618i0966B21E98911196/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Pia74_0-1671175654253.png" alt="Pia74_0-1671175654253.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Pia74_1-1671175671395.png" style="width: 0px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/78619iC495E22EFB24C25B/image-size/small?v=v2&amp;amp;px=200" width="0" height="0" role="button" title="Pia74_1-1671175671395.png" alt="Pia74_1-1671175671395.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2022 14:10:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Community-Nordic/Juletip-16-Using-SAS-Functions-and-Perl-Regular-Expression-on-a/m-p/850035#M402</guid>
      <dc:creator>PiaRønnevik</dc:creator>
      <dc:date>2022-12-16T14:10:33Z</dc:date>
    </item>
  </channel>
</rss>

