Re: Pattern checking for decimal numbers

msr1 · Posted 02-22-2021 11:42 PM

Hi Everyone,

I have a business problem where I wanted to perform patter check on the column.

I wanted to check if the values are in a particular format like decimal20.5. It should fail if the format is different from it.

data have;

input col1;

datalines;

0.01238788

61.15612

123.12

12.1244

12345.125665

;

run;

- > using below solution:

%let pattern=^[0-9]\d{1,20}(\.\d{1,10})?%?$;

data want;

set have;

pattern_val=pxparse("/&pattern./");

v1=prxmatch(%nrbquote(pattern_val),col1);

if prxmatch(%nrbquote(pattern),col1) gt 0 then pass=0;

else pass=1;

run;

0- means the value is in decimal 20.5 format

1- other than that

Note:

I am not much familiar with regex and took the expression from internet.

I have to use the snippet in my other macro codes so need to make it dynamic with pattern code.

Any help is very much appreciated.

Thanks,

ChrisNZ · Posted 02-23-2021 12:05 AM

No too sure why you use macro functions.

Also note that [0-9] is the same as \d

This is simpler:

%let pattern=^\d{1,20}(\.\d{1,10})?%?$;
data WANT;
  set HAVE;
  PASS = prxmatch("/&pattern/",strip(COL1)) > 0;
run;

What doesn't work?

Lastly, Use the appropriate icon to paste your SAS code.

High-Performance SAS Coding - Third Edition

Tom · Posted 02-23-2021 12:09 AM

Regular expressions are for strings, not numbers.

Did you mean to create your original COL1 variable as a character string?

data have;
  input col1 $40. ;
datalines;
0.01238788
61.15612
123.12
12.1244
12345.125665
;

What do you think the regular expression you have means?

^[0-9]\d{1,20}(\.\d{1,10})?%?$

What does DECIMAL20.5 mean? Is that like the SAS format 20.5 which means 20 character long with the last 6 being the decimal point and 5 digits to the right of the decimal point?

msr1 · Posted 02-23-2021 12:42 AM

Yes it is format 20.5 with 5 digits after decimal.
I am not sure about the regex, I took it from the internet.
My concern is, I need to perform validation on my data, and need to verify it all the values are in format 20.5, anything apart from that will fail.
Its not on one table, have to do it for multiple table

Tom · Posted 02-23-2021 08:37 AM

@msr1 wrote:
Yes it is format 20.5 with 5 digits after decimal.
I am not sure about the regex, I took it from the internet.
My concern is, I need to perform validation on my data, and need to verify it all the values are in format 20.5, anything apart from that will fail.
Its not on one table, have to do it for multiple table

Are your fields numeric? Is the 20.5 the SAS format or the Teradata field definition. They are different. A DECIMAL(20,5) field in Teradata has space for 20 digits of which 5 are after the decimal place. A SAS format of 20.5 has space for just 19 digits or which 5 are after the decimal place. Also note that SAS stores all numbers as floating point numbers and so it cannot store 30 or even 20 digits precisely. If you actually had a field defined as DECIMAL(30,5) in Teradata you would either need to convert it to FLOAT and lose some of the decimal digit precision or convert it to character and treat it as such in SAS.

To test if a number you have it SAS will "fit" into a specific DECIMAL field in Teradata you probably want to use arithmetic and not regular expressions to check. So if the maximum number of decimal places is 5 then round the value to that many decimal places. If the maximum number of decimal places to the left of the decimal place is 15 then make sure the magnitude of the number is smaller than 10**15.

qoit · Posted 02-23-2021 12:20 AM

If all you are wanting to do is flag values with values upto 5 decimal places, then below should suffice?

Also, the DATA step with PRX functions have errors (example pattern_val is listed as pattern in the IF statements).

data have;
	input col1;
	decimal = length(scan(strip(put(col1,best20.)),2,"."));

	if decimal >= 5 then
		pass = 1;
	else pass=0;
	drop decimal;
	datalines;
0.01238788
61.15612
123.12
12.1244
12345.125665
;
run;

/* Do not need the below */
%let pattern=^[0-9]\d{1,20}(\.\d{1,10})?%?$;

data want;
	set have;
	pattern_val=prxparse("/&pattern./");
	v1=prxmatch(%nrbquote(pattern_val),col1);

	if prxmatch(%nrbquote(pattern_val),col1) gt 0 then
		pass=0;
	else pass=1;
run;

msr1 · Posted 02-23-2021 12:44 AM

its not about flagging with values upto 5 decimal places,
will have to apply the query to teradata in order to validate the values, Some tabla can have other format as well, like decimal30.5 decimal 20.3 etc ,WIll tweak it accordingly.
Thanks

qoit · Posted 02-23-2021 01:23 AM

Okay see below that flags any value that is ^ length 20 and ^decimal 5:

data have;
	input col1;
	col = length(put(col1,best20.));
	decimal = length(scan(strip(put(col1,best20.)),2,"."));

	if col = 20 and decimal = 5 then
		pass = 1;
	else pass=0;
	drop decimal col;
	datalines;
0.01238788
61.15612
123.12
12.1244
12345.125665
;
run;

ChrisNZ · Posted 02-23-2021 02:53 PM

No negative numbers?

High-Performance SAS Coding - Third Edition

SAS Training: Just a Click Away