BookmarkSubscribeRSS Feed
msr1
Calcite | Level 5

Hi Everyone,

I have a business problem where I wanted to perform patter check on the column. 

I wanted to check if the values are in a particular format like decimal20.5. It should fail if the format is different from it.

data have;

input col1;

datalines;

0.01238788

61.15612

123.12

12.1244

12345.125665

;

run;

 

- > using below solution:

 

%let pattern=^[0-9]\d{1,20}(\.\d{1,10})?%?$;

data want;

set have;

pattern_val=pxparse("/&pattern./");

v1=prxmatch(%nrbquote(pattern_val),col1);

if prxmatch(%nrbquote(pattern),col1) gt 0 then pass=0;

else pass=1;

run;

0- means the value is in decimal 20.5 format

1- other than that

 

 

Note:

I am not much familiar with regex and took the expression from internet.

I have to use the snippet in my other macro codes so need to make it dynamic with pattern code.

 

Any help is very much appreciated.

Thanks,

 

 

 

 

 

 

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

No too sure why you use macro functions.

Also note that [0-9] is the same as \d  

This is simpler:

 

%let pattern=^\d{1,20}(\.\d{1,10})?%?$;
data WANT;
  set HAVE;
  PASS = prxmatch("/&pattern/",strip(COL1)) > 0;
run;

 

What doesn't work?

 

Lastly,   Use the appropriate icon to paste your SAS code.

Tom
Super User Tom
Super User

Regular expressions are for strings, not numbers. 

Did you mean to create your original COL1 variable as a character string?

data have;
  input col1 $40. ;
datalines;
0.01238788
61.15612
123.12
12.1244
12345.125665
;

What do you think the regular expression you have means?

^[0-9]\d{1,20}(\.\d{1,10})?%?$

What does DECIMAL20.5 mean? Is that like the SAS format 20.5 which means 20 character long with the last 6 being the decimal point and 5 digits to the right of the decimal point?

msr1
Calcite | Level 5
Yes it is format 20.5 with 5 digits after decimal.
I am not sure about the regex, I took it from the internet.
My concern is, I need to perform validation on my data, and need to verify it all the values are in format 20.5, anything apart from that will fail.
Its not on one table, have to do it for multiple table
Tom
Super User Tom
Super User

@msr1 wrote:
Yes it is format 20.5 with 5 digits after decimal.
I am not sure about the regex, I took it from the internet.
My concern is, I need to perform validation on my data, and need to verify it all the values are in format 20.5, anything apart from that will fail.
Its not on one table, have to do it for multiple table

Are your fields numeric?  Is the 20.5 the SAS format or the Teradata field definition.  They are different.  A DECIMAL(20,5) field in Teradata has space for 20 digits of which 5 are after the decimal place.  A SAS format of 20.5 has space for just 19 digits or which 5 are after the decimal place.  Also note that SAS stores all numbers as floating point numbers and so it cannot store 30 or even 20 digits precisely.   If you actually had a field defined as DECIMAL(30,5) in Teradata you would either need to convert it to FLOAT and lose some of the decimal digit precision or convert it to character and treat it as such in SAS.

 

To test if a number you have it SAS will "fit" into a specific DECIMAL field in Teradata you probably want to use arithmetic and not regular expressions to check.  So if the maximum number of decimal places is 5 then round the value to that many decimal places.  If the maximum number of decimal places to the left of the decimal place is 15 then make sure the magnitude of the number is smaller than 10**15.

qoit
Pyrite | Level 9

If all you are wanting to do is flag values with values upto 5 decimal places, then below should suffice?

 

Also, the DATA step with PRX functions have errors (example pattern_val is listed as pattern in the IF statements).

 

data have;
	input col1;
	decimal = length(scan(strip(put(col1,best20.)),2,"."));

	if decimal >= 5 then
		pass = 1;
	else pass=0;
	drop decimal;
	datalines;
0.01238788
61.15612
123.12
12.1244
12345.125665
;
run;

/* Do not need the below */
%let pattern=^[0-9]\d{1,20}(\.\d{1,10})?%?$;

data want;
	set have;
	pattern_val=prxparse("/&pattern./");
	v1=prxmatch(%nrbquote(pattern_val),col1);

	if prxmatch(%nrbquote(pattern_val),col1) gt 0 then
		pass=0;
	else pass=1;
run;
msr1
Calcite | Level 5
its not about flagging with values upto 5 decimal places,
will have to apply the query to teradata in order to validate the values, Some tabla can have other format as well, like decimal30.5 decimal 20.3 etc ,WIll tweak it accordingly.
Thanks
qoit
Pyrite | Level 9

Okay see below that flags any value that is ^ length 20 and ^decimal 5:

 

data have;
	input col1;
	col = length(put(col1,best20.));
	decimal = length(scan(strip(put(col1,best20.)),2,"."));

	if col = 20 and decimal = 5 then
		pass = 1;
	else pass=0;
	drop decimal col;
	datalines;
0.01238788
61.15612
123.12
12.1244
12345.125665
;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1729 views
  • 0 likes
  • 4 in conversation