BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

Hello,

 

I'm developing syntax to QA a date/time string variable in an 11-character format.The time should be written according to a 24-hour clock (or as military time).

 

If written correctly, the value for 7 June 2017 at 10:30 am would be: 071030JUN17

The value for 7 June 2017 at 10:30 pm would be: 072230JUN17

 

I am not disaggregating the variable into separate time and date variables from the start for reasons that would involve a long explanation. But that option was considered and rejected.

 

Using the date/time value of 7 June 2017 at 10:30 am, here are examples of the most common errors:

 

1. time omitted = 07AUG17

2. leading zero of day omitted = 71030JUN17

3. leading zero of day and time both omitted = 7JUN17

4. June spelled out entirely = 071030JUNE17

5. year written as 4-digits = 071030JUN2017

6. day and time written in reverse order = 103007JUN17

 

and

 

7. 0 (zero) instead of O (letter O) in month: 0710300CT17 - only relevant for dates in October

 

I am trying to use PRXMATCH syntax to differentiate between values that match the correct value pattern and those that do not. Each observation has a unique ReportID. The date/time incident variable (IncDateTime) has no spaces between any of the (as many as) 11 characters/digits.

 

I ran variations on the following basic syntax:

 

data prxmatch_incident;

set incident (keep = ReportID IncDateTime);

position = prxmatch("/\JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC\d\d/",IncDateTime);

run;

 

This syntax comes close to doing what is needed. Unfortunately, the syntax recognizes the following values similarly and determines position = 3 for each value:

 

07JUN17

07JUN2017

07JUNE17

 

So I need to further limit the paramters of the pattern so that JUN is not the same as JUNE and 17 is not the same as 2017. If I can confidently distinguish between the values by pattern error, I can write syntax to further clean the data automatically.

 

Other "position = prxmatch" statements that were tried, but did not work include:

 

 

position_full = prxmatch("/\d\d\d\d\d\d\JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC\d\d/",inc);
position_date = prxmatch("/\d\d\[ADFJMNOS]\[ACEOPU]\[BCGLNPRTVY]\d\d/",inc);
position_month2 = prxmatch("/\[ADFJMNOS-ACEOPU-BCGLNPRTVY]\d\d/",inc);

 

Any suggestions to further refine the PRXMATCH statement will be most appreciated!

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You can push the validation quite far:

 

data test;
  input have_str :$20.;
  pos = prxmatch('/\b(0[1-9]|[12]\d|3[01])([01]\d|2[0-3])([0-5]\d)(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d\d)\b/o',have_str);
datalines;
071030JUN17
072230JUN17
07AUG17
71030JUN17
7JUN17
071030JUNE17
071030JUN2017
103007JUN17
070020JUN17
002007JUN17
;

proc print noobs; run;
PG

View solution in original post

6 REPLIES 6
TomKari
Onyx | Level 15

Oh, I LOVE these problems!

 

If it were me, and if volume weren't a problem, I'd tackle it in stages.

 

Step 1: does my variable meet the format 999999AAA99. If it doesn't, reject it.

Step 2: okay, the broad format is reasonable. Now start looking at the pieces. Do characters 3 through 6 make sense as an hour and a minute? (3-4 between "00" and "24", 5-6 between "00" and "60"). If not, reject it.

Step 3: Are characters 7 through 9 one of your date values? Do characters 10 and 11 fall within your basket of acceptable year values? Are characters 1 to 2 between 01 and 31?

And finally, step 4: do the month and day together constitue a valid date?

 

For doing this kind of thing, the SAS functions are your friend. One of the secret sauce components of SAS is the incredibly rich set of functions that you can use.

 

Tom

Patrick
Opal | Level 21

@SM1

Something like below could do.

data test;
  infile datalines truncover;
  input have_str $20.;
  test= prxmatch('/^[0123]\d[012]\d[012345]\d(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\d\d$/',strip(have_str));
datalines;
071030JUN17
072230JUN17
07AUG17
71030JUN17
7JUN17
071030JUNE17
071030JUN2017
103007JUN17
run;
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

@Patrick

 

Your solution also worked - thank you!

PGStats
Opal | Level 21

You can push the validation quite far:

 

data test;
  input have_str :$20.;
  pos = prxmatch('/\b(0[1-9]|[12]\d|3[01])([01]\d|2[0-3])([0-5]\d)(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d\d)\b/o',have_str);
datalines;
071030JUN17
072230JUN17
07AUG17
71030JUN17
7JUN17
071030JUNE17
071030JUN2017
103007JUN17
070020JUN17
002007JUN17
;

proc print noobs; run;
PG
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

@PG Stats

 

Your solution worked to clearly identify those variables that met the pattern of ######CCC## or not.

 

I tweaked my syntax with the tail end of your suggested syntax as follows:

 

data test;
  input have_str :$20.;
  pos = prxmatch('/(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d\d)\b/o',have_str);
datalines;071030JUN17
072230JUN17
07AUG17
71030JUN17
7JUN17
071030JUNE17
071030JUN2017
103007JUN17
070020JUN17
002007JUN17;

 I'm using the new variable to act as a flag for the different types of errors (since POS = 0, 2, 3, 5, and 7) in the data.

 

Thanks for your help!

ballardw
Super User

Sounds to me that either a data entry program is poorly written that allows garbage for entry ( this data has multiple fields day, month year and time but apparentlyis entered as one) or that some user training involved lashes with wet spaghetti noodles is in the offing.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1041 views
  • 2 likes
  • 5 in conversation