BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jenim514
Pyrite | Level 9

Hello,

 

I need some suggestions on how I can extract a date in  nn/nn/yyyy format.  I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc...  but date always presents in some format using back slashes.  I think prx may be the only was to do this, but I'm not familiar with it at all.

 

For example:

'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'

or

TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020

or

COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE

 

I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits.  I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.

 

Any help is greatly appreciated!!

 date_search =
prxparse("m*january|february|march|april|june|july|august|september|october
|november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc
t\.|nov\.|dec\.|, 19|, 20|\d-
|\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja
n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar
\d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec 


\d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d|
\d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may-
|jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi");
 if prxmatch(date_search,EOTDTHR )>0 then output;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

This is a non-prx way to get to some of this. Doesn't handle all cases as shown but easy enough to add conditions. 

 

data have;
	orig='HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020';
	output;
	orig='TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020';
	output;
	orig='COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE';
	output;
run;

data want;
	set have;
	length found_date $20.;
	retain date found_date;
	date=.;
	found_date='';
	format date date9.;

	do i=1 to countw(orig);
		word=scan(orig, i, " ");

		if find(word, "/")>0 then
			do;
				date=input(word, anydtdte.);
				found_date=word;
			end;
	end;
	drop i word;
run;

@jenim514 wrote:

Hello,

 

I need some suggestions on how I can extract a date in  nn/nn/yyyy format.  I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc...  but date always presents in some format using back slashes.  I think prx may be the only was to do this, but I'm not familiar with it at all.

 

For example:

'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'

or

TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020

or

COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE

 

I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits.  I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.

 

Any help is greatly appreciated!!

 date_search =
prxparse("m*january|february|march|april|june|july|august|september|october
|november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc
t\.|nov\.|dec\.|, 19|, 20|\d-
|\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja
n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar
\d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec 


\d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d|
\d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may-
|jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi");
 if prxmatch(date_search,EOTDTHR )>0 then output;

 


 

View solution in original post

1 REPLY 1
Reeza
Super User

This is a non-prx way to get to some of this. Doesn't handle all cases as shown but easy enough to add conditions. 

 

data have;
	orig='HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020';
	output;
	orig='TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020';
	output;
	orig='COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE';
	output;
run;

data want;
	set have;
	length found_date $20.;
	retain date found_date;
	date=.;
	found_date='';
	format date date9.;

	do i=1 to countw(orig);
		word=scan(orig, i, " ");

		if find(word, "/")>0 then
			do;
				date=input(word, anydtdte.);
				found_date=word;
			end;
	end;
	drop i word;
run;

@jenim514 wrote:

Hello,

 

I need some suggestions on how I can extract a date in  nn/nn/yyyy format.  I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc...  but date always presents in some format using back slashes.  I think prx may be the only was to do this, but I'm not familiar with it at all.

 

For example:

'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'

or

TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020

or

COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE

 

I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits.  I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.

 

Any help is greatly appreciated!!

 date_search =
prxparse("m*january|february|march|april|june|july|august|september|october
|november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc
t\.|nov\.|dec\.|, 19|, 20|\d-
|\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja
n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar
\d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec 


\d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d|
\d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may-
|jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi");
 if prxmatch(date_search,EOTDTHR )>0 then output;

 


 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 669 views
  • 0 likes
  • 2 in conversation