BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jenim514
Pyrite | Level 9

Hello,

 

I need some suggestions on how I can extract a date in  nn/nn/yyyy format.  I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc...  but date always presents in some format using back slashes.  I think prx may be the only was to do this, but I'm not familiar with it at all.

 

For example:

'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'

or

TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020

or

COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE

 

I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits.  I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.

 

Any help is greatly appreciated!!

 date_search =
prxparse("m*january|february|march|april|june|july|august|september|october
|november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc
t\.|nov\.|dec\.|, 19|, 20|\d-
|\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja
n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar
\d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec 


\d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d|
\d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may-
|jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi");
 if prxmatch(date_search,EOTDTHR )>0 then output;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

This is a non-prx way to get to some of this. Doesn't handle all cases as shown but easy enough to add conditions. 

 

data have;
	orig='HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020';
	output;
	orig='TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020';
	output;
	orig='COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE';
	output;
run;

data want;
	set have;
	length found_date $20.;
	retain date found_date;
	date=.;
	found_date='';
	format date date9.;

	do i=1 to countw(orig);
		word=scan(orig, i, " ");

		if find(word, "/")>0 then
			do;
				date=input(word, anydtdte.);
				found_date=word;
			end;
	end;
	drop i word;
run;

@jenim514 wrote:

Hello,

 

I need some suggestions on how I can extract a date in  nn/nn/yyyy format.  I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc...  but date always presents in some format using back slashes.  I think prx may be the only was to do this, but I'm not familiar with it at all.

 

For example:

'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'

or

TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020

or

COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE

 

I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits.  I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.

 

Any help is greatly appreciated!!

 date_search =
prxparse("m*january|february|march|april|june|july|august|september|october
|november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc
t\.|nov\.|dec\.|, 19|, 20|\d-
|\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja
n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar
\d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec 


\d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d|
\d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may-
|jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi");
 if prxmatch(date_search,EOTDTHR )>0 then output;

 


 

View solution in original post

1 REPLY 1
Reeza
Super User

This is a non-prx way to get to some of this. Doesn't handle all cases as shown but easy enough to add conditions. 

 

data have;
	orig='HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020';
	output;
	orig='TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020';
	output;
	orig='COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE';
	output;
run;

data want;
	set have;
	length found_date $20.;
	retain date found_date;
	date=.;
	found_date='';
	format date date9.;

	do i=1 to countw(orig);
		word=scan(orig, i, " ");

		if find(word, "/")>0 then
			do;
				date=input(word, anydtdte.);
				found_date=word;
			end;
	end;
	drop i word;
run;

@jenim514 wrote:

Hello,

 

I need some suggestions on how I can extract a date in  nn/nn/yyyy format.  I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc...  but date always presents in some format using back slashes.  I think prx may be the only was to do this, but I'm not familiar with it at all.

 

For example:

'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'

or

TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020

or

COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE

 

I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits.  I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.

 

Any help is greatly appreciated!!

 date_search =
prxparse("m*january|february|march|april|june|july|august|september|october
|november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc
t\.|nov\.|dec\.|, 19|, 20|\d-
|\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja
n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar
\d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec 


\d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d|
\d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may-
|jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi");
 if prxmatch(date_search,EOTDTHR )>0 then output;

 


 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 1483 views
  • 0 likes
  • 2 in conversation