BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jenim514
Pyrite | Level 9

Hello,

 

I need some suggestions on how I can extract a date in  nn/nn/yyyy format.  I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc...  but date always presents in some format using back slashes.  I think prx may be the only was to do this, but I'm not familiar with it at all.

 

For example:

'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'

or

TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020

or

COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE

 

I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits.  I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.

 

Any help is greatly appreciated!!

 date_search =
prxparse("m*january|february|march|april|june|july|august|september|october
|november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc
t\.|nov\.|dec\.|, 19|, 20|\d-
|\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja
n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar
\d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec 


\d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d|
\d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may-
|jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi");
 if prxmatch(date_search,EOTDTHR )>0 then output;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

This is a non-prx way to get to some of this. Doesn't handle all cases as shown but easy enough to add conditions. 

 

data have;
	orig='HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020';
	output;
	orig='TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020';
	output;
	orig='COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE';
	output;
run;

data want;
	set have;
	length found_date $20.;
	retain date found_date;
	date=.;
	found_date='';
	format date date9.;

	do i=1 to countw(orig);
		word=scan(orig, i, " ");

		if find(word, "/")>0 then
			do;
				date=input(word, anydtdte.);
				found_date=word;
			end;
	end;
	drop i word;
run;

@jenim514 wrote:

Hello,

 

I need some suggestions on how I can extract a date in  nn/nn/yyyy format.  I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc...  but date always presents in some format using back slashes.  I think prx may be the only was to do this, but I'm not familiar with it at all.

 

For example:

'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'

or

TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020

or

COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE

 

I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits.  I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.

 

Any help is greatly appreciated!!

 date_search =
prxparse("m*january|february|march|april|june|july|august|september|october
|november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc
t\.|nov\.|dec\.|, 19|, 20|\d-
|\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja
n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar
\d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec 


\d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d|
\d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may-
|jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi");
 if prxmatch(date_search,EOTDTHR )>0 then output;

 


 

View solution in original post

1 REPLY 1
Reeza
Super User

This is a non-prx way to get to some of this. Doesn't handle all cases as shown but easy enough to add conditions. 

 

data have;
	orig='HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020';
	output;
	orig='TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020';
	output;
	orig='COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE';
	output;
run;

data want;
	set have;
	length found_date $20.;
	retain date found_date;
	date=.;
	found_date='';
	format date date9.;

	do i=1 to countw(orig);
		word=scan(orig, i, " ");

		if find(word, "/")>0 then
			do;
				date=input(word, anydtdte.);
				found_date=word;
			end;
	end;
	drop i word;
run;

@jenim514 wrote:

Hello,

 

I need some suggestions on how I can extract a date in  nn/nn/yyyy format.  I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc...  but date always presents in some format using back slashes.  I think prx may be the only was to do this, but I'm not familiar with it at all.

 

For example:

'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'

or

TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020

or

COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE

 

I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits.  I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.

 

Any help is greatly appreciated!!

 date_search =
prxparse("m*january|february|march|april|june|july|august|september|october
|november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc
t\.|nov\.|dec\.|, 19|, 20|\d-
|\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja
n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar
\d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec 


\d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d|
\d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may-
|jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi");
 if prxmatch(date_search,EOTDTHR )>0 then output;

 


 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 1576 views
  • 0 likes
  • 2 in conversation