Hello,
I need some suggestions on how I can extract a date in nn/nn/yyyy format. I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc... but date always presents in some format using back slashes. I think prx may be the only was to do this, but I'm not familiar with it at all.
For example:
'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'
or
TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020
or
COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE
I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits. I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.
Any help is greatly appreciated!!
date_search =
prxparse("m*january|february|march|april|june|july|august|september|october
|november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc
t\.|nov\.|dec\.|, 19|, 20|\d-
|\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja
n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar
\d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec
\d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d|
\d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may-
|jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi");
if prxmatch(date_search,EOTDTHR )>0 then output;
This is a non-prx way to get to some of this. Doesn't handle all cases as shown but easy enough to add conditions.
data have;
orig='HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020';
output;
orig='TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020';
output;
orig='COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE';
output;
run;
data want;
set have;
length found_date $20.;
retain date found_date;
date=.;
found_date='';
format date date9.;
do i=1 to countw(orig);
word=scan(orig, i, " ");
if find(word, "/")>0 then
do;
date=input(word, anydtdte.);
found_date=word;
end;
end;
drop i word;
run;
@jenim514 wrote:
Hello,
I need some suggestions on how I can extract a date in nn/nn/yyyy format. I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc... but date always presents in some format using back slashes. I think prx may be the only was to do this, but I'm not familiar with it at all.
For example:
'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'
or
TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020
or
COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE
I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits. I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.
Any help is greatly appreciated!!
date_search = prxparse("m*january|february|march|april|june|july|august|september|october |november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc t\.|nov\.|dec\.|, 19|, 20|\d- |\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar \d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec \d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d| \d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may- |jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi"); if prxmatch(date_search,EOTDTHR )>0 then output;
This is a non-prx way to get to some of this. Doesn't handle all cases as shown but easy enough to add conditions.
data have;
orig='HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020';
output;
orig='TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020';
output;
orig='COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE';
output;
run;
data want;
set have;
length found_date $20.;
retain date found_date;
date=.;
found_date='';
format date date9.;
do i=1 to countw(orig);
word=scan(orig, i, " ");
if find(word, "/")>0 then
do;
date=input(word, anydtdte.);
found_date=word;
end;
end;
drop i word;
run;
@jenim514 wrote:
Hello,
I need some suggestions on how I can extract a date in nn/nn/yyyy format. I use 'nn' rather than mm/dd because the data is entered in such a way where sometimes its mm/dd/yyyy or dd/mm/yyyy etc... but date always presents in some format using back slashes. I think prx may be the only was to do this, but I'm not familiar with it at all.
For example:
'HAS COMPLETED THE TRIAL. DATE OF LAST SCAN SHOWING STABLE DISEASE- 30/10/2020'
or
TRIAL COMPLETED. LAST CT SCAN SHOWING STABLE DISEASE WAS 05/NOV/2020
or
COMPLETION OF 2 YEARS TREATMENT- CT 21/04/21 SHOWS STABLE DISEASE
I found this code that searches for dates embeded in text, and it worked great, but only output '1' ...for yes a date exisits. I feel like it could be modified to output an actual date to a new variablee, but not sure where to start there.
Any help is greatly appreciated!!
date_search = prxparse("m*january|february|march|april|june|july|august|september|october |november|december|jan\.|feb\.|mar\.|apr\.|may\.|jun\.|jul\.|aug\.|sep\.|oc t\.|nov\.|dec\.|, 19|, 20|\d- |\djan|\dfeb|\dmar|\dapr|\dmay|\djun|\djul|\daug|\dsep|\doct|\dnov|\ddec|ja n,|feb,|mar,|apr,|may,|jun,|jul,|aug,|sep,|oct,|nov,|dec,|jan \d|feb \d|mar \d|apr \d|may \d|jun \d|jul \d|aug \d|sep \d|oct \d|nov \d|dec \d|jan\d|feb\d|mar\d|apr\d|may\d|jun\d|jul\d|aug\d|sep\d|oct\d|nov\d|dec\d| \d/|\d\d \d\d \d\d|\d:\d|\d\.\d\d\.\d|\d\.\d\.\d|jan-|feb-|mar-|apr-|may- |jun-|jul-|aug-|sep-|oct-|nov-|dec-*oi"); if prxmatch(date_search,EOTDTHR )>0 then output;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.