Dear Experts,
I have to get the Cancer_onset_Date(Date column) using the (Age) month day year specified in the text column.
I shared the sample dataset for your reference.
data aaa;
Input ID DOB mmddyy10. Comments$90.;
format DOB mmddyy10.;
cards;
183556 03/13/1965 Cancer since 6 Months detected at the age of 53 Years 5 Month 13 Days in the yr 2018
452890 08/02/1950 Cancer since 25 years. Detected at the age of 43 years in the year 1994.
run;
All I need is to find the date of the IDs which fall under that age month days year.
If the month/date is not specified, Replace the month/date as same as from the IDs DOB.
Thanks in advance!
Here is an attempt to solve your problem:
data aaa2;
set aaa;
format Cancer_onset_Date mmddyy10.;
/* Retrieve year of detection*/
if prxmatch('/^(.*)(\d{4})(.*)$/i',Comments) then
comments_year_detect = prxchange('s/(.*)(\d{4})(.*)/$2/',1,Comments);
/* Retrieve age of detection year, month and day*/
/*NB: the following research is case insensitive*/
/*Assumption 1: the Year is preceded by 'detected at the age of'*/
/*Some spelling variants have been considered: e.g. Years|Yrs|Year|Yr*/
if prxmatch('/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/i',Comments) then
Cancer_onset_Date_y = prxchange('s/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/$3/i',1,Comments);
if prxmatch('/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/i',Comments) then
Cancer_onset_Date_m = prxchange('s/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/$3/i',1,Comments);
if prxmatch('/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/i',Comments) then
Cancer_onset_Date_d = prxchange('s/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/$3/i',1,Comments);
/* Compute Cancer_onset_Date. If the month/date is not specified, replace the month/date as same as from the IDs DOB */
Cancer_onset_Date = intnx('year',DOB,Cancer_onset_Date_y,"s");
if Cancer_onset_Date_m ne "" then Cancer_onset_Date = intnx('month',Cancer_onset_Date,Cancer_onset_Date_m,"s");
if Cancer_onset_Date_d ne "" then Cancer_onset_Date = intnx('day',Cancer_onset_Date,Cancer_onset_Date_d,"s");
run;
The COMMENT is split into parts for Year, Month and Days. Here is one way with explicit extraction.
data want;
set aaa;
length str $90 strleft $30;
str = substr(Comments, find(Comments, 'age'));
/* Find Age */
k = find(upcase(str), 'YEARS');
strleft = substr(str,1, k-1);
str = substr(str, k+5);
Age = compress(strleft,' ','kd');
/* find Month */
k = find(upcase(str), 'MONTH');
if k then do;
strleft = substr(str,1, k-1);
Month = compress(strleft,' ','kd');
str = substr(str, k+6);
end;
else Month = put(month(DOB),8.);
/* find Day */
k = find(upcase(str), 'DAYS');
if k then do;
strleft = substr(str,1, k-1);
Day = compress(strleft,' ','kd');
end;
else Day = put(day(DOB),8.);
*put Age = Month = Day =;
keep ID DOB Age Month Day;
run;
If this does not meet your need, please clarify.
Here is an attempt to solve your problem:
data aaa2;
set aaa;
format Cancer_onset_Date mmddyy10.;
/* Retrieve year of detection*/
if prxmatch('/^(.*)(\d{4})(.*)$/i',Comments) then
comments_year_detect = prxchange('s/(.*)(\d{4})(.*)/$2/',1,Comments);
/* Retrieve age of detection year, month and day*/
/*NB: the following research is case insensitive*/
/*Assumption 1: the Year is preceded by 'detected at the age of'*/
/*Some spelling variants have been considered: e.g. Years|Yrs|Year|Yr*/
if prxmatch('/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/i',Comments) then
Cancer_onset_Date_y = prxchange('s/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/$3/i',1,Comments);
if prxmatch('/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/i',Comments) then
Cancer_onset_Date_m = prxchange('s/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/$3/i',1,Comments);
if prxmatch('/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/i',Comments) then
Cancer_onset_Date_d = prxchange('s/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/$3/i',1,Comments);
/* Compute Cancer_onset_Date. If the month/date is not specified, replace the month/date as same as from the IDs DOB */
Cancer_onset_Date = intnx('year',DOB,Cancer_onset_Date_y,"s");
if Cancer_onset_Date_m ne "" then Cancer_onset_Date = intnx('month',Cancer_onset_Date,Cancer_onset_Date_m,"s");
if Cancer_onset_Date_d ne "" then Cancer_onset_Date = intnx('day',Cancer_onset_Date,Cancer_onset_Date_d,"s");
run;
With the data you have posted, the following steps seems to extract the data you need:
data want;
set aaa;
length
rx_year rx_month_day 8
year month day 8
;
rx_year = prxparse('/(year|yr) (\d+)/i');
rx_month_day = prxparse('/years (\d+) month (\d+) days/i');
comments = compbl(comments);
if prxmatch(rx_year, comments) then do;
year = input(prxposn(rx_year, 2, comments), ? 4.);
end;
if prxmatch(rx_month_day, comments) then do;
month = input(prxposn(rx_month_day, 1, comments), ? 2.);
day = input(prxposn(rx_month_day, 2, comments), ? 2.);
end;
else do;
month = month(DOB);
day = day(dob);
end;
drop rx_:;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: