Dear Experts,
I have to get the Cancer_onset_Date(Date column) using the (Age) month day year specified in the text column.
I shared the sample dataset for your reference.
data aaa;
Input ID DOB mmddyy10. Comments$90.;
format DOB mmddyy10.;
cards;
183556 03/13/1965 Cancer since 6 Months detected at the age of 53 Years 5 Month 13 Days in the yr 2018
452890 08/02/1950 Cancer since 25 years. Detected at the age of 43 years in the year 1994.
run;All I need is to find the date of the IDs which fall under that age month days year.
If the month/date is not specified, Replace the month/date as same as from the IDs DOB.
Thanks in advance!
Here is an attempt to solve your problem:
data aaa2;
set aaa;
format Cancer_onset_Date mmddyy10.;
/* Retrieve year of detection*/
if prxmatch('/^(.*)(\d{4})(.*)$/i',Comments) then
comments_year_detect = prxchange('s/(.*)(\d{4})(.*)/$2/',1,Comments);
/* Retrieve age of detection year, month and day*/
/*NB: the following research is case insensitive*/
/*Assumption 1: the Year is preceded by 'detected at the age of'*/
/*Some spelling variants have been considered: e.g. Years|Yrs|Year|Yr*/
if prxmatch('/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/i',Comments) then
Cancer_onset_Date_y = prxchange('s/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/$3/i',1,Comments);
if prxmatch('/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/i',Comments) then
Cancer_onset_Date_m = prxchange('s/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/$3/i',1,Comments);
if prxmatch('/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/i',Comments) then
Cancer_onset_Date_d = prxchange('s/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/$3/i',1,Comments);
/* Compute Cancer_onset_Date. If the month/date is not specified, replace the month/date as same as from the IDs DOB */
Cancer_onset_Date = intnx('year',DOB,Cancer_onset_Date_y,"s");
if Cancer_onset_Date_m ne "" then Cancer_onset_Date = intnx('month',Cancer_onset_Date,Cancer_onset_Date_m,"s");
if Cancer_onset_Date_d ne "" then Cancer_onset_Date = intnx('day',Cancer_onset_Date,Cancer_onset_Date_d,"s");
run;
The COMMENT is split into parts for Year, Month and Days. Here is one way with explicit extraction.
data want;
set aaa;
length str $90 strleft $30;
str = substr(Comments, find(Comments, 'age'));
/* Find Age */
k = find(upcase(str), 'YEARS');
strleft = substr(str,1, k-1);
str = substr(str, k+5);
Age = compress(strleft,' ','kd');
/* find Month */
k = find(upcase(str), 'MONTH');
if k then do;
strleft = substr(str,1, k-1);
Month = compress(strleft,' ','kd');
str = substr(str, k+6);
end;
else Month = put(month(DOB),8.);
/* find Day */
k = find(upcase(str), 'DAYS');
if k then do;
strleft = substr(str,1, k-1);
Day = compress(strleft,' ','kd');
end;
else Day = put(day(DOB),8.);
*put Age = Month = Day =;
keep ID DOB Age Month Day;
run;
If this does not meet your need, please clarify.
Here is an attempt to solve your problem:
data aaa2;
set aaa;
format Cancer_onset_Date mmddyy10.;
/* Retrieve year of detection*/
if prxmatch('/^(.*)(\d{4})(.*)$/i',Comments) then
comments_year_detect = prxchange('s/(.*)(\d{4})(.*)/$2/',1,Comments);
/* Retrieve age of detection year, month and day*/
/*NB: the following research is case insensitive*/
/*Assumption 1: the Year is preceded by 'detected at the age of'*/
/*Some spelling variants have been considered: e.g. Years|Yrs|Year|Yr*/
if prxmatch('/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/i',Comments) then
Cancer_onset_Date_y = prxchange('s/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/$3/i',1,Comments);
if prxmatch('/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/i',Comments) then
Cancer_onset_Date_m = prxchange('s/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/$3/i',1,Comments);
if prxmatch('/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/i',Comments) then
Cancer_onset_Date_d = prxchange('s/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/$3/i',1,Comments);
/* Compute Cancer_onset_Date. If the month/date is not specified, replace the month/date as same as from the IDs DOB */
Cancer_onset_Date = intnx('year',DOB,Cancer_onset_Date_y,"s");
if Cancer_onset_Date_m ne "" then Cancer_onset_Date = intnx('month',Cancer_onset_Date,Cancer_onset_Date_m,"s");
if Cancer_onset_Date_d ne "" then Cancer_onset_Date = intnx('day',Cancer_onset_Date,Cancer_onset_Date_d,"s");
run;
With the data you have posted, the following steps seems to extract the data you need:
data want;
set aaa;
length
rx_year rx_month_day 8
year month day 8
;
rx_year = prxparse('/(year|yr) (\d+)/i');
rx_month_day = prxparse('/years (\d+) month (\d+) days/i');
comments = compbl(comments);
if prxmatch(rx_year, comments) then do;
year = input(prxposn(rx_year, 2, comments), ? 4.);
end;
if prxmatch(rx_month_day, comments) then do;
month = input(prxposn(rx_month_day, 1, comments), ? 2.);
day = input(prxposn(rx_month_day, 2, comments), ? 2.);
end;
else do;
month = month(DOB);
day = day(dob);
end;
drop rx_:;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.