BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sathish_jammy
Lapis Lazuli | Level 10

Dear Experts,

 

I have to get the Cancer_onset_Date(Date column) using the (Age) month day year specified in the text column.

I shared the sample dataset for your reference.

 

data aaa;
Input ID DOB mmddyy10. Comments$90.;
format DOB mmddyy10.;
cards;
183556 03/13/1965 Cancer since 6 Months detected at the age of  53 Years 5 Month 13 Days in the yr 2018
452890 08/02/1950 Cancer since 25 years. Detected at the age of 43 years in the year 1994.
run;

All I need is to find the date of the IDs which fall under that age month days year.

If the month/date is not specified, Replace the month/date as same as from the IDs DOB.

 

Thanks in advance!

 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Sathish_jammy 

 

Here is an attempt to solve your problem:

data aaa2;
	set aaa;
	format Cancer_onset_Date mmddyy10.;

	/* Retrieve year of detection*/	
	if prxmatch('/^(.*)(\d{4})(.*)$/i',Comments) then 
		comments_year_detect  = prxchange('s/(.*)(\d{4})(.*)/$2/',1,Comments);

	/* Retrieve age of detection year, month and day*/
		/*NB: the following research is case insensitive*/
		/*Assumption 1: the Year is preceded by 'detected at the age of'*/
		/*Some spelling variants have been considered: e.g. Years|Yrs|Year|Yr*/
		
	if prxmatch('/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/i',Comments) then 
		Cancer_onset_Date_y = prxchange('s/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/$3/i',1,Comments);
	if prxmatch('/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/i',Comments) then 
		Cancer_onset_Date_m = prxchange('s/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/$3/i',1,Comments);
	if prxmatch('/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/i',Comments) then 
		Cancer_onset_Date_d = prxchange('s/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/$3/i',1,Comments);

	/* Compute Cancer_onset_Date. If the month/date is not specified, replace the month/date as same as from the IDs DOB */
	
	Cancer_onset_Date = intnx('year',DOB,Cancer_onset_Date_y,"s");
	if Cancer_onset_Date_m ne "" then Cancer_onset_Date = intnx('month',Cancer_onset_Date,Cancer_onset_Date_m,"s");
	if Cancer_onset_Date_d ne "" then Cancer_onset_Date = intnx('day',Cancer_onset_Date,Cancer_onset_Date_d,"s");

run;

 Capture d’écran 2020-01-06 à 09.59.48.png

 

View solution in original post

3 REPLIES 3
KachiM
Rhodochrosite | Level 12

@Sathish_jammy ,

 

The COMMENT is split into parts for Year, Month and Days. Here is one way with explicit extraction.

 

data want;
   set aaa;
   length str $90 strleft $30;
   str = substr(Comments, find(Comments, 'age'));

   /* Find Age */
   k = find(upcase(str), 'YEARS');
   strleft = substr(str,1, k-1);
   str = substr(str, k+5);
   Age = compress(strleft,' ','kd');

   /* find Month */
   k = find(upcase(str), 'MONTH');
   if k then do;
      strleft = substr(str,1, k-1);
      Month = compress(strleft,' ','kd');
      str = substr(str, k+6);
   end;
   else Month = put(month(DOB),8.);

   /* find Day */
   k = find(upcase(str), 'DAYS');
   if k then do;
      strleft = substr(str,1, k-1);
      Day = compress(strleft,' ','kd');
   end;
   else Day = put(day(DOB),8.);

   *put Age = Month = Day =;
keep ID DOB Age Month Day;
run;

If this does not meet your need, please clarify.

ed_sas_member
Meteorite | Level 14

Hi @Sathish_jammy 

 

Here is an attempt to solve your problem:

data aaa2;
	set aaa;
	format Cancer_onset_Date mmddyy10.;

	/* Retrieve year of detection*/	
	if prxmatch('/^(.*)(\d{4})(.*)$/i',Comments) then 
		comments_year_detect  = prxchange('s/(.*)(\d{4})(.*)/$2/',1,Comments);

	/* Retrieve age of detection year, month and day*/
		/*NB: the following research is case insensitive*/
		/*Assumption 1: the Year is preceded by 'detected at the age of'*/
		/*Some spelling variants have been considered: e.g. Years|Yrs|Year|Yr*/
		
	if prxmatch('/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/i',Comments) then 
		Cancer_onset_Date_y = prxchange('s/^(.*)(detected at the age of)\s+(\d|\d{2})(\sYear)(.*)$/$3/i',1,Comments);
	if prxmatch('/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/i',Comments) then 
		Cancer_onset_Date_m = prxchange('s/^(.*)(Years|Yrs|Year|Yr)\s+(\d|\d{2})\s+(Month)(.*)$/$3/i',1,Comments);
	if prxmatch('/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/i',Comments) then 
		Cancer_onset_Date_d = prxchange('s/^(.*)(Months|Mths|Month|Mth)\s+(\d|\d{2})\s+(Day)(.*)$/$3/i',1,Comments);

	/* Compute Cancer_onset_Date. If the month/date is not specified, replace the month/date as same as from the IDs DOB */
	
	Cancer_onset_Date = intnx('year',DOB,Cancer_onset_Date_y,"s");
	if Cancer_onset_Date_m ne "" then Cancer_onset_Date = intnx('month',Cancer_onset_Date,Cancer_onset_Date_m,"s");
	if Cancer_onset_Date_d ne "" then Cancer_onset_Date = intnx('day',Cancer_onset_Date,Cancer_onset_Date_d,"s");

run;

 Capture d’écran 2020-01-06 à 09.59.48.png

 

andreas_lds
Jade | Level 19

With the data you have posted, the following steps seems to extract the data you need:

 

data want;
   set aaa;

   length 
      rx_year rx_month_day 8
      year month day 8
   ;

   rx_year = prxparse('/(year|yr) (\d+)/i');
   rx_month_day = prxparse('/years (\d+) month (\d+) days/i');

   comments = compbl(comments);

   if prxmatch(rx_year, comments) then do;
      year = input(prxposn(rx_year, 2, comments), ? 4.);
   end;

   if prxmatch(rx_month_day, comments) then do;
      month = input(prxposn(rx_month_day, 1, comments), ? 2.);
      day = input(prxposn(rx_month_day, 2, comments), ? 2.);
   end;
   else do;
      month = month(DOB);
      day = day(dob);
   end;


   drop rx_:;
run;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1140 views
  • 3 likes
  • 4 in conversation