BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Dear SAS experts:

 

I'm trying to recreate a SURV_MM variable in the gold-standard dataset. INTCK function created identical values except for the dates with DEC 31. Desired result is the SURV_MM and N_MONTH is what I ended up with INTCK function as coded below.

 

post_it.png

 

Any idea how to recreate SURV_MM for the dates with DEC31 that INTCK function appeared to produce a result by one month off?

 

DATA HAVE;
INPUT DX EXIT SURV_MM;
CARDS;
6002 13148  233.5 
10019 13148 101.5
10750 13148 77.5 
11480 13148 53.5 
11846 13148 41.5 
10385 13148 89.5 
;
RUN; 

DATA HAVE1; SET HAVE;
N_MONTH=0.5+intck('month',dx,exit);
CHECK_MO=N_MONTH-SURV_MM;
RUN;
PROC PRINT DATA=HAVE1;
FORMAT DX EXIT DATE9.;
RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Cruise 

It indicates that they didn't use INTCK() but just used 365.25 days in a formula. That's normally only the 2nd best option. INTCK() also gets it right with leap years.

 

It doesn't explain why the result can be different for the exactly same start and end date.

View solution in original post

12 REPLIES 12
PeterClemmensen
Tourmaline | Level 20

What is your desired result here?

Cruise
Ammonite | Level 13
SURV_MM. thanks for asking!
Tom
Super User Tom
Super User

INTCK counts the number of BOUNDARIES crossed. 

The last day of December is just BEFORE the boundary into January.  Why are you only checking last day of December.  Do you have any dates that are the last day of other months?

What formula was used to create the number you are comparing to?

Cruise
Ammonite | Level 13

@Tom 

 

Why are you only checking last day of December. Do you have any dates that are the last day of other months?

 

- Yes, there are other cases. Please see my full SAS code below. INTCK function did the job except for these 6 cases. I'm trying to recreate SURV_MM and SURV_YY in the colon.colon.

 

What formula was used to create the number you are comparing to?

 

I will check in the Dickman's directory where he might have showed how SURV_MM and SURV_YY were created. Thanks for asking this. Maybe he showed else where in one of sub folders.

 

/*COLON.SAS7BDAT CAN BE FOUND IN THIS LINK http://www.pauldickman.com/survival/?dir=sas*/
/*THIS IS MY ATTEMPT TO RECREATE SURV_MM AND SURV_YY*/
DATA COLON; /*N=15,564*/ SET Colon.colon(DROP=status subsite year8594 agegrp ); /*N=15,564*/
id=_n_; 
n_mo=0.5+intck('month',dx,exit);
n_yy=0.5+int(yrdif(dx,exit,'actual'));
check_mo=n_mo-surv_mm; /*CHECK_MO=1 FOR 6 CASES*/
check_yy=n_yy-surv_yy; /*CHECK_YY=0 FOR ALL CASES*/
RUN;

proc freq data=colon;
tables check_mo check_yy;
run;

PROC PRINT DATA=COLON;
WHERE CHECK_MO=1;
RUN;

 

Cruise
Ammonite | Level 13

@Tom I couldn't find .sas file in Dickman's directory where he showed how SURV_MM and SURv_YY were created.

http://www.pauldickman.com/survival/?dir=sas

 

below is the screenshot again.

 

TOM.png

Patrick
Opal | Level 21

@Cruise 

Analyzing the source data with code below it appears that the same values for DX and EXIT not always result in the same value for SURV_MM.

There is either a glitch in the data or there must be an additional bit of logic for calculating SURV_MM.

proc sql;
  select dx, exit, surv_mm, count(surv_mm) as surv_mm_cnt
  from
    (
      select dx, exit, surv_mm
      from source.colon
      group by dx, exit
      having count(distinct surv_mm)>1
    )
  group by dx, exit, surv_mm
  ;
quit;

 

Capture.JPG 

Tom
Super User Tom
Super User

Yeah it looks like they are just errors in the data.   

Cruise
Ammonite | Level 13

They stated "underlying time unit is now days we specify scale=365.25 to transform to years" in the page 8 of the documention below.

 

http://www.pauldickman.com/survival/sas/relative_survival_using_sas.pdf

 

Would this explain "additional logic" @Patrick pointed out? @Tom 

Patrick
Opal | Level 21

@Cruise 

It indicates that they didn't use INTCK() but just used 365.25 days in a formula. That's normally only the 2nd best option. INTCK() also gets it right with leap years.

 

It doesn't explain why the result can be different for the exactly same start and end date.

Cruise
Ammonite | Level 13
Thank you. I'll use 365.25 and take integer of the resulting decimals and compare.
Patrick
Opal | Level 21

@Cruise wrote:
Thank you. I'll use 365.25 and take integer of the resulting decimals and compare.

That won't lead to a different outcome. In the source data there are different month counts for the exactly same start and end date. It looks very much like this is simply a DQ issue.

Cruise
Ammonite | Level 13
I see. Thanks for the pointer.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 1863 views
  • 4 likes
  • 4 in conversation