This topic is solved and locked.
mozty
Calcite | Level 5

Dear community,

 

I am trying to merge EPOCH when SESTDTC <= ZEDTC < SEENDTC, or ZEDTC >= SESTDTC when SEENDTC is missing, and keep the latest EPOCH value where this is true. PROC SQL is time-consuming with large datasets, so I am looking for alternatives.

 

USUBJID  TAETORD  EPOCH      SESTDTC           SEENDTC
1234     1        SCREENING  2019-02-20        2019-03-04T08:25
1234     2        TREATMENT  2019-03-04T08:25  2019-03-09
1234     3        FOLLOW-UP  2019-03-09        2019-04-01
1000     1        SCREENING  2019-07-07

 

USUBJID  VISITNUM  ZESPID  ZEDTC             EPOCH
1234     1         1       2019-02-24        SCREENING
1234     2         1       2019-03-04T08:25  TREATMENT
1234     2         2       2019-03-04        TREATMENT
1234     3         1       2019-03           FOLLOW-UP
1234     3         2       2019-03-09T12:35  FOLLOW-UP
1000     1         1       2019-07-20        SCREENING

 

Things to consider:

- the EPOCH column in the second table is the result after merging both tables

- all date variables are in ISO 8601 format, with or without a time part, and ZEDTC can even contain partial date values
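A property worth noting about the ISO 8601 point above, since the answers in this thread rely on it: ISO 8601 date/time strings compare correctly as plain character strings, partial dates included, because the fields run from most to least significant. A quick illustration (in Python rather than SAS, purely for demonstration):

```python
# ISO 8601 strings sort correctly as plain strings; a partial date
# sorts before any longer string that extends it.
dates = ["2019-03-04T08:25", "2019-03", "2019-02-20", "2019-03-09"]
print(sorted(dates))
# A date-only value sorts before the same date with a time component:
print("2019-03-04" < "2019-03-04T08:25")
```

This is why both a character comparison in PROC SQL and a BY-sorted DATA step can order these values without converting them to numeric dates.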

 

Please let me know if you need additional information. Thanks.

1 ACCEPTED SOLUTION

Tom
Super User

Try just interleaving them with a SET statement by subject and date (start date / event date). List the epoch (SE) dataset first and retain the EPOCH value from it going forward.

data via_ds;
  /* Interleave the SE start-date records with the ZE records.
     Both datasets must be sorted by USUBJID and the date variable.
     Listing SE first makes its record win a date tie. */
  set se(in=se rename=(sestdtc=zedtc)) ze(in=ze);
  by usubjid zedtc;
  length new_epoch $9;
  retain new_epoch;
  if first.usubjid then new_epoch=' ';  /* reset for each subject */
  if se then new_epoch=epoch;           /* SE row: update retained EPOCH */
  if ze;                                /* output only the ZE rows */
  keep USUBJID VISITNUM ZESPID ZEDTC new_epoch;
  rename new_epoch=EPOCH;
run;

Results:

Obs    USUBJID    zedtc               VISITNUM    ZESPID      EPOCH

 1      1000      2019-07-20              1         1       SCREENING
 2      1234      2019-02-24              1         1       SCREENING
 3      1234      2019-03                 3         1       SCREENING
 4      1234      2019-03-04              2         2       SCREENING
 5      1234      2019-03-04T08:25        2         1       TREATMENT
 6      1234      2019-03-09T12:35        3         2       FOLLOW-UP
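For readers who want to see the interleave-and-retain logic outside SAS, here is a rough pure-Python sketch of the same idea (data copied from the datalines in this thread; the structure and names such as `stacked` are my own, not part of the original answer):

```python
# Interleave SE start-date records with ZE records, sort by subject
# and date string, and carry the last seen EPOCH forward onto each
# ZE row -- the same logic as the accepted DATA step.
se = [  # (usubjid, epoch, sestdtc)
    ("1234", "SCREENING", "2019-02-20"),
    ("1234", "TREATMENT", "2019-03-04T08:25"),
    ("1234", "FOLLOW-UP", "2019-03-09"),
    ("1000", "SCREENING", "2019-07-07"),
]
ze = [  # (usubjid, visitnum, zespid, zedtc)
    ("1234", 1, "1", "2019-02-24"),
    ("1234", 2, "1", "2019-03-04T08:25"),
    ("1234", 2, "2", "2019-03-04"),
    ("1234", 3, "1", "2019-03"),
    ("1234", 3, "2", "2019-03-09T12:35"),
    ("1000", 1, "1", "2019-07-20"),
]

# Tag each record with its source; SE rows (tag 0) sort before ZE
# rows (tag 1) on a date tie, matching the SET statement order.
stacked = [(s, dt, 0, ep, None) for s, ep, dt in se]
stacked += [(s, dt, 1, None, (v, sp)) for s, v, sp, dt in ze]
stacked.sort(key=lambda r: (r[0], r[1], r[2]))

result = []
epoch = None
current_subject = None
for subj, dt, src, ep, detail in stacked:
    if subj != current_subject:   # like first.usubjid: reset per subject
        current_subject, epoch = subj, None
    if src == 0:                  # SE record: update the retained epoch
        epoch = ep
    else:                         # ZE record: output with current epoch
        visitnum, zespid = detail
        result.append((subj, visitnum, zespid, dt, epoch))

for row in sorted(result):
    print(row)
```

Run on the sample data, this reproduces the listing shown above, including SCREENING for the partial date 2019-03 and for the date-only 2019-03-04.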


9 REPLIES
Kurt_Bremser
Super User

Please show us examples of the datasets before the join. Use data steps with datalines so we can quickly recreate the datasets for testing, and use the "little running man" icon to post the code.

mozty
Calcite | Level 5

Please see below:

 

data se;
length USUBJID $4 TAETORD 8 EPOCH $9 SESTDTC SEENDTC $16;
infile datalines dsd;
input USUBJID TAETORD EPOCH SESTDTC SEENDTC;
datalines;
1234,1,SCREENING,2019-02-20,2019-03-04T08:25
1234,2,TREATMENT,2019-03-04T08:25,2019-03-09
1234,3,FOLLOW-UP,2019-03-09,2019-04-01
1000,1,SCREENING,2019-07-07,
;
run;

data ze;
length USUBJID $4 VISITNUM 8 ZESPID $1 ZEDTC $16;
infile datalines dsd;
input USUBJID VISITNUM ZESPID ZEDTC;
datalines;
1234,1,1,2019-02-24
1234,2,1,2019-03-04T08:25
1234,2,2,2019-03-04
1234,3,1,2019-03
1234,3,2,2019-03-09T12:35
1000,1,1,2019-07-20
;
run;
geoskiad
Fluorite | Level 6
Hi mozty,
Instead of merging EPOCH from SE, I would derive it with IF/ELSE logic: in an intermediate step, add the start/end dates of each EPOCH as extra variables and check them against ZEDTC (6 variables in total for 3 EPOCHs).
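A rough sketch of this idea (in Python rather than SAS, purely to illustrate the if/else chain; the function and variable names are illustrative, and only the start dates are used here since the epochs are contiguous):

```python
def derive_epoch(zedtc, scr_start, trt_start, fup_start):
    """Pick the latest epoch whose start date is <= ZEDTC.
    ISO 8601 strings compare correctly as plain strings; an
    empty string marks an epoch the subject never entered."""
    if fup_start and zedtc >= fup_start:
        return "FOLLOW-UP"
    if trt_start and zedtc >= trt_start:
        return "TREATMENT"
    if scr_start and zedtc >= scr_start:
        return "SCREENING"
    return ""

# Subject 1234's epoch boundaries from the sample SE data:
print(derive_epoch("2019-03-09T12:35",
                   "2019-02-20", "2019-03-04T08:25", "2019-03-09"))
```

In SAS the same chain would be a series of IF/ELSE IF statements after the per-subject epoch dates have been transposed onto each ZE record.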
mozty
Calcite | Level 5

ZE has between 20,000 and 60,000 records and SE between 100 and 500.

 

I'm working on a SAS EG server, and having 3-4 SQL procedures in one program can make it run for 15-30 minutes.

geoskiad
Fluorite | Level 6
Since you only have 3 EPOCHs, you could do the "many-to-many" merge manually. OUTPUT your ZE dataset 3 times, giving the duplicate rows an artificial key ii from 1 to 3. In the existing SE domain, set ii=1 for Screening, ii=2 for Treatment and ii=3 for Follow-Up. Merge the two by subject and ii, then keep only the records where ZEDTC falls in the interval of interest.

Alternatively, split SE into 3 separate datasets by EPOCH and merge each with ZE, checking ZEDTC against SESTDTC/SEENDTC each time. At the end, keep the records that fall in the date interval; there should be 1 record per subject.
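A rough sketch of the first variant (in Python rather than SAS, purely to illustrate the idea of pairing every ZE record with every same-subject SE record and keeping the latest qualifying epoch; all names here are illustrative):

```python
# Pair each ZE record with every SE record for the same subject,
# keep pairs where SESTDTC <= ZEDTC < SEENDTC (open-ended when
# SEENDTC is missing), and retain the latest matching epoch.
se = [("1234", "SCREENING", "2019-02-20", "2019-03-04T08:25"),
      ("1234", "TREATMENT", "2019-03-04T08:25", "2019-03-09"),
      ("1234", "FOLLOW-UP", "2019-03-09", "2019-04-01"),
      ("1000", "SCREENING", "2019-07-07", "")]
ze = [("1234", "2019-02-24"), ("1000", "2019-07-20")]

matched = {}
for subj, zedtc in ze:
    for s_subj, epoch, start, end in se:
        in_window = (s_subj == subj and start <= zedtc
                     and (not end or zedtc < end))
        if in_window:
            # keep the epoch with the latest qualifying start date
            prev = matched.get((subj, zedtc))
            if prev is None or start > prev[0]:
                matched[(subj, zedtc)] = (start, epoch)

print({k: v[1] for k, v in matched.items()})
```

The per-subject cross product is what the artificial ii key achieves in the MERGE version; the final filter-and-keep step is the same in both.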
Tom
Super User

What is the result you want from that data?  In particular which of these two results do you want?

Obs    USUBJID    VISITNUM    ZESPID    ZEDTC                 EPOCH
 5      1234          2         1       2019-03-04T08:25    TREATMENT
 6      1234          2         1       2019-03-04T08:25    SCREENING
mozty
Calcite | Level 5

This is what I wanted.

 

Thank you all for your suggestions.


Discussion stats
  • 9 replies
  • 1110 views
  • 2 likes
  • 4 in conversation