This topic is solved and locked.
mozty
Calcite | Level 5

Dear community,

 

I am trying to merge EPOCH when SESTDTC <= ZEDTC < SEENDTC, or ZEDTC >= SESTDTC when SEENDTC is missing, and keep the latest EPOCH value where this is true. PROC SQL is time-consuming with large datasets, so I am looking for alternatives.

 

USUBJID  TAETORD  EPOCH      SESTDTC           SEENDTC
1234     1        SCREENING  2019-02-20        2019-03-04T08:25
1234     2        TREATMENT  2019-03-04T08:25  2019-03-09
1234     3        FOLLOW-UP  2019-03-09        2019-04-01
1000     1        SCREENING  2019-07-07

 

USUBJID  VISITNUM  ZESPID  ZEDTC             EPOCH
1234     1         1       2019-02-24        SCREENING
1234     2         1       2019-03-04T08:25  TREATMENT
1234     2         2       2019-03-04        TREATMENT
1234     3         1       2019-03           FOLLOW-UP
1234     3         2       2019-03-09T12:35  FOLLOW-UP
1000     1         1       2019-07-20        SCREENING

 

Things to consider:

- the EPOCH column in the second table is the result after merging both tables

- all date variables are in ISO 8601 format, with or without a time part, and ZEDTC can even contain partial date values
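A property worth noting about the ISO 8601 point above, since the answers in this thread rely on it: ISO 8601 date/time strings compare correctly as plain character strings, partial dates included, because the fields run from most to least significant. A quick illustration (in Python rather than SAS, purely for demonstration):

```python
# ISO 8601 strings sort correctly as plain strings; a partial date
# sorts before any longer string that extends it.
dates = ["2019-03-04T08:25", "2019-03", "2019-02-20", "2019-03-09"]
print(sorted(dates))
# A date-only value sorts before the same date with a time component:
print("2019-03-04" < "2019-03-04T08:25")
```

This is why both a character comparison in PROC SQL and a BY-sorted DATA step can order these values without converting them to numeric dates.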

 

Please let me know if you need additional information. Thanks.

1 ACCEPTED SOLUTION

Tom
Super User

Try just interleaving them with a SET statement by subject and date (start date / event date). List the epoch (SE) dataset first and retain the EPOCH value from it going forward.

data via_ds;
  /* Interleave the SE start-date records with the ZE records.
     Both datasets must be sorted by USUBJID and the date variable.
     Listing SE first makes its record win a date tie. */
  set se(in=se rename=(sestdtc=zedtc)) ze(in=ze);
  by usubjid zedtc;
  length new_epoch $9;
  retain new_epoch;
  if first.usubjid then new_epoch=' ';  /* reset for each subject */
  if se then new_epoch=epoch;           /* SE row: update retained EPOCH */
  if ze;                                /* output only the ZE rows */
  keep USUBJID VISITNUM ZESPID ZEDTC new_epoch;
  rename new_epoch=EPOCH;
run;

Results:

Obs    USUBJID    zedtc               VISITNUM    ZESPID      EPOCH

 1      1000      2019-07-20              1         1       SCREENING
 2      1234      2019-02-24              1         1       SCREENING
 3      1234      2019-03                 3         1       SCREENING
 4      1234      2019-03-04              2         2       SCREENING
 5      1234      2019-03-04T08:25        2         1       TREATMENT
 6      1234      2019-03-09T12:35        3         2       FOLLOW-UP
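For readers who want to see the interleave-and-retain logic outside SAS, here is a rough pure-Python sketch of the same idea (data copied from the datalines in this thread; the structure and names such as `stacked` are my own, not part of the original answer):

```python
# Interleave SE start-date records with ZE records, sort by subject
# and date string, and carry the last seen EPOCH forward onto each
# ZE row -- the same logic as the accepted DATA step.
se = [  # (usubjid, epoch, sestdtc)
    ("1234", "SCREENING", "2019-02-20"),
    ("1234", "TREATMENT", "2019-03-04T08:25"),
    ("1234", "FOLLOW-UP", "2019-03-09"),
    ("1000", "SCREENING", "2019-07-07"),
]
ze = [  # (usubjid, visitnum, zespid, zedtc)
    ("1234", 1, "1", "2019-02-24"),
    ("1234", 2, "1", "2019-03-04T08:25"),
    ("1234", 2, "2", "2019-03-04"),
    ("1234", 3, "1", "2019-03"),
    ("1234", 3, "2", "2019-03-09T12:35"),
    ("1000", 1, "1", "2019-07-20"),
]

# Tag each record with its source; SE rows (tag 0) sort before ZE
# rows (tag 1) on a date tie, matching the SET statement order.
stacked = [(s, dt, 0, ep, None) for s, ep, dt in se]
stacked += [(s, dt, 1, None, (v, sp)) for s, v, sp, dt in ze]
stacked.sort(key=lambda r: (r[0], r[1], r[2]))

result = []
epoch = None
current_subject = None
for subj, dt, src, ep, detail in stacked:
    if subj != current_subject:   # like first.usubjid: reset per subject
        current_subject, epoch = subj, None
    if src == 0:                  # SE record: update the retained epoch
        epoch = ep
    else:                         # ZE record: output with current epoch
        visitnum, zespid = detail
        result.append((subj, visitnum, zespid, dt, epoch))

for row in sorted(result):
    print(row)
```

Run on the sample data, this reproduces the listing shown above, including SCREENING for the partial date 2019-03 and for the date-only 2019-03-04.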


9 REPLIES
Kurt_Bremser
Super User

Please show us examples of the datasets before the join. Use data steps with datalines so we can quickly recreate the datasets for testing, and use the "little running man" icon to post the code.

mozty
Calcite | Level 5

Please see below:

 

data se;
length USUBJID $4 TAETORD 8 EPOCH $9 SESTDTC SEENDTC $16;
infile datalines dsd;
input USUBJID TAETORD EPOCH SESTDTC SEENDTC;
datalines;
1234,1,SCREENING,2019-02-20,2019-03-04T08:25
1234,2,TREATMENT,2019-03-04T08:25,2019-03-09
1234,3,FOLLOW-UP,2019-03-09,2019-04-01
1000,1,SCREENING,2019-07-07,
;
run;

data ze;
length USUBJID $4 VISITNUM 8 ZESPID $1 ZEDTC $16;
infile datalines dsd;
input USUBJID VISITNUM ZESPID ZEDTC;
datalines;
1234,1,1,2019-02-24
1234,2,1,2019-03-04T08:25
1234,2,2,2019-03-04
1234,3,1,2019-03
1234,3,2,2019-03-09T12:35
1000,1,1,2019-07-20
;
run;
geoskiad
Fluorite | Level 6
Hi mozty,
Instead of merging EPOCH from SE, I would derive it with IF/ELSE logic: in an intermediate step, add the start/end dates of each EPOCH as extra variables and check them against ZEDTC (6 variables in total for 3 EPOCHs).
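A rough sketch of this idea (in Python rather than SAS, purely to illustrate the if/else chain; the function and variable names are illustrative, and only the start dates are used here since the epochs are contiguous):

```python
def derive_epoch(zedtc, scr_start, trt_start, fup_start):
    """Pick the latest epoch whose start date is <= ZEDTC.
    ISO 8601 strings compare correctly as plain strings; an
    empty string marks an epoch the subject never entered."""
    if fup_start and zedtc >= fup_start:
        return "FOLLOW-UP"
    if trt_start and zedtc >= trt_start:
        return "TREATMENT"
    if scr_start and zedtc >= scr_start:
        return "SCREENING"
    return ""

# Subject 1234's epoch boundaries from the sample SE data:
print(derive_epoch("2019-03-09T12:35",
                   "2019-02-20", "2019-03-04T08:25", "2019-03-09"))
```

In SAS the same chain would be a series of IF/ELSE IF statements after the per-subject epoch dates have been transposed onto each ZE record.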
mozty
Calcite | Level 5

ZE has between 20,000 and 60,000 records and SE between 100 and 500.

 

I'm working on a SAS EG server, and having 3-4 SQL procedures in one program can make it run for 15-30 minutes.

geoskiad
Fluorite | Level 6
Since you only have 3 EPOCHs, you could do the "many-to-many" merge manually. OUTPUT your ZE dataset 3 times, giving the duplicate rows an artificial key ii from 1 to 3. In the existing SE domain, set ii=1 for Screening, ii=2 for Treatment and ii=3 for Follow-Up. Merge the two by subject and ii, then keep only the records where ZEDTC falls in the interval of interest.

Alternatively, split SE into 3 separate datasets by EPOCH and merge each with ZE, checking ZEDTC against SESTDTC/SEENDTC each time. At the end, keep the records that fall in the date interval; there should be 1 record per subject.
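A rough sketch of the first variant (in Python rather than SAS, purely to illustrate the idea of pairing every ZE record with every same-subject SE record and keeping the latest qualifying epoch; all names here are illustrative):

```python
# Pair each ZE record with every SE record for the same subject,
# keep pairs where SESTDTC <= ZEDTC < SEENDTC (open-ended when
# SEENDTC is missing), and retain the latest matching epoch.
se = [("1234", "SCREENING", "2019-02-20", "2019-03-04T08:25"),
      ("1234", "TREATMENT", "2019-03-04T08:25", "2019-03-09"),
      ("1234", "FOLLOW-UP", "2019-03-09", "2019-04-01"),
      ("1000", "SCREENING", "2019-07-07", "")]
ze = [("1234", "2019-02-24"), ("1000", "2019-07-20")]

matched = {}
for subj, zedtc in ze:
    for s_subj, epoch, start, end in se:
        in_window = (s_subj == subj and start <= zedtc
                     and (not end or zedtc < end))
        if in_window:
            # keep the epoch with the latest qualifying start date
            prev = matched.get((subj, zedtc))
            if prev is None or start > prev[0]:
                matched[(subj, zedtc)] = (start, epoch)

print({k: v[1] for k, v in matched.items()})
```

The per-subject cross product is what the artificial ii key achieves in the MERGE version; the final filter-and-keep step is the same in both.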
Tom
Super User

What is the result you want from that data?  In particular which of these two results do you want?

Obs    USUBJID    VISITNUM    ZESPID    ZEDTC                 EPOCH
 5      1234          2         1       2019-03-04T08:25    TREATMENT
 6      1234          2         1       2019-03-04T08:25    SCREENING
mozty
Calcite | Level 5

This is what I wanted.

 

Thank you all for your suggestions.


Discussion stats
  • 9 replies
  • 1110 views
  • 2 likes
  • 4 in conversation