Hello everyone,
I would like to have an alternate data step code for the below sql code. Any help would be appreciated.
proc sql;
create table crs(where=(anl02fl='Y')) as
select a.*, b.anlfl as anl02fl, b.astdt, b.aendt, b.aedecod
from adbmk as a
left join adae(where=crsfl='y')) as b
on a.usubjid=b.usubjid and (b.astdt-1<=a.adt<=b.aendt2+ 3)
;
quit;
proc sort data=crs nodupkey;
by usubjid adt adtm;
run;
on a.usubjid=b.usubjid and (b.astdt-1<=a.adt<=b.aendt2+ 3)
As far as I know, this ON clause where you are using inequalities and boolean conditions isn't possible in a DATA step merge. A DATA step merge requires exact match.
@RAVI2000 wrote:
Can it be done in multiple data steps?
Why not use the tool that does exactly what you want in one step?
data crs(where=(anl04fl='Y'));
merge adbmk(in=bmk) adae(in=ae where=(crsfl='Y') rename=(anlfl=anl04fl));
by usubjid;
if bmk;
if astdt-1 <= adt <= aeendt2 +3;
run;
proc sort data=crs nodupkey;
by usubjid adt adtm;
run;
Is this not equivalent?
Seems as if you have answered your own question, @RAVI2000 .
As far as "is it equivalent?" I leave that to you to test on your data.
@RAVI2000 wrote:
Well they don't match. That's what I am confused about. Aren't they both meet the same condition?
Not when there are multiple observations for the same value of USUBJID in either or the two inputs.
@RAVI2000 wrote:
Good catch, both of the datasets have multiple observations for the same value of USUBJID. What can be done here?
So USUBJID is NOT a UNIQUE key for either dataset. What are the combination variables that uniquely identify the observations in each of the datasets?
In general to do a MANY to MANY join you want to use SQL.
But depending on what you are doing then perhaps you can use a simple INTERLEAVE
data want;
set tablea(in=inA) tableb(in=inB) ;
by usubjid date ;
....
to get what you want, but would need to know more about the dataset and the purpose of the join.
@RAVI2000 wrote:
Well they don't match.
So you have information that you are not sharing with us. Please share it with us. Give an example of several records that don't match doing it via SQL and doing it via DATA step.
That's what I am confused about. Aren't they both meet the same condition?
Running the code in SAS and looking at the results is much more authoritative than any answer I (or anyone else here in this forum) can give.
data adbmk;
input usubjid $ adt $ paramcd $;
cards;
100001 20MAY19 E2SRAL
100001 20MAY19 EF
100001 20MAY19 ETLK5
100003 18JUL19 ETLK8
100003 29JUL19 ETLK8
100014 08JUL19 E2SRAL
100014 08JUL19 EF
100014 09JUL19 ETLK5
100014 09JUL19 ETLK8
100014 11JUL19 E2SRAL
100014 11JUL19 EF
100014 11JUL19 ETLK5
100014 11JUL19 ETLK8
100014 12JUL19 E2SRAL
100014 12JUL19 EF
100004 11JUL19 E12P70
100004 11JUL19 E2SRAL
100006 14AUG19 ENG
100006 14AUG19 ETLK13
100006 14AUG19 ETLK17
100007 30AUG19 E12P70
100007 30AUG19 E2SRAL
100015 24JAN20 E12P70
100015 24JAN20 E2SRAL
100021 07APR20 EF
100021 07APR20 ENG
100021 07APR20 ETLK10
100023 28MAY20 ETLK6
100029 06AUG20 ETLK2
100029 06AUG20 ETLK4
100031 12OCT20 E12P70
100031 12OCT20 E2SRAL
100036 19OCT20 ETLK1B
100036 19OCT20 ETLK2
100038 03NOV20 ETLK6
100040 05NOV20 E2SRAL
100042 27OCT20 EF
100042 27OCT20 ENG
100045 17NOV20 ETLK8
100045 20NOV20 E12P70
100048 14DEC20 ETLK4
100048 14DEC20 ETLK5
100059 13MAR21 ETLK13
;
run;
data adae;
input usubjid $ astdt $ trt01a $ crsfl $ aendt $;
cards;
100014 03FEB20 3 Y 03FEB20
100018 10FEB20 3 Y 12FEB20
100018 13FEB20 3 Y 13FEB20
100029 10AUG20 5 Y 10AUG20
100029 17AUG20 5 Y 17AUG20
100031 06AUG20 5 Y 07AUG20
100031 07AUG20 5 Y 21AUG20
100032 11SEP20 5 Y 12SEP20
100040 24OCT20 5 Y 28OCT20
100047 21NOV20 11 Y 21NOV20
100047 25NOV20 11 Y 25NOV20
100047 28NOV20 11 Y 29NOV20
100047 04DEC20 11 Y 05DEC20
100047 15DEC20 11 Y 15DEC20
100047 22DEC20 11 Y 22DEC20
100048 15DEC20 11 Y 16DEC20
100048 18DEC20 11 Y 19DEC20
100048 22DEC20 11 Y 23DEC20
100048 29DEC20 11 Y 29DEC20
100048 05JAN21 11 Y 06JAN21
100048 12JAN21 11 Y 12JAN21
100049 15DEC20 11 Y 16DEC20
100049 28JAN21 11 Y 29JAN21
100056 30JAN21 12 Y 01FEB21
100056 01FEB21 12 Y 02FEB21
100059 02FEB21 12 Y 02FEB21
100059 05FEB21 12 Y 05FEB21
100059 25FEB21 12 Y 26FEB21
100061 19FEB21 11 Y 20FEB21
;
run;
Here is the data from both the datasets.
I want to create a flag ANL02FL flag.
ANL02FL flag records within ADBMK if a subject experience an CRS.
The following logic should be applied to determine if a reading falls within the given windiow.
IF ADAE.ASTDT-1 <= ADBMK.ADT <= ADAE.AENDT+3;
Your SQL code does not work with your sample data. If you could supply SQL code that works properly with the data (and data which has numeric variables where the SQL expects them), then it will be much easier to determine the equivalent DATA step code.
The problem you have identified, doing a many-to-many in a DATA step takes a bit of coding. But use of a hash object can make it reliable and useful.
You don't say what's wrong with either result.
Maybe...
data crs(where=(anl04fl='Y'));
merge adbmk(in=bmk) adae(in=ae where=(crsfl='Y') rename=(anlfl=anl04fl));
by usubjid;
if bmk;
if not ae or (astdt-1 <= adt <= aeendt2 +3) then output;
run;
This will keep the usubjid that are absent from table adae.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.