Re: Merge two datasets to obtain first date in which condition happene...

catkat96 · Posted 04-22-2021 05:11 AM

I have two datasets that I'm trying to merge to get a new variable I'm interested in.

The first dataset, data1, has one row per ID. The second one, data, has many rows (dates) per ID, like this:

data data1;
input id :$2. var1;
datalines;
01 1
02 2
03 3
;

data Data2;
infile datalines dlm="|";
input ID :$2. date :ddmmyy. (code1 code2 code3) ($) payment;
format callDate ddmmyy10.;
datalines;
01 | 01/01/2020 | AA    | AA    | AA    | 1
01 | 02/01/2020 | AA    | AA    | AA    | 1
01 | 03/01/2020 | AA    | AA    | AA    | 1
01 | 12/02/2020 | AA    | AA    | AA    | 2
01 | 13/02/2020 | BB    | AA    | AA    | 2
01 | 14/01/2020 | BB    | AA    | AA    | 2
01 | 15/01/2020 | CC    | BB    | AA    | 2
02 | 03/01/2020 | .     | .     | .     | 1
02 | 04/01/2020 | AA    | .     | .     | 1
02 | 05/01/2020 | AA    | .     | .     | 1 
02 | 06/01/2020 | CC    | AA    | .     | 1
02 | 07/01/2020 | CC    | AA    | .     | 1
03 | 20/03/2020 | AA    | AA    | .     | 2
03 | 21/03/2020 | AA    | AA    | .     | 3 
;
run;

I want to merge them so that I get all the columns in data1 and I get a two new columns: position and date that would have the firs date in which code1, code2 or code3 get BB or CC OR payment is larger or equal than 3, per ID.

Basically in this case this should be the outcome:

ID | Var1 | Date       | Position 
01 | 1    | 13/02/2020 | code
02 | 2    | 06/01/2020 | code   
03 | 3    | 21/03/2020 | payment

Var1 is just an example of the variables in data1

This is the code I tried, but it's not working as it should. I'm sure there's a detail wrong but I can't figure out what it is.

proc sql;
create table new as
select a.*
 , b.position
 from data1 a
 left join (select distinct id
            , datestamp format=ddmmyy10.
            , case when whichc("AA", code1, code2, code3) > 0 then "code"
                   when whichc("BB", code1, code2, code3) > 0 then "code"
                   when payment >= 3 then "payment" else . end as position
            from data2
            group by id
            having datestamp=min(datestamp)) b
    on a.id = b.id;
        quit;

Any ideas on how I could make this work?

SQL code is prefered since the rest of my code is in that format.

Kurt_Bremser · Posted 04-22-2021 05:38 AM

Please test your datalines code in the future, so we do not have to fix it first.

See this:

data Data1;
infile datalines dlm="|";
input ID :$2. date :ddmmyy. (code1 code2 code3) ($) payment;
format date ddmmyy10.;
datalines;
01 | 01/01/2020 | AA    | AA    | AA    | 1
01 | 02/01/2020 | AA    | AA    | AA    | 1
01 | 03/01/2020 | AA    | AA    | AA    | 1
01 | 12/02/2020 | AA    | AA    | AA    | 2
01 | 13/02/2020 | BB    | AA    | AA    | 2
01 | 14/01/2020 | BB    | AA    | AA    | 2
01 | 15/01/2020 | CC    | BB    | AA    | 2
02 | 03/01/2020 | .     | .     | .     | 1
02 | 04/01/2020 | AA    | .     | .     | 1
02 | 05/01/2020 | AA    | .     | .     | 1 
02 | 06/01/2020 | CC    | AA    | .     | 1
02 | 07/01/2020 | CC    | AA    | .     | 1
03 | 20/03/2020 | AA    | AA    | .     | 2
03 | 21/03/2020 | AA    | AA    | .     | 3 
;

data data2;
input id :$2. var1;
datalines;
01 1
02 2
03 3
;

data want;
merge
  data2
  data1
;
by id;
retain flag;
length position $7;
if first.id then flag = 1;
if flag
then do;
  if code1 in ('BB','CC') or code2 in ('BB','CC') or code3 in ('BB','CC')
  then do;
    flag = 0;
    position = "code";
  end;
  else if payment ge 3
  then do;
    flag = 0;
    position = "payment";
  end;
  if flag = 0 then output;
end;
keep id var1 date position;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PeterClemmensen · Posted 04-22-2021 05:39 AM

First off, you refer to the data set called 'data1' as both the data that has a single and multiple rows per ID.

Can you post examples of both data sets?

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 04-22-2021 06:18 AM

Also, is it safe to assume that your data is sorted by ID and date?

The DATA to DATA Step Macro
Blog: SASnrd

catkat96 · Posted 04-22-2021 07:18 AM

Yes, data is sorted by ID and Date.

Sorry for that, fixed that format error, now it should be fine.

I also forgot to add that if it could be a proc sql answer that would be great, cause the rest of my code is all in proc sql

Kurt_Bremser · Posted 04-22-2021 07:35 AM

Maxim 14: Use the Right Tool.

In this case, it's the data step.

The fact that you seem to not be very versed in data step programming yet does not mean that you should not use it, but that you have to learn it. NOW. It is THE "Swiss Army Knife" of SAS.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Merge two datasets to obtain first date in which condition happened

Re: Merge two datasets to obtain first date in which condition happened

Re: Merge two datasets to obtain first date in which condition happened

Re: Merge two datasets to obtain first date in which condition happened

Re: Merge two datasets to obtain first date in which condition happened

Re: Merge two datasets to obtain first date in which condition happened

Merge two datasets to obtain first date in which condition happened

Re: Merge two datasets to obtain first date in which condition happened

Re: Merge two datasets to obtain first date in which condition happened

Re: Merge two datasets to obtain first date in which condition happened

Re: Merge two datasets to obtain first date in which condition happened

Re: Merge two datasets to obtain first date in which condition happened

Register Today!

SAS Training: Just a Click Away