I have two datasets that I'm trying to merge to get a new variable I'm interested in.
The first dataset, data1, has one row per ID. The second one, data, has many rows (dates) per ID, like this:
data data1;
input id :$2. var1;
datalines;
01 1
02 2
03 3
;
data Data2;
infile datalines dlm="|";
input ID :$2. date :ddmmyy. (code1 code2 code3) ($) payment;
format callDate ddmmyy10.;
datalines;
01 | 01/01/2020 | AA | AA | AA | 1
01 | 02/01/2020 | AA | AA | AA | 1
01 | 03/01/2020 | AA | AA | AA | 1
01 | 12/02/2020 | AA | AA | AA | 2
01 | 13/02/2020 | BB | AA | AA | 2
01 | 14/01/2020 | BB | AA | AA | 2
01 | 15/01/2020 | CC | BB | AA | 2
02 | 03/01/2020 | . | . | . | 1
02 | 04/01/2020 | AA | . | . | 1
02 | 05/01/2020 | AA | . | . | 1
02 | 06/01/2020 | CC | AA | . | 1
02 | 07/01/2020 | CC | AA | . | 1
03 | 20/03/2020 | AA | AA | . | 2
03 | 21/03/2020 | AA | AA | . | 3
;
run;
I want to merge them so that I get all the columns in data1
and I get a two new columns: position
and date
that would have the firs date in which code1
, code2
or code3
get BB or CC OR payment
is larger or equal than 3, per ID.
Basically in this case this should be the outcome:
ID | Var1 | Date | Position
01 | 1 | 13/02/2020 | code
02 | 2 | 06/01/2020 | code
03 | 3 | 21/03/2020 | payment
Var1
is just an example of the variables in data1
This is the code I tried, but it's not working as it should. I'm sure there's a detail wrong but I can't figure out what it is.
proc sql;
create table new as
select a.*
, b.position
from data1 a
left join (select distinct id
, datestamp format=ddmmyy10.
, case when whichc("AA", code1, code2, code3) > 0 then "code"
when whichc("BB", code1, code2, code3) > 0 then "code"
when payment >= 3 then "payment" else . end as position
from data2
group by id
having datestamp=min(datestamp)) b
on a.id = b.id;
quit;
Any ideas on how I could make this work?
SQL code is prefered since the rest of my code is in that format.
Please test your datalines code in the future, so we do not have to fix it first.
See this:
data Data1;
infile datalines dlm="|";
input ID :$2. date :ddmmyy. (code1 code2 code3) ($) payment;
format date ddmmyy10.;
datalines;
01 | 01/01/2020 | AA | AA | AA | 1
01 | 02/01/2020 | AA | AA | AA | 1
01 | 03/01/2020 | AA | AA | AA | 1
01 | 12/02/2020 | AA | AA | AA | 2
01 | 13/02/2020 | BB | AA | AA | 2
01 | 14/01/2020 | BB | AA | AA | 2
01 | 15/01/2020 | CC | BB | AA | 2
02 | 03/01/2020 | . | . | . | 1
02 | 04/01/2020 | AA | . | . | 1
02 | 05/01/2020 | AA | . | . | 1
02 | 06/01/2020 | CC | AA | . | 1
02 | 07/01/2020 | CC | AA | . | 1
03 | 20/03/2020 | AA | AA | . | 2
03 | 21/03/2020 | AA | AA | . | 3
;
data data2;
input id :$2. var1;
datalines;
01 1
02 2
03 3
;
data want;
merge
data2
data1
;
by id;
retain flag;
length position $7;
if first.id then flag = 1;
if flag
then do;
if code1 in ('BB','CC') or code2 in ('BB','CC') or code3 in ('BB','CC')
then do;
flag = 0;
position = "code";
end;
else if payment ge 3
then do;
flag = 0;
position = "payment";
end;
if flag = 0 then output;
end;
keep id var1 date position;
run;
First off, you refer to the data set called 'data1' as both the data that has a single and multiple rows per ID.
Can you post examples of both data sets?
Also, is it safe to assume that your data is sorted by ID and date?
Maxim 14: Use the Right Tool.
In this case, it's the data step.
The fact that you seem to not be very versed in data step programming yet does not mean that you should not use it, but that you have to learn it. NOW. It is THE "Swiss Army Knife" of SAS.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.