BookmarkSubscribeRSS Feed
catkat96
Obsidian | Level 7

I have two datasets that I'm trying to merge to get a new variable I'm interested in.

The first dataset, data1, has one row per ID. The second one, data, has many rows (dates) per ID, like this:

 

 

 

 

data data1;
input id :$2. var1;
datalines;
01 1
02 2
03 3
;

 

 

 

data Data2;
infile datalines dlm="|";
input ID :$2. date :ddmmyy. (code1 code2 code3) ($) payment;
format callDate ddmmyy10.;
datalines;
01 | 01/01/2020 | AA    | AA    | AA    | 1
01 | 02/01/2020 | AA    | AA    | AA    | 1
01 | 03/01/2020 | AA    | AA    | AA    | 1
01 | 12/02/2020 | AA    | AA    | AA    | 2
01 | 13/02/2020 | BB    | AA    | AA    | 2
01 | 14/01/2020 | BB    | AA    | AA    | 2
01 | 15/01/2020 | CC    | BB    | AA    | 2
02 | 03/01/2020 | .     | .     | .     | 1
02 | 04/01/2020 | AA    | .     | .     | 1
02 | 05/01/2020 | AA    | .     | .     | 1 
02 | 06/01/2020 | CC    | AA    | .     | 1
02 | 07/01/2020 | CC    | AA    | .     | 1
03 | 20/03/2020 | AA    | AA    | .     | 2
03 | 21/03/2020 | AA    | AA    | .     | 3 
;
run;

I want to merge them so that I get all the columns in data1 and I get a two new columns: position and date that would have the firs date in which code1, code2 or code3 get BB or CC OR payment is larger or equal than 3, per ID.

Basically in this case this should be the outcome:

ID | Var1 | Date       | Position 
01 | 1    | 13/02/2020 | code
02 | 2    | 06/01/2020 | code   
03 | 3    | 21/03/2020 | payment

Var1 is just an example of the variables in data1

This is the code I tried, but it's not working as it should. I'm sure there's a detail wrong but I can't figure out what it is.

proc sql;
create table new as
select a.*
 , b.position
 from data1 a
 left join (select distinct id
            , datestamp format=ddmmyy10.
            , case when whichc("AA", code1, code2, code3) > 0 then "code"
                   when whichc("BB", code1, code2, code3) > 0 then "code"
                   when payment >= 3 then "payment" else . end as position
            from data2
            group by id
            having datestamp=min(datestamp)) b
    on a.id = b.id;
        quit;

Any ideas on how I could make this work?

SQL code is prefered since the rest of my code is in that format.

5 REPLIES 5
Kurt_Bremser
Super User

Please test your datalines code in the future, so we do not have to fix it first.

 

See this:

data Data1;
infile datalines dlm="|";
input ID :$2. date :ddmmyy. (code1 code2 code3) ($) payment;
format date ddmmyy10.;
datalines;
01 | 01/01/2020 | AA    | AA    | AA    | 1
01 | 02/01/2020 | AA    | AA    | AA    | 1
01 | 03/01/2020 | AA    | AA    | AA    | 1
01 | 12/02/2020 | AA    | AA    | AA    | 2
01 | 13/02/2020 | BB    | AA    | AA    | 2
01 | 14/01/2020 | BB    | AA    | AA    | 2
01 | 15/01/2020 | CC    | BB    | AA    | 2
02 | 03/01/2020 | .     | .     | .     | 1
02 | 04/01/2020 | AA    | .     | .     | 1
02 | 05/01/2020 | AA    | .     | .     | 1 
02 | 06/01/2020 | CC    | AA    | .     | 1
02 | 07/01/2020 | CC    | AA    | .     | 1
03 | 20/03/2020 | AA    | AA    | .     | 2
03 | 21/03/2020 | AA    | AA    | .     | 3 
;

data data2;
input id :$2. var1;
datalines;
01 1
02 2
03 3
;

data want;
merge
  data2
  data1
;
by id;
retain flag;
length position $7;
if first.id then flag = 1;
if flag
then do;
  if code1 in ('BB','CC') or code2 in ('BB','CC') or code3 in ('BB','CC')
  then do;
    flag = 0;
    position = "code";
  end;
  else if payment ge 3
  then do;
    flag = 0;
    position = "payment";
  end;
  if flag = 0 then output;
end;
keep id var1 date position;
run;
PeterClemmensen
Tourmaline | Level 20

First off, you refer to the data set called 'data1' as both the data that has a single and multiple rows per ID. 


Can you post examples of both data sets?

PeterClemmensen
Tourmaline | Level 20

Also, is it safe to assume that your data is sorted by ID and date?

catkat96
Obsidian | Level 7
Yes, data is sorted by ID and Date.

Sorry for that, fixed that format error, now it should be fine.

I also forgot to add that if it could be a proc sql answer that would be great, cause the rest of my code is all in proc sql
Kurt_Bremser
Super User

Maxim 14: Use the Right Tool.

In this case, it's the data step.

The fact that you seem to not be very versed in data step programming yet does not mean that you should not use it, but that you have to learn it. NOW. It is THE "Swiss Army Knife" of SAS.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 505 views
  • 0 likes
  • 3 in conversation