BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

Is there a more clever way and shorter code to perform these multiple merges?

Is it possible to perform multiple merge in one step?

data SCORE_TBL_ALL;
INPUT ID MONTH pd;
CARDS;
111 2111 0.69
111 2112 0.91
111 2201 0.92
111 2202 0.89
111 2203 0.93
111 2204 0.88
111 2205 0.87
111 2206 0.91
111 2207 0.77
222 2011 0.87
222 2012 0.43
222 2101 0.54
222 2102 0.65
222 2103 0.43
222 2104 0.77
222 2105 0.91
222 2106 0.57
222 2107 0.63
;
RUN;

DATA T1;
INPUT ID MONTH Month_minus1 Month_minus2 Month_minus3 Month_minus4 Month_minus5 Month_minus6;
cards;
111 2207 2206 2205 2204 2203 2202 2201
222 2105 2104 2103 2102 2101 2010 2009
;
run;


proc sort data=SCORE_TBL_ALL;
by ID month;
Run;
proc sort data=T1;
by ID month;
Run;
Data t2_0;
Merge t1(in=a)
      SCORE_TBL_ALL(in=b rename=(pd=pd_base));
by ID Month;
IF a;
Run;
Data t2_1;
Merge t2_0(in=a)
      SCORE_TBL_ALL(in=b rename=(month= Month_minus1  pd=pd_minus1 ));
by ID Month_minus1;
IF a;
Run;
 Data t2_2;
Merge t2_1(in=a)
      SCORE_TBL_ALL(in=b rename=(month= Month_minus2  pd=pd_minus2 ));
by ID Month_minus2;
IF a;
Run;
 Data t2_3;
Merge t2_2(in=a)
      SCORE_TBL_ALL(in=b rename=(month= Month_minus3  pd=pd_minus3 ));
by ID Month_minus3;
IF a;
Run;
Data t2_4;
Merge t2_3(in=a)
      SCORE_TBL_ALL(in=b rename=(month= Month_minus4  pd=pd_minus4 ));
by ID Month_minus4;
IF a;
Run;
 Data t2_5;
Merge t2_4(in=a)
      SCORE_TBL_ALL(in=b rename=(month= Month_minus5  pd=pd_minus5 ));
by ID Month_minus5;
IF a;
Run;
 Data t2_6;
Merge t2_5(in=a)
      SCORE_TBL_ALL(in=b rename=(month= Month_minus6  pd=pd_minus6 ));
by ID Month_minus6;
IF a;
Run;
13 REPLIES 13
PeterClemmensen
Tourmaline | Level 20

What does your desired result look like given your two posted data sets?

 

Probably easier to help you that way.

Ronein
Meteorite | Level 14

The desire is to add for each customer ID  information of wealth in 6 months prior to the date in field "month"

 

andreas_lds
Jade | Level 19

Proper dates would make this easier.

 

With your current data:

proc transpose data=work.t1 name=source out=work.transposed(rename=(col1=month));
   by ID;
   var Month:;
run;

proc sort data=work.transposed out=work.sorted;
   by id month;
run;

data work.want;
   merge work.score_tbl_all work.sorted;
   by id month;

   if not missing(source);

   source = propcase(source);
run;

Not the expected shape, but long tables are almost always better to work with.

ballardw
Super User

@andreas_lds wrote:

Proper dates would make this easier.

 

With your current data:

proc transpose data=work.t1 name=source out=work.transposed(rename=(col1=month));
   by ID;
   var Month:;
run;

proc sort data=work.transposed out=work.sorted;
   by id month;
run;

data work.want;
   merge work.score_tbl_all work.sorted;
   by id month;

   if not missing(source);

   source = propcase(source);
run;

Not the expected shape, but long tables are almost always better to work with.


@andreas_lds 

I think we have collectively been telling the OP about proper dates for more than 4 years now to use dates. As evidence I provide this link to an earlier thread from March 2018

https://communities.sas.com/t5/SAS-Programming/Define-the-following-parameter-of-dates-YYMM/m-p/4432...

 

 

Kurt_Bremser
Super User

Never (as in NEVER) store date-related values like this. Not even when hell freezes over so hard that the brimstone turns superconducting.

Always use SAS date values, so you can make use of the gazillion of tools which SAS provides for dates.

So read your data like this:

data SCORE_TBL_ALL;
input ID MONTH :yymmn4. pd;
format month yymmn6.;
cards;
111 2111 0.69
111 2112 0.91
111 2201 0.92
111 2202 0.89
111 2203 0.93
111 2204 0.88
111 2205 0.87
111 2206 0.91
111 2207 0.77
222 2011 0.87
222 2012 0.43
222 2101 0.54
222 2102 0.65
222 2103 0.43
222 2104 0.77
222 2105 0.91
222 2106 0.57
222 2107 0.63
;

data T1;
input ID MONTH :yymmn4.;
format month yymmn6.;
cards;
111 2207
222 2105
;

As you can see, I avoided the multiple month columns, as they can be derived from month by calculation:

data want;
merge
  score_tbl_all (in=sc)
  t1 (
    in=t1
    rename=(month=mt)
  )
;
by id;
if first.id
then m = 0;
if sc and t1;
if intnx('month',mt,-5) le month le mt;
m + 1;
drop mt;
run;

From this long result dataset, a wide report is easily created with PROC REPORT:

proc report data=want;
column id m,(month pd);
define id / group;
define month / "" display;
define pd / "" analysis;
define m / "" across;
run;

One simply does not want wide datasets for processing, unless you need to do a regression.

 

Ronein
Meteorite | Level 14
Thanks,
I know you hate putting numeric SAS date and I agree.
The reason is that ususally the input data sets that I am using are in such bad format.
May I ask please:
Did you perform Ful join or Inner Join?
I saw that you wrote "if sc and t1;" - Can it be written before the first.ID or it must be written after?
What is the meaning of "if intnx('month',mt,-5) le month le mt;" - IS it same like writting
if intnx('month',mt,-5) le month le mt then output ?


Kurt_Bremser
Super User

@Ronein wrote:
Thanks,
I know you hate putting numeric SAS date and I agree.
The reason is that ususally the input data sets that I am using are in such bad format.


So you convert the values to SAS date and time values 3 years before yesterday. The fact that someone else is a brain-amputated moron does not mean you have to be one, too.


Did you perform Ful join or Inner Join?
I saw that you wrote "if sc and t1;" - Can it be written before the first.ID or it must be written after?

Play around with the code and see what happens. SAS is an interpreting language, so a test is always only a mouse-click waway (also see Maxim 4).


I saw that you wrote "if sc and t1;" - Can it be written before the first.ID or it must be written after?
What is the meaning of "if intnx('month',mt,-5) le month le mt;" - IS it same like writting
if intnx('month',mt,-5) le month le mt then output ?

Consult the documentation on the difference between a "normal" and a subsetting IF.

Ronein
Meteorite | Level 14

Thank you.

I run your code step by step in order to fully understand it and I like this code.

I have some questions please:

Question 1:

When I deleted the statement "if sc and t1;"   the result was same.

I still want to ask what is the meaning of that sentence?  Is it essential here?

Question 2:

You wrote  "if intnx('month',mt,-5) le month le mt;"

As I understand it means that  :  month<=mt    AND   intnx('month',mt,-5) <= month

But as I checked it should be :month<=mt    AND   intnx('month',mt,-5) >= month

 

Ronein_0-1660023445435.png

 

Ronein
Meteorite | Level 14

This code is working very well  thanks to your feedback:

data SCORE_TBL_ALL;
input ID MONTH :yymmn4. pd;
format month yymmn6.;
cards;
111 2111 0.69
111 2112 0.91
111 2201 0.92
111 2202 0.89
111 2203 0.93
111 2204 0.88
111 2205 0.87
111 2206 0.91
111 2207 0.77
222 2011 0.87
222 2012 0.43
222 2101 0.54
222 2102 0.65
222 2103 0.43
222 2104 0.77
222 2105 0.91
222 2106 0.57
222 2107 0.63
;
Run;


data T1;
input ID MONTH :yymmn4.;
format month yymmn6.;
cards;
111 2207
222 2105
;
Run;

proc sort data=SCORE_TBL_ALL;
by ID month;
Run;

proc sort data=T1;
by ID;
Run;


data want_Long(Where=(Dif in (0,-1,-2,-3,-4,-5,-6)));
merge  score_tbl_all (in=sc)
        t1 (in=t1 rename=(month=mt));
by id;
Dif=intck('month',mt,month);
Dif_Desc=CATX('_','Minus',abs(Dif));
Run;

proc transpose data=want_Long out=want_Wide(Drop=_name_) prefix=PD_;
by ID ;
id Dif_Desc;
var PD;
run;
proc transpose data=want_Long out=want_Wide2(Drop=_name_) prefix=PD_;
by ID ;
id month;
var PD;
run;

Kurt_Bremser
Super User

@Ronein wrote:

Thank you.

I run your code step by step in order to fully understand it and I like this code.

I have some questions please:

Question 1:

When I deleted the statement "if sc and t1;"   the result was same.

I still want to ask what is the meaning of that sentence?  Is it essential here?

Question 2:

You wrote  "if intnx('month',mt,-5) le month le mt;"

As I understand it means that  :  month<=mt    AND   intnx('month',mt,-5) <= month

But as I checked it should be :month<=mt    AND   intnx('month',mt,-5) >= month

 

Ronein_0-1660023445435.png

 


1) The fact that removing the condition does not change anything means that you have only matches in your datasets; there is no ID which is present in only one dataset

2) intnx('month',mt,-5) >= month implies that means is smaller than mt, so the first condition would also be true, and we would only take months before the intended timespan, instead of months within the timespan

Ronein
Meteorite | Level 14
Thanks,
Why did you write "if sc and t1;" after the statement "If.First" ?
As I see you performed inner join .
Usually when do inner join I write it ("if sc and t1;") after "by"

Kurt_Bremser
Super User

@Ronein wrote:
Thanks,
Why did you write "if sc and t1;" after the statement "If.First" ?
As I see you performed inner join .
Usually when do inner join I write it ("if sc and t1;") after "by"


I need to put the IF FIRST. condition before the subsetting IF, so the code reliably resets the variable to zero even in cases where there is no match.

And be always careful with words like "inner join". A data step MERGE behaves not the same as a SQL JOIN.

Ronein
Meteorite | Level 14
May you change the raw data and show why is it essential to write " if sc and t1" after "if first" ?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 1645 views
  • 1 like
  • 5 in conversation