I am trying to merge 2 datasets. Please provide help with how to do this:
Dataset A
customer Start date
1 1April2018
2 1July2018
Dataset B
Customer 1April 2018 1May2018 June2018 July2018 Aug2018 Sep2018 ...... Mar2022
1 1 2 3 4 5 6 ......
2 - - - 7 8 9 ......
Desired output
Customer Start date M1 M2 M3 M4.......... M24
1 1April2018 1 2 3 4 ........
2 1July2018 7 8 9 .........
If
then you could still use a single data step approach, with minor changes in calculating and using the _OFFSET variable:
data DatasetA;
input customer Startdate : $20.;
cards;
1 1April2018
1 1May2018
1 1June2018
2 1July2018
2 1Aug2018
2 1Sep2018
;
data DatasetB;
input Customer April2018 May2018 June2018 July2018 Aug2018 Sep2018 ;
cards;
1 1 2 3 4 5 6
2 . . . 7 8 9
;
data want (drop=_:);
merge dataseta datasetB;
by customer;
array months {*} april2018--sep2018;
if first.customer then do _offset=0 to dim(months)-1 while(months{_offset+1}=.);
end;
else _offset+1;
array M {6};
if _offset<dim(months) then do _i=1 to dim(m)-_offset;
m{_i}=months{_offset+_i};
end;
drop april2018--sep2018;
run;
data DatasetA;
input customer Startdate : $20.;
cards;
1 1April2018
2 1July2018
;
data DatasetB;
input Customer April2018 May2018 June2018 July2018 Aug2018 Sep2018 ;
cards;
1 1 2 3 4 5 6
2 . . . 7 8 9
;
proc transpose data=DatasetB out=temp;
by Customer;
var April2018--Sep2018 ;
run;
proc transpose data=temp(where=(col1 is not missing)) out=temp1 prefix=M;
by Customer;
var col1 ;
run;
data want;
merge DatasetA temp1(drop=_name_);
by CUstomer;
run;
(remember there are large number of customers and i had taken 2 customers just for illustration)
further, In dataset A, the start date keeps incrementing to cover all dates till the latest month, for example,
Dataset A
Cust Start_date
1 Apr18
1 May18
.
.
1 Mar22
2 ...
and so on
so the desired output needs to be like:
Cust Start_Date M1 M2 M3 ...... M24
1 Apr'18 1 2 3 ....
1 May'18 2 3 4 .....
1 Jun'18
. .
. .
1 Mar'22
If your dataset B always has its first non-missing value in the variable corresponding to startdate in dataset A, then the task is very straightforward:
data DatasetA;
input customer Startdate : $20.;
cards;
1 1April2018
2 1July2018
;
data DatasetB;
input Customer April2018 May2018 June2018 July2018 Aug2018 Sep2018 ;
cards;
1 1 2 3 4 5 6
2 . . . 7 8 9
;
data want (drop=_:);
merge dataseta datasetB;
by customer;
array months {*} april2018--sep2018;
do _offset=0 to 5 while(months{_offset+1}=.);
end;
array M {6};
do _i=1 to dim(m)-_offset;
m{_i}=months{_offset+_i};
end;
drop april2018--sep2018;
run;
Just make sure to have your array declarations consistent. Make the M large enough to consider all of your date variables, size 6 above, but likely size 24 if you really have 24 months.
Also this assumes that all the date variables in dataset B are contiguous.
That is actually not the case.
In dataset A, the start date keeps incrementing, for example,
Dataset A
Cust Start_date
1 Apr18
1 May18
.
.
1 Mar22
2 ...
and so on
so the desired output needs to be like (remember there are large number of customers and i had taken 2 customers just for illustration)
Cust Start_Date M1 M2 M3 ...... M24
1 Apr'18 1 2 3 ....
1 May'18 2 3 4 .....
1 Jun'18
. .
. .
1 Mar'22
If
then you could still use a single data step approach, with minor changes in calculating and using the _OFFSET variable:
data DatasetA;
input customer Startdate : $20.;
cards;
1 1April2018
1 1May2018
1 1June2018
2 1July2018
2 1Aug2018
2 1Sep2018
;
data DatasetB;
input Customer April2018 May2018 June2018 July2018 Aug2018 Sep2018 ;
cards;
1 1 2 3 4 5 6
2 . . . 7 8 9
;
data want (drop=_:);
merge dataseta datasetB;
by customer;
array months {*} april2018--sep2018;
if first.customer then do _offset=0 to dim(months)-1 while(months{_offset+1}=.);
end;
else _offset+1;
array M {6};
if _offset<dim(months) then do _i=1 to dim(m)-_offset;
m{_i}=months{_offset+_i};
end;
drop april2018--sep2018;
run;
I am trying to merge 2 datasets. Please provide help with how to do this:
Dataset A
customer Start date
1 1April2018
2 1July2018
Dataset B
Customer 1April 2018 1May2018 June2018 July2018 Aug2018 Sep2018 ...... Mar2022
1 1 2 3 4 5 6 ......
2 - - - 7 8 9 ......
Desired output
Customer Start date M1 M2 M3 M4.......... M24
1 1April2018 1 2 3 4 ........
1 1May2018 2 3 4 5 ......
1 1June2018 3 4 5 6 ......
.
.
1 Mar'2020 . . . . ............
2 1July2018 7 8 9 .........
2 1Aug'2019 8 9 10 ..................
.
.
.
.
What did you try so far? Are your data in SAS data sets?
B.
That is a completely different problem than the original question.
In this new problem the values from B are REPEATED in the output dataset.
The answer is still the same though.
First get the date values out of the metadata into a actual data by transposing the B dataset and converting the variable name into a date.
Now to get this data with the values from B matched to multiple observations from A you will want to use an SQL join instead of a simple SAS merge.
proc sql;
create table want as
select customer
, a.start
, intck('month',a.start,b.date)+1 as month
, b.date
, b.B
from A a
left join B b
on a.customer = b.customer
and . < a.start <= b.date
order by customer, start, month
;
quit;
You can now use this data to produce your report as an actual report.
Or use proc transpose to create a dataset where the month number is implied by the name of the variable.
First let's convert your posted listings into actual data.
data a ;
input customer start :date. ;
format start date9.;
cards;
1 01Apr2018
2 01Jul2018
;
data b;
input customer Apr2018 May2018 Jun2018 Jul2018 Aug2018 Sep2018 ;
cards;
1 1 2 3 4 5 6
2 . . . 7 8 9
;
You will want to first use PROC TRANSPOSE to convert the B dataset that has data in the metadata into a dataset that has the data stored in actual variables. You will have to convert names into date values since the name of a variable is a text string, not a number.
Then you can merge the two datasets and calculate the MONTH offset between the two dates.
proc transpose data=b out=b_tall(rename=(col1=B));
by customer;
run;
data want;
merge a b_tall;
by customer;
date = input(_name_,anydtdte.);
format date date9.;
month = 1 + intck('month',start,date);
if month < 1 then delete;
run;
Which you could print using PROC REPORT like this:
proc report data=want ;
column customer start B,month;
define customer / group;
define start / group;
define month / across ;
define B / sum ' ' ;
run;
Result:
If you want you can transpose again, but then instead of having date values in the metadata you will have the month offset number instead. Why not just leave the data in the normalized format where it will be easier to work with.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.