Solved: Merging 2 datasets (with dates as column names in one of the datasets)...

adjn258 · Posted 03-19-2022 05:39 AM

I am trying to merge 2 datasets. Please provide help with how to do this:

dataset A contains customer and start date
dataset B contains column headings as dates and info for each customer under each date.
Illustration of Dataset A, Dataset B and desired output (I want to pull 24 months information from start date and assign it as M1,M2,M3.....M24)

Dataset A

customer  Start date
1         1April2018
2         1July2018

Dataset B

Customer  1April2018  1May2018  June2018  July2018  Aug2018  Sep2018  ......  Mar2022
1         1           2         3         4         5        6        ......
2         -           -         -         7         8        9        ......

Desired output

Customer  Start date  M1  M2  M3  M4  ..........  M24
1         1April2018  1   2   3   4   ........
2         1July2018   7   8   9   .........

mkeintz · Posted 03-19-2022 02:30 PM

If

1. The first record in Dataset A for a customer has a startdate that corresponds with the first non-missing value in dataset B, and
2. The above record is followed by monthly increments - i.e. there are no holes between consecutive dataset A records, and
3. The date variables in dataset B are contiguous

then you could still use a single data step approach, with minor changes in calculating and using the _OFFSET variable:

data DatasetA;
input customer     Startdate  : $20.;       
cards;
     1              1April2018  
     1              1May2018
     1              1June2018 
     2              1July2018
     2              1Aug2018
     2              1Sep2018 
;
 

data DatasetB;
input Customer      April2018     May2018    June2018   July2018  Aug2018  Sep2018 ;
cards;
   1                      1                      2                3                   4                5             6      
   2                      .                     .                .                   7                8             9    
;

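data want (drop=_:);
  /* One-to-many merge: each dataset A row picks up its customer's dataset B row */
  merge dataseta  datasetB;
  by customer;
  array months {*} april2018--sep2018;
  
  /* First row per customer: count the leading missing months in dataset B */
  if first.customer then do _offset=0 to dim(months)-1 while(months{_offset+1}=.);
  end;
  /* Later rows: start one month later (the sum statement retains _offset) */
  else _offset+1;
  
  array M {6};
  /* Copy the months from the start date onward into M1, M2, ... */
  if _offset<dim(months) then do _i=1 to dim(m)-_offset;
    m{_i}=months{_offset+_i};
  end;
  drop april2018--sep2018;
run;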

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------


Ksharp · Posted 03-19-2022 06:49 AM

data DatasetA;
input customer     Startdate  : $20.;       
cards;
     1              1April2018            
     2              1July2018   
;
 

data DatasetB;
input Customer      April2018     May2018    June2018   July2018  Aug2018  Sep2018 ;
cards;
   1                      1                      2                3                   4                5             6      
   2                      .                     .                .                   7                8             9    
;

proc transpose data=DatasetB out=temp;
by Customer;
var April2018--Sep2018 ;
run;

proc transpose data=temp(where=(col1 is not missing)) out=temp1 prefix=M;
by Customer;
var col1 ;
run;

data want;
merge DatasetA temp1(drop=_name_);
by Customer;
run;

adjn258 · Posted 03-19-2022 12:54 PM

(Remember that there are a large number of customers; I have taken 2 customers just for illustration.)

Further, in dataset A the start date keeps incrementing to cover all dates up to the latest month, for example:

Dataset A

Cust  Start_date
1     Apr18
1     May18
.
1     Mar22
2     ...

and so on.

So the desired output needs to look like this:

Cust  Start_Date  M1  M2  M3  ......  M24
1     Apr'18      1   2   3   ....
1     May'18      2   3   4   .....
1     Jun'18
.     .
1     Mar'22

mkeintz · Posted 03-19-2022 11:25 AM

If your dataset B always has its first non-missing value in the variable corresponding to startdate in dataset A, then the task is very straightforward:

data DatasetA;
input customer     Startdate  : $20.;       
cards;
     1              1April2018            
     2              1July2018   
;
 

data DatasetB;
input Customer      April2018     May2018    June2018   July2018  Aug2018  Sep2018 ;
cards;
   1                      1                      2                3                   4                5             6      
   2                      .                     .                .                   7                8             9    
;

data want (drop=_:);
  merge dataseta  datasetB;
  by customer;
  array months {*} april2018--sep2018;
  
  do _offset=0 to 5 while(months{_offset+1}=.);
  end;
  
  array M {6};
  do _i=1 to dim(m)-_offset;
    m{_i}=months{_offset+_i};
  end;
  drop april2018--sep2018;
run;

Just make sure your array declarations are consistent. Make the M array large enough to cover all of your date variables (size 6 above, but likely size 24 if you really want 24 months).

Also this assumes that all the date variables in dataset B are contiguous.
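
As a rough sketch of the sizing advice above, the step below declares M with 24 elements (the M1 to M24 requested in the question) and caps the copy loop with MIN() so it never reads past the last date column. The MIN() bound is an addition for illustration, not part of the code posted above, and the sample data only has six date columns, so M7 to M24 stay missing here.

```sas
/* Same sample data as in the post above */
data DatasetA;
  input customer Startdate : $20.;
cards;
1 1April2018
2 1July2018
;

data DatasetB;
  input Customer April2018 May2018 June2018 July2018 Aug2018 Sep2018;
cards;
1 1 2 3 4 5 6
2 . . . 7 8 9
;

data want (drop=_:);
  merge DatasetA DatasetB;
  by customer;
  array months {*} April2018--Sep2018;  /* all date columns from dataset B */
  array M {24};                         /* creates M1-M24 */

  /* _offset = number of leading missing months for this customer */
  do _offset=0 to dim(months)-1 while(months{_offset+1}=.);
  end;

  /* Assumption: copy at most dim(M) values and stop at the last date column */
  do _i=1 to min(dim(M), dim(months)-_offset);
    M{_i}=months{_offset+_i};
  end;
  drop April2018--Sep2018;
run;
```

With the real data, the range in the months array would run from the first to the last date variable in dataset B (for example through Mar2022), and the same bound would then keep each row to 24 months.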

adjn258 · Posted 03-19-2022 12:50 PM

That is actually not the case.

In dataset A, the start date keeps incrementing, for example:

Dataset A

Cust  Start_date
1     Apr18
1     May18
.
1     Mar22
2     ...

and so on.

So the desired output needs to look like this (remember that there are a large number of customers; I have taken 2 customers just for illustration):

Cust  Start_Date  M1  M2  M3  ......  M24
1     Apr'18      1   2   3   ....
1     May'18      2   3   4   .....
1     Jun'18
.     .
1     Mar'22

adjn258 · Posted 03-19-2022 02:16 PM

I am trying to merge 2 datasets. Please provide help with how to do this:

dataset A contains customer and start date
dataset B contains column headings as dates and info for each customer under each date.
Illustration of Dataset A, Dataset B and desired output (I want to pull 24 months information from start date (till most recent month) and assign it as M1,M2,M3.....M24)

Dataset A

customer  Start date
1         1April2018
2         1July2018

Dataset B

Customer  1April2018  1May2018  June2018  July2018  Aug2018  Sep2018  ......  Mar2022
1         1           2         3         4         5        6        ......
2         -           -         -         7         8        9        ......

Desired output

Customer  Start date  M1  M2  M3  M4  ..........  M24
1         1April2018  1   2   3   4   ........
1         1May2018    2   3   4   5   ......
1         1June2018   3   4   5   6   ......
.
1         Mar'2020    .   .   .   .   ............
2         1July2018   7   8   9   .........
2         1Aug'2019   8   9   10  ..................
.

yabwon · Posted 03-19-2022 04:14 PM

What did you try so far? Are your data in SAS data sets?

B.

Tom · Posted 03-19-2022 02:58 PM

That is a completely different problem than the original question.

In this new problem the values from B are REPEATED in the output dataset.

The answer is still the same though.

First get the date values out of the metadata and into actual data by transposing the B dataset and converting each variable name into a date.

Now to get this data with the values from B matched to multiple observations from A you will want to use an SQL join instead of a simple SAS merge.

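proc sql;
  /* B here is the transposed (tall) version of B: one row per customer and date */
  create table want as 
  select a.customer   /* qualified: CUSTOMER exists in both tables */
          , a.start 
          , intck('month',a.start,b.date)+1 as month
          , b.date 
          , b.B
   from A a
     left join B b
    on a.customer = b.customer 
    and . < a.start <= b.date
  order by a.customer, a.start, month
;
quit;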

You can now use this data to produce your report as an actual report.

Or use proc transpose to create a dataset where the month number is implied by the name of the variable.
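
To make the steps above concrete, here is a minimal end-to-end sketch under a few assumptions: the sample values mirror the ones used elsewhere in this thread, the dataset names b_long, b_tall and tall are hypothetical, the date is rebuilt by prefixing '01' to the variable name, and the 24-month cap comes from the original question rather than from the query above.

```sas
/* A has several start dates per customer, as in the restated problem */
data a;
  input customer start :date9.;
  format start date9.;
cards;
1 01Apr2018
1 01May2018
2 01Jul2018
2 01Aug2018
;

data b;
  input customer Apr2018 May2018 Jun2018 Jul2018 Aug2018 Sep2018;
cards;
1 1 2 3 4 5 6
2 . . . 7 8 9
;

/* Step 1: one row per customer and month */
proc transpose data=b out=b_long(rename=(col1=B));
  by customer;
  var Apr2018--Sep2018;
run;

/* Turn the former variable name (e.g. Apr2018) into a real date value */
data b_tall;
  set b_long;
  date = input(cats('01', _name_), date9.);
  format date date9.;
  drop _name_;
run;

/* Step 2: match every start date to the months from that date onward */
proc sql;
  create table tall as
  select a.customer
       , a.start
       , intck('month', a.start, b.date) + 1 as month
       , b.B
  from a
  left join b_tall as b
    on  a.customer = b.customer
    and a.start <= b.date
    and intck('month', a.start, b.date) < 24   /* keep M1-M24 only */
  order by a.customer, a.start, month;
quit;

/* Step 3: month number becomes the variable name (M1, M2, ...) */
proc transpose data=tall out=want(drop=_name_) prefix=M;
  by customer start;
  id month;
  var B;
run;
```

Because the join repeats the values from B for each start date, this handles the one-to-many case without relying on the first start date lining up with the first non-missing month.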

Tom · Posted 03-19-2022 01:37 PM

First let's convert your posted listings into actual data.

data a ;
  input customer start :date. ;
  format start date9.;
cards;
1 01Apr2018            
2 01Jul2018   
;

data b;
  input customer Apr2018 May2018 Jun2018 Jul2018 Aug2018 Sep2018 ;
cards;
1 1 2 3 4 5 6
2 . . . 7 8 9
;

You will want to first use PROC TRANSPOSE to convert the B dataset that has data in the metadata into a dataset that has the data stored in actual variables. You will have to convert names into date values since the name of a variable is a text string, not a number.

Then you can merge the two datasets and calculate the MONTH offset between the two dates.

proc transpose data=b out=b_tall(rename=(col1=B));
  by customer;
run;

data want;
  merge a b_tall;
  by customer;
  date = input(_name_,anydtdte.);
  format date date9.;
  month = 1 + intck('month',start,date);
  if month < 1 then delete;
run;

Which you could print using PROC REPORT like this:

proc report data=want ;
  column customer start B,month;
  define customer / group;
  define start / group;
  define month / across ;
  define B / sum ' ' ;
run;

If you want, you can transpose again, but then instead of having date values in the metadata you will have the month offset number. Why not just leave the data in the normalized format, where it will be easier to work with?

adjn258 · Posted 03-19-2022 01:51 PM

This problem was slightly restated: In dataset A, the start date keeps incrementing, for example,
Dataset A

Cust Start_date
1 Apr18
1 May18
.
.
1 Mar22
2 ...
and so on

Dataset B remains the same - it is the master dataset with all info for all dates for all customers

So the desired output needs to look like this (remember that there are a large number of customers; I have taken 2 customers just for illustration):

Cust Start_Date M1 M2 M3 ...... M24

1 Apr'18 1 2 3 ....
1 May'18 2 3 4 .....
1 Jun'18
. .
. .
1 Mar'22 . . .
