BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
adjn258
Calcite | Level 5

 

I am trying to merge 2 datasets. Please provide help with how to do this:

  • dataset A contains customer and start date
  • dataset B contains column headings as dates and info for each each customer under each date.
  • Illustration of Dataset A, Dataset B and desired output (I want to pull 24 months information from start date and assign it as M1,M2,M3.....M24)

Dataset A

customer     Start date            

     1              1April2018            

     2              1July2018   

 

Dataset B

Customer      1April 2018     1May2018    June2018   July2018  Aug2018  Sep2018  ...... Mar2022   

   1                      1                      2                3                   4                5             6        ......

   2                      -                      -                 -                    7                8             9         ......

 

Desired output

Customer     Start date          M1      M2     M3     M4..........  M24

     1              1April2018        1          2        3         4  ........

     2              1July2018         7          8        9             .........

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If

  1. The first record in Dataset A for a customer has a startdate that corresponds with the first non-missing value in dataset B, and
  2. The above record is followed by monthly increments - i.e. there are no holes between consecutive dataset A records, and
  3. The date variables in dataset B are contiguous

then you could still use a single data step approach, with minor changes in calculating and using the _OFFSET variable:

 

data DatasetA;
input customer     Startdate  : $20.;       
cards;
     1              1April2018  
     1              1May2018
     1              1June2018 
     2              1July2018
     2              1Aug2018
     2              1Sep2018 
;
 

data DatasetB;
input Customer      April2018     May2018    June2018   July2018  Aug2018  Sep2018 ;
cards;
   1                      1                      2                3                   4                5             6      
   2                      .                     .                .                   7                8             9    
;

data want (drop=_:);
  merge dataseta  datasetB;
  by customer;
  array months {*} april2018--sep2018;
  
  if first.customer then do _offset=0 to dim(months)-1 while(months{_offset+1}=.);
  end;
  else _offset+1;
  
  array M {6};
  if _offset<dim(months) then do _i=1 to dim(m)-_offset;
    m{_i}=months{_offset+_i};
  end;
  drop april2018--sep2018;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
Ksharp
Super User
data DatasetA;
input customer     Startdate  : $20.;       
cards;
     1              1April2018            
     2              1July2018   
;
 

data DatasetB;
input Customer      April2018     May2018    June2018   July2018  Aug2018  Sep2018 ;
cards;
   1                      1                      2                3                   4                5             6      
   2                      .                     .                .                   7                8             9    
;

proc transpose data=DatasetB out=temp;
by Customer;
var April2018--Sep2018 ;
run;

proc transpose data=temp(where=(col1 is not missing)) out=temp1 prefix=M;
by Customer;
var col1 ;
run;

data want;
merge DatasetA temp1(drop=_name_);
by CUstomer;
run;
adjn258
Calcite | Level 5

(remember there are large number of customers and i had taken 2 customers just for illustration)

further, In dataset A, the start date keeps incrementing to cover all dates till the latest month, for example,

Dataset A

Cust  Start_date

1       Apr18

1      May18

.

.

1    Mar22

2   ...

and so on

 

so the desired output needs to be like:

 

Cust  Start_Date  M1   M2   M3 ...... M24

1        Apr'18         1      2      3   ....

1       May'18         2      3     4   .....

1       Jun'18

.        .

.        .

1       Mar'22

mkeintz
PROC Star

If your dataset B always has its first non-missing value in the variable corresponding to startdate in dataset A, then the task is very straightforward:

 

data DatasetA;
input customer     Startdate  : $20.;       
cards;
     1              1April2018            
     2              1July2018   
;
 

data DatasetB;
input Customer      April2018     May2018    June2018   July2018  Aug2018  Sep2018 ;
cards;
   1                      1                      2                3                   4                5             6      
   2                      .                     .                .                   7                8             9    
;

data want (drop=_:);
  merge dataseta  datasetB;
  by customer;
  array months {*} april2018--sep2018;
  
  do _offset=0 to 5 while(months{_offset+1}=.);
  end;
  
  array M {6};
  do _i=1 to dim(m)-_offset;
    m{_i}=months{_offset+_i};
  end;
  drop april2018--sep2018;
run;

Just make sure to have your array declarations consistent.   Make the M large enough to consider all of your date variables, size 6 above, but likely size 24 if you really have 24 months.

 

Also this assumes that all the date variables in dataset B are contiguous.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
adjn258
Calcite | Level 5

That is actually not the case. 

 

In dataset A, the start date keeps incrementing, for example,

 

Dataset A

Cust  Start_date

1       Apr18

1      May18

.

.

1    Mar22

2   ...

and so on

 

so the desired output needs to be like (remember there are large number of customers and i had taken 2 customers just for illustration)

 

Cust  Start_Date  M1   M2   M3 ...... M24

1        Apr'18         1      2      3   ....

1       May'18         2      3     4   .....

1       Jun'18

.        .

.        .

1       Mar'22

mkeintz
PROC Star

If

  1. The first record in Dataset A for a customer has a startdate that corresponds with the first non-missing value in dataset B, and
  2. The above record is followed by monthly increments - i.e. there are no holes between consecutive dataset A records, and
  3. The date variables in dataset B are contiguous

then you could still use a single data step approach, with minor changes in calculating and using the _OFFSET variable:

 

data DatasetA;
input customer     Startdate  : $20.;       
cards;
     1              1April2018  
     1              1May2018
     1              1June2018 
     2              1July2018
     2              1Aug2018
     2              1Sep2018 
;
 

data DatasetB;
input Customer      April2018     May2018    June2018   July2018  Aug2018  Sep2018 ;
cards;
   1                      1                      2                3                   4                5             6      
   2                      .                     .                .                   7                8             9    
;

data want (drop=_:);
  merge dataseta  datasetB;
  by customer;
  array months {*} april2018--sep2018;
  
  if first.customer then do _offset=0 to dim(months)-1 while(months{_offset+1}=.);
  end;
  else _offset+1;
  
  array M {6};
  if _offset<dim(months) then do _i=1 to dim(m)-_offset;
    m{_i}=months{_offset+_i};
  end;
  drop april2018--sep2018;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
adjn258
Calcite | Level 5

I am trying to merge 2 datasets. Please provide help with how to do this:

  • dataset A contains customer and start date
  • dataset B contains column headings as dates and info for each each customer under each date.
  • Illustration of Dataset A, Dataset B and desired output (I want to pull 24 months information from start date (till most recent month) and assign it as M1,M2,M3.....M24)

Dataset A

customer     Start date            

     1              1April2018            

     2              1July2018   

 

Dataset B

Customer      1April 2018     1May2018    June2018   July2018  Aug2018  Sep2018  ...... Mar2022   

   1                      1                      2                3                   4                5             6        ......

   2                      -                      -                 -                    7                8             9         ......

 

Desired output

Customer     Start date          M1      M2     M3     M4..........  M24

     1              1April2018        1          2        3         4  ........

     1              1May2018         2         3        4         5  ......

     1              1June2018        3         4        5         6  ......

    .     

    . 

    1              Mar'2020           .          .       .            . ............

     2              1July2018         7          8        9             .........

     2              1Aug'2019         8          9      10  ..................

     .

     .

     .

     .

 

yabwon
Amethyst | Level 16

What did you try so far? Are your data in SAS data sets?

 

B.

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

That is a completely different problem than the original question.

In this new problem the values from B are REPEATED in the output dataset.

 

The answer is still the same though.

First get the date values out of the metadata into a actual data by transposing the B dataset and converting the variable name into a date.

 

Now to get this data with the values from B matched to multiple observations from A you will want to use an SQL join instead of a simple SAS merge.

proc sql;
  create table want as 
  select customer
          , a.start 
          , intck('month',a.start,b.date)+1 as month
          , b.date 
          , b.B
   from A a
     left join B b
    on a.customer = b.customer 
    and . < a.start <= b.date
  order by customer, start, month
;
quit;

You can now use this data to produce your report as an actual report. 

Or use proc transpose to create a dataset where the month number is implied by the name of the variable.

Tom
Super User Tom
Super User

First let's convert your posted listings into actual data.

data a ;
  input customer start :date. ;
  format start date9.;
cards;
1 01Apr2018            
2 01Jul2018   
;

data b;
  input customer Apr2018 May2018 Jun2018 Jul2018 Aug2018 Sep2018 ;
cards;
1 1 2 3 4 5 6
2 . . . 7 8 9
;

You will want to first use PROC TRANSPOSE to convert the B dataset that has data in the metadata into a dataset that has the data stored in actual variables.  You will have to convert names into date values since the name of a variable is a text string, not a number.

Then you can merge the two datasets and calculate the MONTH offset between the two dates.

proc transpose data=b out=b_tall(rename=(col1=B));
  by customer;
run;

data want;
  merge a b_tall;
  by customer;
  date = input(_name_,anydtdte.);
  format date date9.;
  month = 1 + intck('month',start,date);
  if month < 1 then delete;
run;

Which you could print using PROC REPORT like this:

proc report data=want ;
  column customer start B,month;
  define customer / group;
  define start / group;
  define month / across ;
  define B / sum ' ' ;
run;

Result:

Screenshot 2022-03-19 133420.jpg

If you want you can transpose again, but then instead of having date values in the metadata you will have the month offset number instead.  Why not just leave the data in the normalized format where it will be easier to work with.

 

adjn258
Calcite | Level 5
This problem was slightly restated: In dataset A, the start date keeps incrementing, for example,
Dataset A

Cust Start_date
1 Apr18
1 May18
.
.
1 Mar22
2 ...
and so on

Dataset B remains the same - it is the master dataset with all info for all dates for all customers

so the desired output needs to be like (remember there are large number of customers and i had taken 2 customers just for illustration)



Cust Start_Date M1 M2 M3 ...... M24

1 Apr'18 1 2 3 ....
1 May'18 2 3 4 .....
1 Jun'18
. .
. .
1 Mar'22 . . .

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2092 views
  • 1 like
  • 5 in conversation