Desktop productivity for business analysts and programmers

Merging part of rows in dataset

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Merging part of rows in dataset

 Hi there! 

 

I am trying to combine two rows into one, taking only some information from both rows. 

 

I have the following hypothetically financial dataset: 

 

Date, BM, Asset_Class, Weight, Return

24-11-2015, BM1, Stocks, 30%, 2%

24-11-2015, BM2, Stocks, 34%, 5%

 

I would like to create only one row, where I take the Date, Asset_Class and the weight from BM1 and the return variable from BM2. The new row will still have to be named named BM1. 

 

Is that possible? 

 

Best, Joe. 


Accepted Solutions
Solution
‎11-26-2015 08:22 AM
Trusted Advisor
Posts: 1,114

Re: Merging part of rows in dataset

Again, assuming that variable Date is the link between the obs. with BM='BM1' and the unique corresponding obs. with BM='BM2', the following code should work:

data have;
infile cards dlm=',';
input Date :ddmmyy10. BM $ Asset_Class $ Weight $ Return $;
format Date ddmmyyd10.;
cards;
23-11-2015, BM1, Stocks, 30%, 2.0%
24-11-2015, BM1, Stocks, 30%, 1.5%
25-11-2015, BM1, Stocks, 30%, 1.2%
23-11-2015, BM2, Stocks, 23%, 3.1%
24-11-2015, BM2, Stocks, 26%, 0.2%
25-11-2015, BM2, Stocks, 23%, 4.5%
23-11-2015, BM1, Bonds, 15%, 3.1%
24-11-2015, BM1, Bonds, 15%, 0.2%
25-11-2015, BM1, Bonds, 15%, 4.5%
23-11-2015, BM2, Bonds, 17%, 2.0%
24-11-2015, BM2, Bonds, 18%, 1.5%
25-11-2015, BM2, Bonds, 17%, 1.2%
;

proc sql;
create table want(drop=Return rename=(NewReturn=Return)) as
select *, case when Asset_Class='Stocks'
               then (select Return from have where Asset_Class='Stocks' & Date=a.Date & BM='BM2')
               else Return
          end as NewReturn
from have a
where BM='BM1';
quit;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,114

Re: Merging part of rows in dataset

[ Edited ]

Here's how you could do this in Base SAS:

data have;
infile cards dlm=',';
input Date :ddmmyy10. BM $ Asset_Class $ Weight $ Return $;
format Date ddmmyyd10.;
cards;
24-11-2015, BM1, Stocks, 30%, 2%
24-11-2015, BM2, Stocks, 34%, 5%
;

proc sql;
create table want as
select *, (select Return from have where Date=a.Date & BM='BM2') as Return_BM2
from have a
where BM='BM1';
quit;

This assumes that variable DATE is the identifier to link associated BM1 and BM2 records. I've read Weight and Return as character variables just for simplification. In practice, of course, these should be numeric variables (cf. PERCENTw.d informat). Furthermore, I assume that Return_BM2 is to be displayed for comparison, not as a replacement of the original value of Return for BM1. If you want to replace the original Return value, you can simply delete the "_BM2" suffix from "Return_BM2" and add a (drop=Return) dataset option after the second "have" in the PROC SQL step.

 

Respected Advisor
Posts: 4,956

Re: Merging part of rows in dataset

[ Edited ]

How reliable is your incoming data?  Do you always have raw data that includes the BM1 record followed by the matching BM2 record?  If that is a reliable pattern, you could code:

 

data want;

infile rawdata dlm=',' dsd truncover;

length dummy $ 1;

informat date ddmmyy10.;

input date asset $ class $ weight $ / dummy dummy dummy dummy return $;

format date ddmmyy10.;

drop dummy;

run;

Respected Advisor
Posts: 3,823

Re: Merging part of rows in dataset

With SAS datasets something like below could work:

data have;
  infile cards dlm=',';
  input Date :ddmmyy10. BM $ Asset_Class $ Weight $ Return $;
  format Date ddmmyyd10.;
  cards;
24-11-2015, BM1, Stocks, 30%, 2%
24-11-2015, BM2, Stocks, 34%, 5%
;

proc sql;
  create table want as
    select 
      l.bm,l.date, l.asset_class, l.weight, r.return
    from  
      have(where=(bm='BM1')) l 
      left join 
      have(where=(bm='BM2')) r
    on l.date=r.date and l.asset_class=r.asset_class
    ;
quit;
Occasional Contributor
Posts: 15

Re: Merging part of rows in dataset

[ Edited ]

Hi,

 

Here's another method. It assumes that you always want the value from the next record.

 

data have;

   input Date : ddmmyy10.

   BM : $3.

   Asset_Class : $10.

   Weight : percent8.

   Return : percent8. ;

format date date9.;

datalines;
24-11-2015 BM1 Stocks 30% 2%

24-11-2015 BM2 Stocks 34% 5% ;

run;

data want;
   set have ;
   if eof1=0 then
      set have(firstobs=2 keep=return rename=(return=leadReturn)) end=eof1;
   else leadReturn = .;
run; 

 

Take a look at this paper from Mark Keintz - http://www.lexjansen.com/nesug/nesug13/113_Final_Paper.pdf

 

- John

Occasional Contributor
Posts: 10

Re: Merging part of rows in dataset

Hi! 

 

Thanks a lot for all the suggestions for solutions, it is really helpfull. I will try them out and get back to you asap. 

 

My biggest concern - and maybe I was not specific enough - is that it is only some of the data that needs to be mended.   

 

 

So my data set looks something like this: 

 

Date,             BM,  Asset_Class, Weight, Return

23-11-2015, BM1, Stocks, 30%, 2,0%

24-11-2015, BM1, Stocks, 30%, 1,5%

25-11-2015, BM1, Stocks, 30%, 1,2%

23-11-2015, BM2, Stocks, 23%, 3,1%

24-11-2015, BM2, Stocks, 26%, 0,2%

25-11-2015, BM2, Stocks, 23%, 4,5%

23-11-2015, BM1, Bonds, 15%, 3,1%

24-11-2015, BM1, Bonds, 15%, 0,2%

 

25-11-2015, BM1, Bonds, 15%, 4,5%

23-11-2015, BM2, Bonds, 17%, 2,0%

24-11-2015, BM2, Bonds, 18%, 1,5%

25-11-2015, BM2, Bonds, 17%, 1,2%

 

So if the asset class equals stocks, then I would like to mend the rows so I use everything but the return from BM1, and only the return coloum from BM2. Afterwards, every row with BM2 should be deleted from the data set. For Bonds (and all other asset classes) I would just like to delete the rows containing BM2 (not mending the return column).

 

Thanks a lot again!

 

- Joe.  

 

Solution
‎11-26-2015 08:22 AM
Trusted Advisor
Posts: 1,114

Re: Merging part of rows in dataset

Again, assuming that variable Date is the link between the obs. with BM='BM1' and the unique corresponding obs. with BM='BM2', the following code should work:

data have;
infile cards dlm=',';
input Date :ddmmyy10. BM $ Asset_Class $ Weight $ Return $;
format Date ddmmyyd10.;
cards;
23-11-2015, BM1, Stocks, 30%, 2.0%
24-11-2015, BM1, Stocks, 30%, 1.5%
25-11-2015, BM1, Stocks, 30%, 1.2%
23-11-2015, BM2, Stocks, 23%, 3.1%
24-11-2015, BM2, Stocks, 26%, 0.2%
25-11-2015, BM2, Stocks, 23%, 4.5%
23-11-2015, BM1, Bonds, 15%, 3.1%
24-11-2015, BM1, Bonds, 15%, 0.2%
25-11-2015, BM1, Bonds, 15%, 4.5%
23-11-2015, BM2, Bonds, 17%, 2.0%
24-11-2015, BM2, Bonds, 18%, 1.5%
25-11-2015, BM2, Bonds, 17%, 1.2%
;

proc sql;
create table want(drop=Return rename=(NewReturn=Return)) as
select *, case when Asset_Class='Stocks'
               then (select Return from have where Asset_Class='Stocks' & Date=a.Date & BM='BM2')
               else Return
          end as NewReturn
from have a
where BM='BM1';
quit;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 510 views
  • 5 likes
  • 5 in conversation