DATA Step, Macro, Functions and more

Combining two rows into one + create new versions

Reply
Contributor
Posts: 33

Combining two rows into one + create new versions

[ Edited ]

Hi

 

I'm having a trouble with combining two or more rows with the same id into one. I have something like this:

ID | V1 | V2 |... |V10|
1  | str |       |... |        |
1  |       |str2|... |        |

 

But I'd like:

 

ID | V1 | V2|...|V10

1  | str |str2|...|

 

To sum up, I want to combine those two rows into single one like this one. But that's just the beginning. Imagine I have:

 

ID | V1 | V2 |... |V10|
1  | str |       |... |str3 |
1  |       |str2|... |str4 |

 

What I would need is:

 

ID | V1 | V2 |... |V10|V10v2|
1  | str |str2|... |str3|str4     |

 

And if there are more filled rows for a certain column, there should be version 3, 4, 5 and so on.

 

I'd be grateful for you help!

 

Thank you im advance!

Super User
Posts: 24,028

Re: Combining two rows into one + create new versions

Make it long first, so that you have a unique key for each row, which is a combination of ID and V1, V10V2 or however you want to count that. 

 

i.e.:

ID Version Value

1 1 str1

1 2 str2

1 10 str3

1 10 str4

 

Then add a second identifier:

 

ID Version Counter Value

1 1 1 str1

1 2 1 str2

1 10 1 str3

1 10 2 str4

 

Then you can transpose it to a wide format again. 

 

This is the most dynamic approach that I can think of and can be handled with a PROC TRANSPOSE, a DATA STEP to add the COUNTER variable and a second PROC TRANSPOSE to a wide format again. You may want to consider storing the data in a long format overall. 

 

 

Trusted Advisor
Posts: 1,849

Re: Combining two rows into one + create new versions

@Dontik, in your example, when same variable have more than one observation per ID,

as for: ID=1  V10 in (str3 str4) you want output in two variables: v10, v10v2.

 

Is the varaible name important or can it be v10, v11, etc. ?

 

How about input like:

ID | V1 | V2 |... |V10|
1  | str |       |... |str3 |
1  |       |str2|... |str4 |

1  |       |str5|... |str6 |

should the varaibles be:  v1, v2, v2v2, ...,v10, v10v2, v10v3 etc.

OR may I call them: v1, v2, v3, ... v10, v11, v12, ... Vn   

- as much variables as need to include all values per ID ?

- is the order of values important? should it be:

          str, str2, str5, strt3, str4, str6 

  OR   str, str3, str2, str4, str5, str6

 

Can you give some more information what kind of data is it?

It may be a clue to give you a better solution.

Contributor
Posts: 33

Re: Combining two rows into one + create new versions

Yes, variable names are important. V1 etc. is just an example (variable 1), but the names are more specific - they're address types, for example V1 is an address where a certain person lives and V2 is an address of his company. The problem I have is that some records have identical modification date but different values and I'd like to output it as a new version of the same address type. As for the order of values - since the modification dates are the same, I cannot say which version is newer, I just need to provide different variations. So to answer your question - the order of values is not important.
Trusted Advisor
Posts: 1,849

Re: Combining two rows into one + create new versions

Would you like next solution ?

data test;
  input id version address $;
  addr = cat('(',strip(version),') ',strip(address));
  keep id addr;
datalines;
1 1 str1
1 2 str2
1 5 str3
1 5 str4
; run;

proc sort data=test;
  by id addr;
run;

proc transpose data=test 
     out=test1(drop=_name_);
 by id;
 var addr;
run;

within this method you know what address kind is out of its value instead out of its label.

Super User
Posts: 10,860

Re: Combining two rows into one + create new versions

data want;

 update have(obs=0) have;

 by id;

run;

Ask a Question
Discussion stats
  • 5 replies
  • 215 views
  • 0 likes
  • 4 in conversation