turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Concatenate rows by multiple by group variables

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-11-2016 03:14 PM

Hello,

I am trying to concatenate a variable (VAR2) based on multiple by group variables (By ID VAR1 TruncVar1) using SAS 9.4. I want to concatenate the cells that have the same TruncVar1 value. I want all the rows with the same TruncVar and ID to have the same Var2Combined value. Can anyone help ?

I want to end up with a new column in my dataset, Var2Combined, below:

ID | Var1 | TruncVar1 | Var2 | Var2Combined |

143566 | 1 | 1 | A | A - B C |

143566 | 1.1 | 1 | B C | A - B C |

143566 | 2 | 2 | X Y | X Y - C D E - A B C |

143566 | 2.01 | 2 | C D E | X Y - C D E - A B C |

143566 | 2.02 | 2 | A B C | X Y - C D E - A B C |

158378 | 1 | 1 | A | A - B C |

158378 | 1.2 | 1 | B C | A - B C |

158378 | 4 | 4 | A B C | A B C - X Y |

158378 | 4.01 | 4 | X Y | A B C - X Y |

Data sample ;

INFILE DATALINES DLM='|';

input ID Var1 TruncVar1 Var2 $;

datalines;

143566|1|1|A

143566|1.1|1|B C

143566|2|2|X Y

143566|2.01|2|C D E

143566|2.02|2|A B C

158378|1|1|A

158378|1.2|1|B C

158378|4|4|A B C

158378|4.01|4|X Y ;

run;

proc print data=sample;

run;

proc sort data=sample ;

by id var1 truncVar1 ;

run;

Data sample ;

format CombinedVar2 $200. ;

set sample ;

by id var1 truncVar1 ;

if first.truncVar1 then CombinedVar2 = Var2 ;

CombinedVar2 = Catx(" - ",CombinedVar2, Var2) ;

run;

proc print data=sample;

run;

Accepted Solutions

Solution

07-11-2016
07:05 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-11-2016 06:32 PM

CP2 wrote:

Thanks for your comment. TruncVar1 is numeric, =floor(Var1) . After truncating I want to roll up alpha field (Var2) to have one concatenated value for each truncated value. I have left Var1 in the by group as : by id var1 truncVar1

Leaving VAR1 in the By statement with that order will create separate by groups for 2&2, 2.01&2 and 2.02&2. You should drop Var1 in the by statements to prevent extra grouping.

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-11-2016 04:16 PM

The trick is to go through the data twice:

data want;

length CombinedVar2 $200 ;

do until (last.truncVar1);

set have;

by id var1 truncVar1

CombinedVar2 = Catx(" - ",CombinedVar2, Var2) ;

end;

do until (last.truncVar1);

set have;

by id var1 truncVar1;

output;

end;

run;

The first loop creates CombinedVar2. Then the second loop reads, the same observations and outputs them.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-11-2016 04:58 PM

Thanks for the suggestions but it's not giving me the desired resutls. Any other suggestions? Thanks!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-11-2016 05:07 PM

Two very important suggestions.

(1) Post the log from your program

(2) Describe what you mean by "not giving you the desired results"? Errors? No errors, but not what you hoped to get?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-11-2016 05:17 PM

Apparently there is a need to adjust the values of VAR1 as the grouping in the example has Var1=2, 2.01 and 2.02 treated in the same group. So a more complex BY statement will be required or addition of a temporary variable such as a truncated or rounded version of Var1 (not obvious if Var1 is numeric or character).

And since there is one example provided there may be even more obnoxious possibilities.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-11-2016 05:22 PM

Solution

07-11-2016
07:05 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-11-2016 06:32 PM

CP2 wrote:

Leaving VAR1 in the By statement with that order will create separate by groups for 2&2, 2.01&2 and 2.02&2. You should drop Var1 in the by statements to prevent extra grouping.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-11-2016 05:17 PM

Thank you. CombinedVar2 is identical to Var2. I added a missing semi-colon to your code and changed 'Have' to 'Sample' as the name of my dataset. The code, output, and log follows:

Data sample ;

INFILE DATALINES DLM='|';

input ID Var1 TruncVar1 Var2 $;

datalines;

143566|1|1|A

143566|1.1|1|B C

143566|2|2|X Y

143566|2.01|2|C D E

143566|2.02|2|A B C

158378|1|1|A

158378|1.2|1|B C

158378|4|4|A B C

158378|4.01|4|X Y

;

run;

data want;

length CombinedVar2 $200 ;

do until (last.truncVar1);

set sample;

by id var1 truncVar1 ;

CombinedVar2 = Catx(" - ",CombinedVar2, Var2) ;

end;

do until (last.truncVar1);

set sample;

by id var1 truncVar1;

output;

end;

run;

proc print data=want;

run;

SAS Output

The SAS System |

Obs CombinedVar2 ID Var1 TruncVar1 Var2 1 2 3 4 5 6 7 8 9

A | 143566 | 1.00 | 1 | A |

B C | 143566 | 1.10 | 1 | B C |

X Y | 143566 | 2.00 | 2 | X Y |

C D E | 143566 | 2.01 | 2 | C D E |

A B C | 143566 | 2.02 | 2 | A B C |

A | 158378 | 1.00 | 1 | A |

B C | 158378 | 1.20 | 1 | B C |

A B C | 158378 | 4.00 | 4 | A B C |

X Y | 158378 | 4.01 | 4 | X Y |

Log:

3862 Data sample ;

3863 INFILE DATALINES DLM='|';

3864 input ID Var1 TruncVar1 Var2 $;

3865 datalines;

NOTE: The data set WORK.SAMPLE has 9 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 0.05 seconds

cpu time 0.04 seconds

3875 ;

3876 run;

3877 data want;

3878 length CombinedVar2 $200 ;

3879 do until (last.truncVar1);

3880 set sample;

3881 by id var1 truncVar1 ;

3882 CombinedVar2 = Catx(" - ",CombinedVar2, Var2) ;

3883 end;

3884

3885 do until (last.truncVar1);

3886 set sample;

3887 by id var1 truncVar1;

3888 output;

3889 end;

3890 run;

NOTE: There were 9 observations read from the data set WORK.SAMPLE.

NOTE: There were 9 observations read from the data set WORK.SAMPLE.

NOTE: The data set WORK.WANT has 9 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.08 seconds

cpu time 0.07 seconds

3891

3892 proc print data=want;

3893 run;

NOTE: There were 9 observations read from the data set WORK.WANT.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.06 seconds

cpu time 0.06 seconds

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-11-2016 06:41 PM

Did you try removing VAR1 from both BY statements? That looks like a good suggestion.