Re: Concatenate SAS data rows ?

KevinC_ · Posted 11-25-2009 10:34 AM

cell_code dropcode mail_date seed
----------------------------------------------------------------------
NR800 1 12/07/2009 N
NR800 1 12/07/2009 Y
NR801 1 12/07/2009 N
NR801 1 12/07/2009 Y
NR801 2 12/07/2009 N

Hello everyone,

I have posted the first few rows of a sas dataset. Is there a way I can 'concatenate' rows based on the cell_code?
For example, the first two rows have the same cell_code so the output would look like the following:

NR800 1 12/07/2009 N NR800 1 12/07/2009 Y

Thank you so much for any input !

Kevin

KevinC_ · Posted 11-25-2009 10:37 AM

Let me be more clear.. The first two rows before concatenation:

NR800 1 12/07/2009 N
NR800 1 12/07/2009 Y

After:

NR800 1 12/07/2009 N NR800 1 12/07/2009 Y

Thanks.

abdullala · Posted 11-25-2009 11:17 AM

this is a quite unique request that all fields need to be replicated. I can not think of any one-step way to achieve that.

1. sort the data and assign a unique ID to each records within a cell_code.
2. output to 2 separate datasets with this ID and change variable names.
3. merge the 2 new datasets using proc sql, with value of original cell_code.

or if you need to really 'concatenate' those fields, you may first concatenate all 4 fields first and then do transpose to this one concatenated field. if this is what you wanted then it's an easier way out.

Flip · Posted 11-25-2009 12:08 PM

I think before you can get an exact answer we would need to know a few things. Are you creating a new data set, or producing output?
Is there a limit on how many rows you might have to transpose?

Doc_Duke · Posted 11-25-2009 10:46 AM

PROC TRANSPOSE will do approximately that (it won't repeat the cell code).

If you want to concatenate the raw data that way, you can do it in the DATA step. Look at the @@ notation for the PUT statement and add something like

FILE myoutputfile; *that you have defined;
IF lag(cell_code) = cell_code THEN PUT _infile_ @@;
ELSE PUT;

Doc Muhlbaier
Duke

Cynthia_sas · Posted 11-25-2009 12:10 PM

Hi:
I'm curious how you'd envision the new variables being named???
You currently have this:
[pre]
cell_code dropcode mail_date seed
NR800 1 12/07/2009 N
NR800 1 12/07/2009 Y
[/pre]

what would the new variables be named???
[pre]
cell_code dropcode mail_date seed cell2 drop2 mail2 seed2
NR800 1 12/07/2009 N NR800 1 12/07/2009 Y
[/pre]

I'm curious also about why you want cell_code repeated, since that is the value you are using to concatenate???

cynthia

KevinC_ · Posted 11-25-2009 01:52 PM

Thank you everyone for your input.

Cynthia,
Actually I dont need the cell_code repeated. This is what I need in the new dataset:

cell_code, dropcode, mail_date, seed

sorry about the confusion.

Thanks again.

KevinC_ · Posted 11-25-2009 01:53 PM

oops.. let me correct myself...

This is what I need in the new dataset:

cell_code, dropcode, mail_date, seed1, seed2

KevinC_ · Posted 11-25-2009 01:56 PM

Before:

NR800 1 12/07/2009 N
NR800 1 12/07/2009 Y

After:

NR800 1 12/07/2009 N Y

sbb · Posted 11-25-2009 01:59 PM

As was communicated previously, use PROC TRANSPOSE. Suggest you review the SAS DOC on this procedure and also consider searching the SAS support http://support.sas.com/ website for supplemental technical reference material on the topic.

Scott Barry
SBBWorks, Inc.

Cynthia_sas · Posted 11-25-2009 03:02 PM

Hi:
You say that you only want:
cell_code, dropcode, mail_date, seed1 and seed2... but what about the NR801 cell_code???? It has dropcode of 1 and dropcode of 2?? I thought the only determinant for concatenation was cell_code.
[pre]
cell_code dropcode mail_date seed
NR801 1 12/07/2009 N
NR801 1 12/07/2009 Y
NR801 2 12/07/2009 N
[/pre]

How would you envision those records being concatenated????
[pre]
cell_code drop1 drop2 drop3 mail1 mail2 mail3 seed1 seed2 seed3
NR800 1 1 . 12/07/2009 12/07/2009 . N Y
NR801 1 1 2 12/07/2009 12/07/2009 12/07/2009 N Y N
[/pre]

Is it possible for there to be differing mail_date values as well???? Remember that EVERY row in your final dataset will have the same number of COLUMNS -- you can't have 1 row/observation with only 1 dropcode and another row/observation with 3 dropcodes.

cynthia

KevinC_ · Posted 11-27-2009 09:47 AM

Hi Cynthia,

Thanks for your input. Yes you are correct: there could be different dropcode and maildate as well. Here is a section of data that has different dropcode and mail_date and the proc transpose I have put together. The rresult is close to what i want but not quite.

I can't get the 'seed' to show (either Y or N). in other words, in the first line of the output i would like to see 'N' next to the 421 and 'Y' next to the 2. I tried adding ID=seed and it didn't work. Do you have any suggestions?

proc transpose data=freq_file1 out=kyc;
by cell_code dropcode mail_date;
var count;
run;

INPUT:
cell_code dropcode mail_date seed count cumulative_ frequency
NR807 6 12/08/2009 N 421 3032
NR807 6 12/08/2009 Y 2 3034
NR807 7 12/09/2009 N 500 3534
NR807 8 12/09/2009 N 500 4034
NR807 9 12/09/2009 N 500 4534
NR807 10 12/10/2009 N 500 5034
NR807 11 12/10/2009 N 500 5534
NR807 12 12/10/2009 N 77 5611

OUTPUT:
13 NR807 6 18239 COUNT Frequency Count 421 2
14 NR807 7 18240 COUNT Frequency Count 500
15 NR807 8 18240 COUNT Frequency Count 500
16 NR807 9 18240 COUNT Frequency Count 500
17 NR807 10 18241 COUNT Frequency Count 500
18 NR807 11 18241 COUNT Frequency Count 500
19 NR807 12 18241 COUNT Frequency Count 77

Flip · Posted 11-27-2009 10:51 AM

I see a couple of ways to handle this.

data one;
input
cell_code $5. dropcode mail_date mmddyy10. +1 seed $1. count cumulative_frequency ;
cards;
NR807 6 12/08/2009 N 421 3032
NR807 6 12/08/2009 Y 2 3034
NR807 7 12/09/2009 N 500 3534
NR807 8 12/09/2009 N 500 4034
NR807 9 12/09/2009 N 500 4534
NR807 10 12/10/2009 N 500 5034
NR807 11 12/10/2009 N 500 5534
NR807 12 12/10/2009 N 77 5611
run;

proc sort data = one out = two;
by cell_code dropcode mail_date seed;
run;

/* Option 1 */

data output(keep = cell_code dropcode mail_date nseed1 - nseed2 ncount1-ncount2 cumulative_frequency);
set two;
format mail_date date9.;
by cell_code dropcode mail_date seed;
array nseed(2) $;
array ncount(2);
retain nseed1 - nseed2 ncount1-ncount2;
if first.mail_date then do;
do i = 1 to 2;
nseed(i) = '';
ncount(i) = .;
end;
i = 1 ;
end;
nseed(i) = seed;
ncount(i) = count;
if last.mail_date then output;
I + 1;
run;

/* Option 2 */

data three;
set two;
by cell_code dropcode mail_date seed;
if first.dropcode then i = 0;
i+1;
run;

proc transpose data = three out = seeds prefix = seed;
by cell_code dropcode mail_date ;
var seed ;
id i;
run;

proc transpose data = three out = counts prefix = count;
by cell_code dropcode mail_date ;
var count;
id i;
run;

data output2;
merge seeds counts;
format mail_date date9.;
by cell_code dropcode mail_date ;
run;

KevinC_ · Posted 11-30-2009 11:44 AM

Thank you, Flip!
Thanks to everyone else too!

Flip, one last question: in Option 2, what's the purpose of "i"?
"if first.dropcode then i = 0;"
"id i"

Flip · Posted 11-30-2009 12:55 PM

It gives you the number for the column name ie. prefix-i for column seed1 seed2

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!