BookmarkSubscribeRSS Feed
KevinC_
Fluorite | Level 6
cell_code dropcode mail_date seed
----------------------------------------------------------------------
NR800 1 12/07/2009 N
NR800 1 12/07/2009 Y
NR801 1 12/07/2009 N
NR801 1 12/07/2009 Y
NR801 2 12/07/2009 N


Hello everyone,

I have posted the first few rows of a sas dataset. Is there a way I can 'concatenate' rows based on the cell_code?
For example, the first two rows have the same cell_code so the output would look like the following:

NR800 1 12/07/2009 N NR800 1 12/07/2009 Y



Thank you so much for any input !

Kevin
16 REPLIES 16
KevinC_
Fluorite | Level 6
Let me be more clear.. The first two rows before concatenation:

NR800 1 12/07/2009 N
NR800 1 12/07/2009 Y

After:

NR800 1 12/07/2009 N NR800 1 12/07/2009 Y


Thanks.
abdullala
Calcite | Level 5
this is a quite unique request that all fields need to be replicated. I can not think of any one-step way to achieve that.

1. sort the data and assign a unique ID to each records within a cell_code.
2. output to 2 separate datasets with this ID and change variable names.
3. merge the 2 new datasets using proc sql, with value of original cell_code.

or if you need to really 'concatenate' those fields, you may first concatenate all 4 fields first and then do transpose to this one concatenated field. if this is what you wanted then it's an easier way out.
Flip
Fluorite | Level 6
I think before you can get an exact answer we would need to know a few things. Are you creating a new data set, or producing output?
Is there a limit on how many rows you might have to transpose?
Doc_Duke
Rhodochrosite | Level 12
PROC TRANSPOSE will do approximately that (it won't repeat the cell code).

If you want to concatenate the raw data that way, you can do it in the DATA step. Look at the @@ notation for the PUT statement and add something like

FILE myoutputfile; *that you have defined;
IF lag(cell_code) = cell_code THEN PUT _infile_ @@;
ELSE PUT;

Doc Muhlbaier
Duke
Cynthia_sas
SAS Super FREQ
Hi:
I'm curious how you'd envision the new variables being named???
You currently have this:
[pre]
cell_code dropcode mail_date seed
NR800 1 12/07/2009 N
NR800 1 12/07/2009 Y
[/pre]

what would the new variables be named???
[pre]
cell_code dropcode mail_date seed cell2 drop2 mail2 seed2
NR800 1 12/07/2009 N NR800 1 12/07/2009 Y
[/pre]

I'm curious also about why you want cell_code repeated, since that is the value you are using to concatenate???

cynthia
KevinC_
Fluorite | Level 6
Thank you everyone for your input.

Cynthia,
Actually I dont need the cell_code repeated. This is what I need in the new dataset:

cell_code, dropcode, mail_date, seed

sorry about the confusion.

Thanks again.
KevinC_
Fluorite | Level 6
oops.. let me correct myself...

This is what I need in the new dataset:

cell_code, dropcode, mail_date, seed1, seed2
KevinC_
Fluorite | Level 6
Before:

NR800 1 12/07/2009 N
NR800 1 12/07/2009 Y

After:

NR800 1 12/07/2009 N Y
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
As was communicated previously, use PROC TRANSPOSE. Suggest you review the SAS DOC on this procedure and also consider searching the SAS support http://support.sas.com/ website for supplemental technical reference material on the topic.

Scott Barry
SBBWorks, Inc.
Cynthia_sas
SAS Super FREQ
Hi:
You say that you only want:
cell_code, dropcode, mail_date, seed1 and seed2... but what about the NR801 cell_code???? It has dropcode of 1 and dropcode of 2?? I thought the only determinant for concatenation was cell_code.
[pre]
cell_code dropcode mail_date seed
NR801 1 12/07/2009 N
NR801 1 12/07/2009 Y
NR801 2 12/07/2009 N
[/pre]

How would you envision those records being concatenated????
[pre]
cell_code drop1 drop2 drop3 mail1 mail2 mail3 seed1 seed2 seed3
NR800 1 1 . 12/07/2009 12/07/2009 . N Y
NR801 1 1 2 12/07/2009 12/07/2009 12/07/2009 N Y N
[/pre]

Is it possible for there to be differing mail_date values as well???? Remember that EVERY row in your final dataset will have the same number of COLUMNS -- you can't have 1 row/observation with only 1 dropcode and another row/observation with 3 dropcodes.

cynthia
KevinC_
Fluorite | Level 6
Hi Cynthia,

Thanks for your input. Yes you are correct: there could be different dropcode and maildate as well. Here is a section of data that has different dropcode and mail_date and the proc transpose I have put together. The rresult is close to what i want but not quite.

I can't get the 'seed' to show (either Y or N). in other words, in the first line of the output i would like to see 'N' next to the 421 and 'Y' next to the 2. I tried adding ID=seed and it didn't work. Do you have any suggestions?


proc transpose data=freq_file1 out=kyc;
by cell_code dropcode mail_date;
var count;
run;



INPUT:
cell_code dropcode mail_date seed count cumulative_ frequency
NR807 6 12/08/2009 N 421 3032
NR807 6 12/08/2009 Y 2 3034
NR807 7 12/09/2009 N 500 3534
NR807 8 12/09/2009 N 500 4034
NR807 9 12/09/2009 N 500 4534
NR807 10 12/10/2009 N 500 5034
NR807 11 12/10/2009 N 500 5534
NR807 12 12/10/2009 N 77 5611


OUTPUT:
13 NR807 6 18239 COUNT Frequency Count 421 2
14 NR807 7 18240 COUNT Frequency Count 500
15 NR807 8 18240 COUNT Frequency Count 500
16 NR807 9 18240 COUNT Frequency Count 500
17 NR807 10 18241 COUNT Frequency Count 500
18 NR807 11 18241 COUNT Frequency Count 500
19 NR807 12 18241 COUNT Frequency Count 77
Flip
Fluorite | Level 6
I see a couple of ways to handle this.

data one;
input
cell_code $5. dropcode mail_date mmddyy10. +1 seed $1. count cumulative_frequency ;
cards;
NR807 6 12/08/2009 N 421 3032
NR807 6 12/08/2009 Y 2 3034
NR807 7 12/09/2009 N 500 3534
NR807 8 12/09/2009 N 500 4034
NR807 9 12/09/2009 N 500 4534
NR807 10 12/10/2009 N 500 5034
NR807 11 12/10/2009 N 500 5534
NR807 12 12/10/2009 N 77 5611
run;

proc sort data = one out = two;
by cell_code dropcode mail_date seed;
run;

/* Option 1 */

data output(keep = cell_code dropcode mail_date nseed1 - nseed2 ncount1-ncount2 cumulative_frequency);
set two;
format mail_date date9.;
by cell_code dropcode mail_date seed;
array nseed(2) $;
array ncount(2);
retain nseed1 - nseed2 ncount1-ncount2;
if first.mail_date then do;
do i = 1 to 2;
nseed(i) = '';
ncount(i) = .;
end;
i = 1 ;
end;
nseed(i) = seed;
ncount(i) = count;
if last.mail_date then output;
I + 1;
run;

/* Option 2 */

data three;
set two;
by cell_code dropcode mail_date seed;
if first.dropcode then i = 0;
i+1;
run;

proc transpose data = three out = seeds prefix = seed;
by cell_code dropcode mail_date ;
var seed ;
id i;
run;

proc transpose data = three out = counts prefix = count;
by cell_code dropcode mail_date ;
var count;
id i;
run;

data output2;
merge seeds counts;
format mail_date date9.;
by cell_code dropcode mail_date ;
run;
KevinC_
Fluorite | Level 6
Thank you, Flip!
Thanks to everyone else too!

Flip, one last question: in Option 2, what's the purpose of "i"?
"if first.dropcode then i = 0;"
"id i"
Flip
Fluorite | Level 6
It gives you the number for the column name ie. prefix-i for column seed1 seed2

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 5679 views
  • 1 like
  • 8 in conversation