Hello,
I have a table that has multiple rows, and some of the values are missing. I just need to combine them into a single row by year.
I have done it with two data sets. In this case, I only have one data set. What is best way to combine them to a single row?
Thanks team in advance!
Input table:
ID Year Value1 Value2 Value3
1 1999 270
1 1999 350
1 1999
1 2000 20
1 2000 300
1 2000 320
1 2001 122
1 2001
1 2001 500
I need the output table as below:
ID Year Value1 Value2 Value3
1 1999 350 270
1 2000 20 300 320
1 2001 122 500
Editor's note: this topic is very popular. Thanks to @peterz, @Hima, and @AskoLötjönen and others who contributed useful replies. We have consolidated some of these into this single reply so that future readers may benefit.
/* input data */
data input_table;
infile cards missover dsd;
input ID Year Value1 Value2 Value3;
datalines;
1,1999,,270
1,1999,,,350
1,1999,,,
1,2000,20
1,2000,,300
1,2000,,,320
1,2001,,122
1,2001,,,
1,2001,,,500
;
run;
/* Method #1 using PROC MEANS */
proc means data=input_table noprint nway;
class id year;
var value1 value2 value3;
output
out=output_table(drop=_type_ _freq_) max=;
quit;
/* Method #2 using the UPDATE statement */
data want;
update input_table (obs=0) input_table;
by id year;
run;
/* Method 3 using PROC SQL */
proc sql;
create table test2 as
select id,
year,
max(value1) as value1,
max(value2) as value2,
max(value3) as value3
from input_table
group by id, year;
quit;
/* Resulting data set */
Obs ID Year value1 value2 value3
1 1 1999 . 270 350
2 1 2000 20 300 320
3 1 2001 . 122 500
I think you are after:
data
input_table;infile
cards missover dsd;
input
ID Year Value1 Value2 Value3;
cards
;
1,1999,,270
1,1999,,,350
1,1999,,,
1,2000,20
1,2000,,300
1,2000,,,320
1,2001,,122
1,2001,,,
1,2001,,,500
;
proc means data=input_table noprint nway;
class id year;
var value1 value2 value3;
output
out=output_table(drop=_type_ _freq_) max
=;
quit
;
Hmmmm, not used to this editor thing. It seems pretty stupid. It has moved all the semi-colons and added carriage returns in random places but the basic syntax you need is there.
It also doesn't like me Ctrl+Ving stuff into it.
What happened? This thread has been sufficiently discussed and multiple answers have been offered yesterday, now they are all GONE???
The OP post the question twice.
Yes Hai.Kuo, something happened but they are all gone. Here is the code posted by PGStats yesterday. I saved it.
data have;
input ID Year Value1 Value2 Value3;
datalines;
1 1999 . 270 .
1 1999 . . 350
1 1999 . . .
1 2000 20 . .
1 2000 . 300 .
1 2000 . . 320
1 2001 . 122 .
1 2001 . . .
1 2001 . . 500
;
data want;
update have(obs=0) have;
by id year;
run;
Hima: Glad you found it as I was going to repost it myself, and add in the original statement crediting Tom Abernathy for originally suggesting it.
Apparently some people still don't realize that, if they delete a post, they also delete all posts that have replied to their post.
Yes Art. I am totally agreeing with you. PGStats said that the saved the code from Tom. I forgot to add that in my previous post.
Hello Arthur,
How can we get this kind of a result?
O/P
1 1999 (22,30,12) (270,40,14) (32,50,14) - Brackets means a single cell.
2 2000 (18,20,15) (22,33,22) (25,36,320)
input ID Year Value1 Value2 Value3;
datalines;
1 1999 22 270 32
1 1999 30 40 50
1 1999 . 12 14
1 2000 18 22 25
1 2000 20 33 36
1 2000 15 22 320
1 2001 . 122 .
1 2001 . . .
1 2001 . . 500
;
data want;
update have(obs=0) have;
by id year;
run;
Thank you!
aha, the 'wicked' update statement application. Just FYI, after I have studied help document on 'update', it turns out in this case, both (obs=0) and (obs=1) would work.
Regards,
Haikuo
Yes but, if you are dealing with a file containing millions of IDs, why read all of those records if you don't have to?
Hi Roger,
I think that using of proc Sql is the easiest solution:
data test;
input ID @1 year 4-7 value1 11-17 value2 22-28 value3 33-39;
datalines;
1 1999
;
run;
proc sql;
create table test2 as
select id,
year,
max(value1) as value1,
max(value2) as value2,
max(value3) as value3
from test
group by id, year;
quit;
Here is my try...
proc sql;
create table want as
select id,year,
sum(value1) as value1,
sum(value2) as value2,
sum(value3) as value3
from have
group by 1,2;
quit;
-Urvish
Editor's note: this topic is very popular. Thanks to @peterz, @Hima, and @AskoLötjönen and others who contributed useful replies. We have consolidated some of these into this single reply so that future readers may benefit.
/* input data */
data input_table;
infile cards missover dsd;
input ID Year Value1 Value2 Value3;
datalines;
1,1999,,270
1,1999,,,350
1,1999,,,
1,2000,20
1,2000,,300
1,2000,,,320
1,2001,,122
1,2001,,,
1,2001,,,500
;
run;
/* Method #1 using PROC MEANS */
proc means data=input_table noprint nway;
class id year;
var value1 value2 value3;
output
out=output_table(drop=_type_ _freq_) max=;
quit;
/* Method #2 using the UPDATE statement */
data want;
update input_table (obs=0) input_table;
by id year;
run;
/* Method 3 using PROC SQL */
proc sql;
create table test2 as
select id,
year,
max(value1) as value1,
max(value2) as value2,
max(value3) as value3
from input_table
group by id, year;
quit;
/* Resulting data set */
Obs ID Year value1 value2 value3
1 1 1999 . 270 350
2 1 2000 20 300 320
3 1 2001 . 122 500
This is awesome! Thank you!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.