I have a dataset like this:
data your_dataset;
input id year weight seq;
datalines;
1 2000 12 0
1 2005 14 1
1 2006 23 2
2 2001 12 0
2 2001 45 1
2 2001 42 2
2 2002 75 3
2 2002 45 4
3 2003 97 0
3 2004 90 1
4 2010 56 0
4 2011 38 1
4 2012 91 2
4 2012 31 3
4 2014 16 4
5 2013 46 0
5 2013 46 1
5 2014 55 2
5 2015 64 3
5 2016 26 4
5 2017 36 5
5 2018 87 6
5 2019 46 7
;
run;
This is what I would like to get:
data your_dataset;
input id time0 time1 time2 time3 time4 time5 time6;
datalines;
1 12 14 23 . . . .
2 33 60 . . . . .
3 97 90 . . . . .
4 56 38 61 16 . . .
5 46 55 64 26 36 87 46
;
run;
The rule is:
for people with same id and year, I only want the mean weight of this year. For example, for id2 with year 2001, the value in the desired table is 33, because (12+45+42)/3=33. Due to 2001 is the year that id2 had the first weight record, it is located in the colomn "time0" in the desired table.
First, thankk you for providing the data as working SAS data step code. You make our job much easier, and thus you get a faster and more correct answer. Along those lines, there is no point in attaching data in a Microsoft Office document, as many of us will not download this attachment as it could be a security threat.
The first step is to create a sequence number from the original data. This is done in data set HAVE2. Once that is done, PROC REPORT does what PROC REPORT does.
data have;
input id year weight seq;
datalines;
1 2000 12 0
1 2005 14 1
1 2006 23 2
2 2001 12 0
2 2001 45 1
2 2001 42 2
2 2002 75 3
2 2002 45 4
3 2003 97 0
3 2004 90 1
4 2010 56 0
4 2011 38 1
4 2012 91 2
4 2012 31 3
4 2014 16 4
5 2013 46 0
5 2013 46 1
5 2014 55 2
5 2015 64 3
5 2016 26 4
5 2017 36 5
5 2018 87 6
5 2019 46 7
;
data have2;
set have;
by id year;
if first.id then sequence=0;
if first.year and not first.id then sequence+1;
run;
proc report data=have2;
columns id sequence,weight;
define id/group;
define sequence/across 'Time';
define weight/mean;
run;
A big question is do you actually need a data set? A data set in that "wide" format is hard to use for almost any purpose.
But a Report is actually fairly easy, and easier to understand.
One quick example:
proc tabulate data= your_dataset; class id year; var weight; table id, year*weight*mean ; run;
Personally I find your use of Time0 etc very confusing as shown in your desired set as it completely hides details like for Id=1 there is a multiyear gap between time0 and time1.
There are lots of appearance option changes that can be made to the tabulate code, or alternately Proc Report will do similar.
You would have to convince me that there is a real advantage to that desired data set before actually making one.
Once you have the sequence variable you can use it with the ID statement of PROC TRANSPOSE. Use the PREFIX= option to set the base name of the new variables.
proc transpose data=have out=want(drop=_name_) prefix=YEAR ;
by id;
id sequence ;
var xxx ;
run;
First, thankk you for providing the data as working SAS data step code. You make our job much easier, and thus you get a faster and more correct answer. Along those lines, there is no point in attaching data in a Microsoft Office document, as many of us will not download this attachment as it could be a security threat.
The first step is to create a sequence number from the original data. This is done in data set HAVE2. Once that is done, PROC REPORT does what PROC REPORT does.
data have;
input id year weight seq;
datalines;
1 2000 12 0
1 2005 14 1
1 2006 23 2
2 2001 12 0
2 2001 45 1
2 2001 42 2
2 2002 75 3
2 2002 45 4
3 2003 97 0
3 2004 90 1
4 2010 56 0
4 2011 38 1
4 2012 91 2
4 2012 31 3
4 2014 16 4
5 2013 46 0
5 2013 46 1
5 2014 55 2
5 2015 64 3
5 2016 26 4
5 2017 36 5
5 2018 87 6
5 2019 46 7
;
data have2;
set have;
by id year;
if first.id then sequence=0;
if first.year and not first.id then sequence+1;
run;
proc report data=have2;
columns id sequence,weight;
define id/group;
define sequence/across 'Time';
define weight/mean;
run;
It might be better to number by the number of years since the first year for that ID.
data have2;
set have;
by id year;
if first.id then baseyr = year;
retain baseyr ;
sequence = year - baseyr;
run;
proc report data=have2;
columns id baseyr weight,sequence;
define id/group;
define baseyr / group;
define sequence/across 'Time';
define weight/mean ' ';
run;
Time id baseyr 0 1 2 3 4 5 6 1 2000 12 . . . . 14 23 2 2001 33 60 . . . . . 3 2003 97 90 . . . . . 4 2010 56 38 61 . 16 . . 5 2013 46 55 64 26 36 87 46
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.