BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
zihdonv19
Obsidian | Level 7

I have a dataset like this:

data your_dataset;
    input id year weight seq;
    datalines;
1 2000 12 0
1 2005 14 1
1 2006 23 2
2 2001 12 0
2 2001 45 1
2 2001 42 2
2 2002 75 3
2 2002 45 4
3 2003 97 0
3 2004 90 1
4 2010 56 0
4 2011 38 1
4 2012 91 2
4 2012 31 3
4 2014 16 4
5 2013 46 0
5 2013 46 1
5 2014 55 2
5 2015 64 3
5 2016 26 4
5 2017 36 5
5 2018 87 6
5 2019 46 7
;
run;

This is what I would like to get:

data your_dataset;
    input id time0 time1 time2 time3 time4 time5 time6;
    datalines;
1 12 14 23 . . . .
2 33 60 . . . . .
3 97 90 . . . . .
4 56 38 61 16 . . .
5 46 55 64 26 36 87 46
;
run;

The rule is:

for people with same id and year, I only want the mean weight of this year. For example, for id2 with year 2001, the value in the desired table is 33, because (12+45+42)/3=33. Due to 2001 is the year that id2 had the first weight record, it is located in the colomn "time0" in the desired table.

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

First, thankk you for providing the data as working SAS data step code. You make our job much easier, and thus you get a faster and more correct answer. Along those lines, there is no point in attaching data in a Microsoft Office document, as many of us will not download this attachment as it could be a security threat.

 

The first step is to create a sequence number from the original data. This is done in data set HAVE2. Once that is done, PROC REPORT does what PROC REPORT does.

 

data have;
    input id year weight seq;
    datalines;
1 2000 12 0
1 2005 14 1
1 2006 23 2
2 2001 12 0
2 2001 45 1
2 2001 42 2
2 2002 75 3
2 2002 45 4
3 2003 97 0
3 2004 90 1
4 2010 56 0
4 2011 38 1
4 2012 91 2
4 2012 31 3
4 2014 16 4
5 2013 46 0
5 2013 46 1
5 2014 55 2
5 2015 64 3
5 2016 26 4
5 2017 36 5
5 2018 87 6
5 2019 46 7
;

data have2;
    set have;
    by id year;
    if first.id then sequence=0;
    if first.year and not first.id then sequence+1;
run;

proc report data=have2;
    columns id sequence,weight;
    define id/group;
    define sequence/across 'Time';
    define weight/mean;
run;
--
Paige Miller

View solution in original post

6 REPLIES 6
ballardw
Super User

A big question is do you actually need a data set? A data set in that "wide" format is hard to use for almost any purpose.

But a Report is actually fairly easy, and easier to understand.

One quick example:

proc tabulate data= your_dataset;
   class id year;
   var weight;
   table id,
         year*weight*mean
   ;
run;

Personally I find your use of Time0 etc very confusing as shown in your desired set as it completely hides details like for Id=1 there is a multiyear gap between time0 and time1.

 

There are lots of appearance option changes that can be made to the tabulate code, or alternately Proc Report will do similar.

 

You would have to convince me that there is a real advantage to that desired data set before actually making one.

 

zihdonv19
Obsidian | Level 7
Hi!

Thank you for your reply.

The reason that I want a wide format is that I need to use proc traj to do trajectory analysis which requires this kind of dataset. In my real data, I only have data for 5 years, so even using wide format my desired dataset won't look too messy..i guess
Tom
Super User Tom
Super User

Once you have the sequence variable you can use it with the ID statement of PROC TRANSPOSE.  Use the PREFIX= option to set the base name of the new variables.

proc transpose data=have out=want(drop=_name_) prefix=YEAR ;
  by id;
  id sequence ;
  var xxx ;
run;
zihdonv19
Obsidian | Level 7
Thanks for your additional comments. but it gives error "ERROR: The ID value "YEAR0" occurs twice in the same BY group." I guess it's because for some person, they have multiple records in the same year?
PaigeMiller
Diamond | Level 26

First, thankk you for providing the data as working SAS data step code. You make our job much easier, and thus you get a faster and more correct answer. Along those lines, there is no point in attaching data in a Microsoft Office document, as many of us will not download this attachment as it could be a security threat.

 

The first step is to create a sequence number from the original data. This is done in data set HAVE2. Once that is done, PROC REPORT does what PROC REPORT does.

 

data have;
    input id year weight seq;
    datalines;
1 2000 12 0
1 2005 14 1
1 2006 23 2
2 2001 12 0
2 2001 45 1
2 2001 42 2
2 2002 75 3
2 2002 45 4
3 2003 97 0
3 2004 90 1
4 2010 56 0
4 2011 38 1
4 2012 91 2
4 2012 31 3
4 2014 16 4
5 2013 46 0
5 2013 46 1
5 2014 55 2
5 2015 64 3
5 2016 26 4
5 2017 36 5
5 2018 87 6
5 2019 46 7
;

data have2;
    set have;
    by id year;
    if first.id then sequence=0;
    if first.year and not first.id then sequence+1;
run;

proc report data=have2;
    columns id sequence,weight;
    define id/group;
    define sequence/across 'Time';
    define weight/mean;
run;
--
Paige Miller
Tom
Super User Tom
Super User

It might be better to number by the number of years since the first year for that ID.

data have2;
  set have;
  by id year;
  if first.id then baseyr = year;
  retain baseyr ;
  sequence = year - baseyr;
run;

proc report data=have2;
  columns id baseyr weight,sequence;
  define id/group;
  define baseyr / group;
  define sequence/across 'Time';
  define weight/mean ' ';
run;
                                                  Time
id     baseyr          0          1          2          3          4          5          6
 1       2000         12          .          .          .          .         14         23
 2       2001         33         60          .          .          .          .          .
 3       2003         97         90          .          .          .          .          .
 4       2010         56         38         61          .         16          .          .
 5       2013         46         55         64         26         36         87         46

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 695 views
  • 2 likes
  • 4 in conversation