Solved: How to get the value of first and last row grouped by a variable and p...

aalluru · Posted 03-26-2021 01:24 PM

I have a data set that I have grouped by 'ID'

ID Mark1 Mark2

1 24 76

1 45 67

2 23 56

2 87 32

I want this to be transformed into

ID Mark1First Mark1Last Mark2First Mark2Last

1 24 45 76 67

2 23 87 56 32

mkeintz · Posted 03-26-2021 01:38 PM

With variable renaming and a strategic SET ... POINT= statement, you can have a relatively compact data step:

data have;
  input ID Mark1 Mark2;
datalines;
1   24       76
1   24       76
1   45       67
2   23       56
2   87       32
run;
data want;
  set have (rename=(mark1=mark1last mark2=mark2last));
  by id;
  if first.id then set have (rename=(mark1=mark1first mark2=mark2first)) point=_n_;
  if last.id;
run;

Editted addition. An alternative that can be instructive about the use of 2 SET statements each with sequential access:

data want;
  set have (rename=(mark1=mark1first mark2=mark2first));
  by id;
  if first.id;
  do until (last.id);
    set have (rename=(mark1=mark1last mark2=mark2last));
    by id;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ballardw · Posted 03-26-2021 01:33 PM

One way:

data want;
   set have;
   by id;
   retain Mark1First Mark1last Mark2first Mark2last;
   if first.id then do;
      Mark1First=mark1;
      Mark2first=mark2; 
   end;
   if last.id then do;
      Mark1last=mark1;
      Mark2last=mark2; 
      output;
   end;
   keep id Mark1First Mark1last Mark2first Mark2last; 
run;

When you use a BY group in a data step SAS creates automatic variables that indicate whether the record is the first or last for a group. You reference these variables with the First. and Last. notation. These are numeric 1/0 for which can be used as 1=True and 0=False for use in if statements.

The RETAIN statement sets up variables to keep values across iterations of the data step boundary.

The explicit OUTPUT statement limits when records are written to the output data set.

mkeintz · Posted 03-26-2021 01:38 PM

With variable renaming and a strategic SET ... POINT= statement, you can have a relatively compact data step:

data have;
  input ID Mark1 Mark2;
datalines;
1   24       76
1   24       76
1   45       67
2   23       56
2   87       32
run;
data want;
  set have (rename=(mark1=mark1last mark2=mark2last));
  by id;
  if first.id then set have (rename=(mark1=mark1first mark2=mark2first)) point=_n_;
  if last.id;
run;

Editted addition. An alternative that can be instructive about the use of 2 SET statements each with sequential access:

data want;
  set have (rename=(mark1=mark1first mark2=mark2first));
  by id;
  if first.id;
  do until (last.id);
    set have (rename=(mark1=mark1last mark2=mark2last));
    by id;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

aalluru · Posted 04-13-2021 09:31 AM

hey! the 2nd code worked! I was wondering if I could modify this to have the first populated row of mark1 i.e. if the mark1 column has missing values, I'd like to include the first row where mark1 is not missing in mark1first and also include a count variable that has the number of rows where mark1 is not missing

How to get the value of first and last row grouped by a variable and put those in separate columns

Re: How to get the value of first and last row grouped by a variable and put those in separate colum

Re: How to get the value of first and last row grouped by a variable and put those in separate colum

Re: How to get the value of first and last row grouped by a variable and put those in separate colum

Re: How to get the value of first and last row grouped by a variable and put those in separate colum

How to get the value of first and last row grouped by a variable and put those in separate columns

Re: How to get the value of first and last row grouped by a variable and put those in separate colum

Re: How to get the value of first and last row grouped by a variable and put those in separate colum

Re: How to get the value of first and last row grouped by a variable and put those in separate colum

Re: How to get the value of first and last row grouped by a variable and put those in separate colum

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away