I have a data set that I have grouped by 'ID'
ID Mark1 Mark2
1 24 76
1 24 76
1 45 67
2 23 56
2 87 32
I want this to be transformed into
ID Mark1First Mark1Last Mark2First Mark2Last
1 24 45 76 67
2 23 87 56 32
With variable renaming and a strategic SET ... POINT= statement, you can have a relatively compact data step:
data have;
input ID Mark1 Mark2;
datalines;
1 24 76
1 24 76
1 45 67
2 23 56
2 87 32
run;
data want;
set have (rename=(mark1=mark1last mark2=mark2last));
by id;
if first.id then set have (rename=(mark1=mark1first mark2=mark2first)) point=_n_;
if last.id;
run;
Editted addition. An alternative that can be instructive about the use of 2 SET statements each with sequential access:
data want;
set have (rename=(mark1=mark1first mark2=mark2first));
by id;
if first.id;
do until (last.id);
set have (rename=(mark1=mark1last mark2=mark2last));
by id;
end;
run;
One way:
data want; set have; by id; retain Mark1First Mark1last Mark2first Mark2last; if first.id then do; Mark1First=mark1; Mark2first=mark2; end; if last.id then do; Mark1last=mark1; Mark2last=mark2; output; end; keep id Mark1First Mark1last Mark2first Mark2last; run;
When you use a BY group in a data step SAS creates automatic variables that indicate whether the record is the first or last for a group. You reference these variables with the First. and Last. notation. These are numeric 1/0 for which can be used as 1=True and 0=False for use in if statements.
The RETAIN statement sets up variables to keep values across iterations of the data step boundary.
The explicit OUTPUT statement limits when records are written to the output data set.
With variable renaming and a strategic SET ... POINT= statement, you can have a relatively compact data step:
data have;
input ID Mark1 Mark2;
datalines;
1 24 76
1 24 76
1 45 67
2 23 56
2 87 32
run;
data want;
set have (rename=(mark1=mark1last mark2=mark2last));
by id;
if first.id then set have (rename=(mark1=mark1first mark2=mark2first)) point=_n_;
if last.id;
run;
Editted addition. An alternative that can be instructive about the use of 2 SET statements each with sequential access:
data want;
set have (rename=(mark1=mark1first mark2=mark2first));
by id;
if first.id;
do until (last.id);
set have (rename=(mark1=mark1last mark2=mark2last));
by id;
end;
run;
hey! the 2nd code worked! I was wondering if I could modify this to have the first populated row of mark1 i.e. if the mark1 column has missing values, I'd like to include the first row where mark1 is not missing in mark1first and also include a count variable that has the number of rows where mark1 is not missing
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.