I have a dataset as follows:
data have; input ID ID1 Mark1; datalines; 1 1 . 1 1 76 1 1 67 2 2 . 2 2 32
2 2 45 run;
I would like to group by ID and ID1 and extract the first and last non-missing values of mark for each group so that the resultant dataset looks like this:
ID ID1 Mark1First Mark1Last
1 1 76 67
2 2 32 45
data have;
input ID ID1 Mark1 Mark2 $ GetSum;
datalines;
1 1 . N 1
1 1 76 N 2
1 1 67 Y 3
2 2 . N 4
2 2 32 N 5
2 2 45 Y 6
;
run;
data want;
length Mark2First Mark2Last $1.;
call missing(Mark1First,Mark1Last,Mark2First,Mark2Last,Sum_GetSum);
do until(last.id);
set have;
by id ;
Sum_GetSum=sum(GetSum,Sum_GetSum);
if missing(Mark1First) then do;
Mark1First=Mark1;
Mark2First=Mark2;
end;
end;
Mark1Last=Mark1;
Mark2Last=Mark2
;
drop Mark1 Mark2 GetSum;
run;
Please try this:
proc sort data=have; by ID ID2;
run;
data have_1;
set have;
by ID ID1;
retain Mark1First Mark1Last;
retain first last;
if first.ID1 then do;
first=.;
last=.;
Mark1First=.;
Mark1Last=.;
end;
if first=. and Mark1^=. then do;
first=1;
Mark1First=Mark1;
end;
if Mark1^=. then Mark1Last=Mark1;
if last.ID1 then output;
keep ID ID1 Mark1First Mark1Last;
run;
what if there's also a Mark2 and I have to do this for both Mark1 and Mark2?
i.e. Mark1First, Mark2First, Mark1Last, Mark2Last?
Use the WHERE statement:
data have;
input ID ID1 Mark1;
datalines;
1 1 .
1 1 76
1 1 67
2 2 .
2 2 32
2 2 45
run;
data want (drop=mark1);
set have (where=(mark1^=.));
by id id1;
retain mark1first;
if first.id1 then mark1first=mark1;
if last.id1;
mark1last=mark1;
run;
what if there is also a Mark2 and I have to do this for both Mark1 and Mark2?
i.e. Mark1First, Mark2First, Mark1Last, Mark2Last?
data want;
Mark1First=.;
Mark1Last=.;
do until(last.id);
set have;
by id ;
if missing(Mark1First) then Mark1First=Mark1;
end;
Mark1Last=Mark1;
drop Mark1;
run;
what if there's also a Mark2 and I need to do this for both Mark1 and Mark2?
i.e. Mark1First, Mark2First, Mark1Last, Mark2Last?
Try this :
data have;
input ID ID1 Mark1 Mark2;
datalines;
1 1 . .
1 1 76 33
1 1 67 58
2 2 . .
2 2 32 89
2 2 45 11
;
run;
data want;
call missing(Mark1First,Mark1Last,Mark2First,Mark2Last);
do until(last.id);
set have;
by id ;
if missing(Mark1First) then Mark1First=Mark1;
if missing(Mark2First) then Mark2First=Mark2;
end;
Mark1Last=Mark1;
Mark2Last=Mark2
;
drop Mark1 Mark2;
run;
proc print data=want;
run;
Actually my data set looks a little like this:
data have;
input ID ID1 Mark1 Mark2;
datalines;
1 1 . N
1 1 76 N
1 1 67 Y
2 2 . N
2 2 32 N
2 2 45 Y
;
run;
So I need to make sure that Mark2First for ID 1 is the 'N' coressponding to the row with Mark1 as 76 and not the very first 'N' that you see in the column
Try this code. I have changed the First N to X to show the distinction.
data have;
input ID ID1 Mark1 Mark2 $;
datalines;
1 1 . X
1 1 76 N
1 1 67 Y
2 2 . X
2 2 32 N
2 2 45 Y
;
run;
data want;
length Mark2First Mark2Last $1.;
call missing(Mark1First,Mark1Last,Mark2First,Mark2Last);
do until(last.id);
set have;
by id ;
if missing(Mark1First) then do;
Mark1First=Mark1;
Mark2First=Mark2;
end;
end;
Mark1Last=Mark1;
Mark2Last=Mark2
;
drop Mark1 Mark2;
run;
Oh that works! Here's one last update (I promise!)
data have;
input ID ID1 Mark1 Mark2 GetSum;
datalines;
1 1 . N 1
1 1 76 N 2
1 1 67 Y 3
2 2 . N 4
2 2 32 N 5
2 2 45 Y 6
;
run;
I also want to get the sum of ALL values from the getsum column by the group so the output should be 6 and 15 for the 1st and 2nd row and not 5 and 11
There is too much mission creep in this topic.
data have;
input ID ID1 Mark1 Mark2 $ GetSum;
datalines;
1 1 . N 1
1 1 76 N 2
1 1 67 Y 3
2 2 . N 4
2 2 32 N 5
2 2 45 Y 6
;
run;
data want;
length Mark2First Mark2Last $1.;
call missing(Mark1First,Mark1Last,Mark2First,Mark2Last,Sum_GetSum);
do until(last.id);
set have;
by id ;
Sum_GetSum=sum(GetSum,Sum_GetSum);
if missing(Mark1First) then do;
Mark1First=Mark1;
Mark2First=Mark2;
end;
end;
Mark1Last=Mark1;
Mark2Last=Mark2
;
drop Mark1 Mark2 GetSum;
run;
that worked like a charm, thank you!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.