I have a dataset as follows:
data have; input ID ID1 Mark1; datalines; 1 1 . 1 1 76 1 1 67 2 2 . 2 2 32
2 2 45 run;
I would like to group by ID and ID1 and extract the first and last non-missing values of mark for each group so that the resultant dataset looks like this:
ID ID1 Mark1First Mark1Last
1 1 76 67
2 2 32 45
data have;
input ID ID1 Mark1 Mark2 $ GetSum;
datalines;
1 1 . N 1
1 1 76 N 2
1 1 67 Y 3
2 2 . N 4
2 2 32 N 5
2 2 45 Y 6
;
run;
data want;
length Mark2First Mark2Last $1.;
call missing(Mark1First,Mark1Last,Mark2First,Mark2Last,Sum_GetSum);
do until(last.id);
set have;
by id ;
Sum_GetSum=sum(GetSum,Sum_GetSum);
if missing(Mark1First) then do;
Mark1First=Mark1;
Mark2First=Mark2;
end;
end;
Mark1Last=Mark1;
Mark2Last=Mark2
;
drop Mark1 Mark2 GetSum;
run;
Please try this:
proc sort data=have; by ID ID2;
run;
data have_1;
set have;
by ID ID1;
retain Mark1First Mark1Last;
retain first last;
if first.ID1 then do;
first=.;
last=.;
Mark1First=.;
Mark1Last=.;
end;
if first=. and Mark1^=. then do;
first=1;
Mark1First=Mark1;
end;
if Mark1^=. then Mark1Last=Mark1;
if last.ID1 then output;
keep ID ID1 Mark1First Mark1Last;
run;
what if there's also a Mark2 and I have to do this for both Mark1 and Mark2?
i.e. Mark1First, Mark2First, Mark1Last, Mark2Last?
Use the WHERE statement:
data have;
input ID ID1 Mark1;
datalines;
1 1 .
1 1 76
1 1 67
2 2 .
2 2 32
2 2 45
run;
data want (drop=mark1);
set have (where=(mark1^=.));
by id id1;
retain mark1first;
if first.id1 then mark1first=mark1;
if last.id1;
mark1last=mark1;
run;
what if there is also a Mark2 and I have to do this for both Mark1 and Mark2?
i.e. Mark1First, Mark2First, Mark1Last, Mark2Last?
data want;
Mark1First=.;
Mark1Last=.;
do until(last.id);
set have;
by id ;
if missing(Mark1First) then Mark1First=Mark1;
end;
Mark1Last=Mark1;
drop Mark1;
run;
what if there's also a Mark2 and I need to do this for both Mark1 and Mark2?
i.e. Mark1First, Mark2First, Mark1Last, Mark2Last?
Try this :
data have;
input ID ID1 Mark1 Mark2;
datalines;
1 1 . .
1 1 76 33
1 1 67 58
2 2 . .
2 2 32 89
2 2 45 11
;
run;
data want;
call missing(Mark1First,Mark1Last,Mark2First,Mark2Last);
do until(last.id);
set have;
by id ;
if missing(Mark1First) then Mark1First=Mark1;
if missing(Mark2First) then Mark2First=Mark2;
end;
Mark1Last=Mark1;
Mark2Last=Mark2
;
drop Mark1 Mark2;
run;
proc print data=want;
run;
Actually my data set looks a little like this:
data have;
input ID ID1 Mark1 Mark2;
datalines;
1 1 . N
1 1 76 N
1 1 67 Y
2 2 . N
2 2 32 N
2 2 45 Y
;
run;
So I need to make sure that Mark2First for ID 1 is the 'N' coressponding to the row with Mark1 as 76 and not the very first 'N' that you see in the column
Try this code. I have changed the First N to X to show the distinction.
data have;
input ID ID1 Mark1 Mark2 $;
datalines;
1 1 . X
1 1 76 N
1 1 67 Y
2 2 . X
2 2 32 N
2 2 45 Y
;
run;
data want;
length Mark2First Mark2Last $1.;
call missing(Mark1First,Mark1Last,Mark2First,Mark2Last);
do until(last.id);
set have;
by id ;
if missing(Mark1First) then do;
Mark1First=Mark1;
Mark2First=Mark2;
end;
end;
Mark1Last=Mark1;
Mark2Last=Mark2
;
drop Mark1 Mark2;
run;
Oh that works! Here's one last update (I promise!)
data have;
input ID ID1 Mark1 Mark2 GetSum;
datalines;
1 1 . N 1
1 1 76 N 2
1 1 67 Y 3
2 2 . N 4
2 2 32 N 5
2 2 45 Y 6
;
run;
I also want to get the sum of ALL values from the getsum column by the group so the output should be 6 and 15 for the 1st and 2nd row and not 5 and 11
There is too much mission creep in this topic.
data have;
input ID ID1 Mark1 Mark2 $ GetSum;
datalines;
1 1 . N 1
1 1 76 N 2
1 1 67 Y 3
2 2 . N 4
2 2 32 N 5
2 2 45 Y 6
;
run;
data want;
length Mark2First Mark2Last $1.;
call missing(Mark1First,Mark1Last,Mark2First,Mark2Last,Sum_GetSum);
do until(last.id);
set have;
by id ;
Sum_GetSum=sum(GetSum,Sum_GetSum);
if missing(Mark1First) then do;
Mark1First=Mark1;
Mark2First=Mark2;
end;
end;
Mark1Last=Mark1;
Mark2Last=Mark2
;
drop Mark1 Mark2 GetSum;
run;
that worked like a charm, thank you!
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.