DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3. 1111 1001 . 2 .
4. 1111 1001 . . 3
RUN;
PROC SORT DATA=LAB1 NODUPKEY;BY DAT; RUN;
PROC PRINT DATA=LAB1;RUN;
So I have data something like given in the DATALINES now I have a SORT which does NODUPKEY on
say DAT variable and then pick only first record and discard 2,3,4 as they are duplicates..so
output is as below:
Obs OBS DAT TIM X Y Z
1 1 1111 1001 . . .
This is ok however now what happens is I have a requirement where out of these duplicates
if Variable "X" contains value this should be the record which should be picked so I used
below code :
DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3. 1111 1001 . 2 .
4. 1111 1001 . . 3
RUN;
PROC SORT DATA=LAB1 NODUPKEY;WHERE X <> . ;BY DAT; RUN;
Which gives me the record "2" as below:
Obs OBS DAT TIM X Y Z
1 2 1111 1002 4 . .
Now with this WHERE statement issue comes if all values for Variable X are missing I require that
SORT behave like normal NODUPKEY and gives me record "1" as in example 1 above.
Any thoughts please?
DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3 1111 1001 . 2 .
4 1111 1001 . . 3
1 11112 1001 . . .
2 11112 1002 . . .
3 11112 1001 . 2 .
4 11112 1001 . . 3
;
RUN;
data want;
idx=1;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if not missing(X) and not found then do;
idx=i;found=1;
end;
end;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if idx=i then output;
end;
drop i found idx;
run;
proc dort data=lab1;
by DAT X;
run;
data lab1;
set lab1;
by DAT;
if last.DAT;
run;
The sort will make sure that a non-missing value in X will be last in every DAT Group, and the data step selects that one.
DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3 1111 1001 . 2 .
4 1111 1001 . . 3
1 11112 1001 . . .
2 11112 1002 . . .
3 11112 1001 . 2 .
4 11112 1001 . . 3
;
RUN;
data want;
idx=1;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if not missing(X) and not found then do;
idx=i;found=1;
end;
end;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if idx=i then output;
end;
drop i found idx;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.