DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3. 1111 1001 . 2 .
4. 1111 1001 . . 3
RUN;
PROC SORT DATA=LAB1 NODUPKEY;BY DAT; RUN;
PROC PRINT DATA=LAB1;RUN;
So I have data something like given in the DATALINES now I have a SORT which does NODUPKEY on
say DAT variable and then pick only first record and discard 2,3,4 as they are duplicates..so
output is as below:
Obs OBS DAT TIM X Y Z
1 1 1111 1001 . . .
This is ok however now what happens is I have a requirement where out of these duplicates
if Variable "X" contains value this should be the record which should be picked so I used
below code :
DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3. 1111 1001 . 2 .
4. 1111 1001 . . 3
RUN;
PROC SORT DATA=LAB1 NODUPKEY;WHERE X <> . ;BY DAT; RUN;
Which gives me the record "2" as below:
Obs OBS DAT TIM X Y Z
1 2 1111 1002 4 . .
Now with this WHERE statement issue comes if all values for Variable X are missing I require that
SORT behave like normal NODUPKEY and gives me record "1" as in example 1 above.
Any thoughts please?
DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3 1111 1001 . 2 .
4 1111 1001 . . 3
1 11112 1001 . . .
2 11112 1002 . . .
3 11112 1001 . 2 .
4 11112 1001 . . 3
;
RUN;
data want;
idx=1;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if not missing(X) and not found then do;
idx=i;found=1;
end;
end;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if idx=i then output;
end;
drop i found idx;
run;
proc dort data=lab1;
by DAT X;
run;
data lab1;
set lab1;
by DAT;
if last.DAT;
run;
The sort will make sure that a non-missing value in X will be last in every DAT Group, and the data step selects that one.
DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3 1111 1001 . 2 .
4 1111 1001 . . 3
1 11112 1001 . . .
2 11112 1002 . . .
3 11112 1001 . 2 .
4 11112 1001 . . 3
;
RUN;
data want;
idx=1;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if not missing(X) and not found then do;
idx=i;found=1;
end;
end;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if idx=i then output;
end;
drop i found idx;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.