- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3. 1111 1001 . 2 .
4. 1111 1001 . . 3
RUN;
PROC SORT DATA=LAB1 NODUPKEY;BY DAT; RUN;
PROC PRINT DATA=LAB1;RUN;
So I have data something like given in the DATALINES now I have a SORT which does NODUPKEY on
say DAT variable and then pick only first record and discard 2,3,4 as they are duplicates..so
output is as below:
Obs OBS DAT TIM X Y Z
1 1 1111 1001 . . .
This is ok however now what happens is I have a requirement where out of these duplicates
if Variable "X" contains value this should be the record which should be picked so I used
below code :
DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3. 1111 1001 . 2 .
4. 1111 1001 . . 3
RUN;
PROC SORT DATA=LAB1 NODUPKEY;WHERE X <> . ;BY DAT; RUN;
Which gives me the record "2" as below:
Obs OBS DAT TIM X Y Z
1 2 1111 1002 4 . .
Now with this WHERE statement issue comes if all values for Variable X are missing I require that
SORT behave like normal NODUPKEY and gives me record "1" as in example 1 above.
Any thoughts please?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3 1111 1001 . 2 .
4 1111 1001 . . 3
1 11112 1001 . . .
2 11112 1002 . . .
3 11112 1001 . 2 .
4 11112 1001 . . 3
;
RUN;
data want;
idx=1;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if not missing(X) and not found then do;
idx=i;found=1;
end;
end;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if idx=i then output;
end;
drop i found idx;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
But for the given sample data you could try either:
- pre sort the data by SAT and X descending, then proc sort NODUPKEY bi DAY.
- Use SQL GROUP BY DAY, and do max(x). But it's unclear how you wish to handle the values of the other variables. Is the requirement to keep the whole observation for which X has a value, otherwise the first? What process control the original sort order of the data?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc dort data=lab1;
by DAT X;
run;
data lab1;
set lab1;
by DAT;
if last.DAT;
run;
The sort will make sure that a non-missing value in X will be last in every DAT Group, and the data step selects that one.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
DATA LAB1;
INPUT OBS DAT TIM X Y Z ;
DATALINES;
1 1111 1001 . . .
2 1111 1002 4 . .
3 1111 1001 . 2 .
4 1111 1001 . . 3
1 11112 1001 . . .
2 11112 1002 . . .
3 11112 1001 . 2 .
4 11112 1001 . . 3
;
RUN;
data want;
idx=1;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if not missing(X) and not found then do;
idx=i;found=1;
end;
end;
do i=1 by 1 until(last.DAT);
set LAB1;
by DAT;
if idx=i then output;
end;
drop i found idx;
run;