Solved: PROC SORT with NODUPKEY

rkumar23 · Posted 05-30-2016 12:35 AM

DATA LAB1;
    INPUT OBS DAT TIM X Y Z ;
DATALINES;
1    1111     1001      . . .
2    1111     1002      4 . .
3.   1111     1001      . 2 .
4.   1111     1001      . . 3
RUN;
PROC SORT DATA=LAB1 NODUPKEY;BY DAT; RUN;
PROC PRINT DATA=LAB1;RUN;

So I have data something like given in the DATALINES now I have a SORT which does NODUPKEY on
say DAT variable and then pick only first record and discard 2,3,4 as they are duplicates..so
output is as below:
Obs    OBS     DAT     TIM    X    Y    Z

1      1     1111    1001    .    .    .

This is ok however now what happens is I have a requirement where out of these duplicates
if Variable "X" contains value this should be the record which should be picked so I used
below code :
DATA LAB1;
    INPUT OBS DAT TIM X Y Z ;
DATALINES;
1    1111     1001      . . .
2    1111     1002      4 . .
3.   1111     1001      . 2 .
4.   1111     1001      . . 3
RUN;
PROC SORT DATA=LAB1 NODUPKEY;WHERE X <> . ;BY DAT; RUN;

Which gives me the record "2" as below:

Obs    OBS     DAT     TIM    X    Y    Z

1      2     1111    1002    4    .    .

Now with this WHERE statement issue comes if all values for Variable X are missing I require that
SORT behave like normal NODUPKEY and gives me record "1" as in example 1 above.

Any thoughts please?

Ksharp · Posted 05-30-2016 05:21 AM

 DATA LAB1;                                                       
    INPUT  OBS  DAT  TIM  X Y Z  ;                                 
 DATALINES;                                                        
 1    1111     1001      .  .  .                                   
 2    1111     1002      4  .  .                                   
 3   1111     1001      .  2  .                                   
 4   1111     1001      .  .  3 
  1    11112     1001      .  .  .                                   
 2    11112     1002      .  .  .                                   
 3   11112     1001      .  2  .                                   
 4   11112     1001      .  .  3
;                                  
 RUN;    
data want;
idx=1;
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if not missing(X) and not found then do;
   idx=i;found=1;
  end;
 end;
 
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if idx=i then output;
 end;
 
drop i found idx;
run;

View solution in original post

LinusH · Posted 05-30-2016 12:43 AM

The correct answer depend on if you have any other variation in your data.
But for the given sample data you could try either:
- pre sort the data by SAT and X descending, then proc sort NODUPKEY bi DAY.
- Use SQL GROUP BY DAY, and do max(x). But it's unclear how you wish to handle the values of the other variables. Is the requirement to keep the whole observation for which X has a value, otherwise the first? What process control the original sort order of the data?

Data never sleeps

Kurt_Bremser · Posted 05-30-2016 01:30 AM

proc dort data=lab1;
by DAT X;
run;

data lab1;
set lab1;
by DAT;
if last.DAT;
run;

The sort will make sure that a non-missing value in X will be last in every DAT Group, and the data step selects that one.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 05-30-2016 05:21 AM

 DATA LAB1;                                                       
    INPUT  OBS  DAT  TIM  X Y Z  ;                                 
 DATALINES;                                                        
 1    1111     1001      .  .  .                                   
 2    1111     1002      4  .  .                                   
 3   1111     1001      .  2  .                                   
 4   1111     1001      .  .  3 
  1    11112     1001      .  .  .                                   
 2    11112     1002      .  .  .                                   
 3   11112     1001      .  2  .                                   
 4   11112     1001      .  .  3
;                                  
 RUN;    
data want;
idx=1;
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if not missing(X) and not found then do;
   idx=i;found=1;
  end;
 end;
 
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if idx=i then output;
 end;
 
drop i found idx;
run;

PROC SORT with NODUPKEY

Re: PROC SORT with NODUPKEY

Re: PROC SORT with NODUPKEY

Re: PROC SORT with NODUPKEY

Re: PROC SORT with NODUPKEY

Catch up on SAS Innovate 2026

PROC SORT with NODUPKEY

Re: PROC SORT with NODUPKEY

Re: PROC SORT with NODUPKEY

Re: PROC SORT with NODUPKEY

Re: PROC SORT with NODUPKEY

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away