DATA Step, Macro, Functions and more

PROC SORT with NODUPKEY

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 78
Accepted Solution

PROC SORT with NODUPKEY


  DATA LAB1;                                                      
    INPUT  OBS  DAT  TIM  X Y Z  ;                                
 DATALINES;                                                       
 1    1111     1001      .  .  .                                  
 2    1111     1002      4  .  .                                  
 3.   1111     1001      .  2  .                                  
 4.   1111     1001      .  .  3                                  
 RUN;                                                             
 PROC SORT DATA=LAB1 NODUPKEY;BY DAT; RUN;          
 PROC PRINT DATA=LAB1;RUN;                                        

So I have data something like given in the DATALINES now I have a SORT which does NODUPKEY on
say DAT variable and then pick only first record and discard 2,3,4 as they are duplicates..so
output is as below:
Obs    OBS     DAT     TIM    X    Y    Z         
                                                  
 1      1     1111    1001    .    .    .         


This is ok however now what happens is I have a requirement where out of these duplicates
if Variable "X" contains value this should be the record which should be picked so I used
below code :
  DATA LAB1;                                                      
    INPUT  OBS  DAT  TIM  X Y Z  ;                                
 DATALINES;                                                       
 1    1111     1001      .  .  .                                  
 2    1111     1002      4  .  .                                  
 3.   1111     1001      .  2  .                                  
 4.   1111     1001      .  .  3                                  
 RUN;                                                             
 PROC SORT DATA=LAB1 NODUPKEY;WHERE X <> . ;BY DAT; RUN; 

Which gives me the record "2" as below:

Obs    OBS     DAT     TIM    X    Y    Z    
                                             
 1      2     1111    1002    4    .    .    


Now with this WHERE statement issue comes if all values for Variable X are missing I require that
SORT behave like normal NODUPKEY and gives me record "1" as in example 1 above.

Any thoughts please?


Accepted Solutions
Solution
‎06-01-2016 12:51 AM
Super User
Posts: 9,691

Re: PROC SORT with NODUPKEY

 DATA LAB1;                                                       
    INPUT  OBS  DAT  TIM  X Y Z  ;                                 
 DATALINES;                                                        
 1    1111     1001      .  .  .                                   
 2    1111     1002      4  .  .                                   
 3   1111     1001      .  2  .                                   
 4   1111     1001      .  .  3 
  1    11112     1001      .  .  .                                   
 2    11112     1002      .  .  .                                   
 3   11112     1001      .  2  .                                   
 4   11112     1001      .  .  3
;                                  
 RUN;    
data want;
idx=1;
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if not missing(X) and not found then do;
   idx=i;found=1;
  end;
 end;
 
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if idx=i then output;
 end;
 
drop i found idx;
run;

View solution in original post


All Replies
Super User
Posts: 5,260

Re: PROC SORT with NODUPKEY

The correct answer depend on if you have any other variation in your data.
But for the given sample data you could try either:
- pre sort the data by SAT and X descending, then proc sort NODUPKEY bi DAY.
- Use SQL GROUP BY DAY, and do max(x). But it's unclear how you wish to handle the values of the other variables. Is the requirement to keep the whole observation for which X has a value, otherwise the first? What process control the original sort order of the data?
Data never sleeps
Super User
Posts: 6,982

Re: PROC SORT with NODUPKEY

[ Edited ]
proc dort data=lab1;
by DAT X;
run;

data lab1;
set lab1;
by DAT;
if last.DAT;
run;

The sort will make sure that a non-missing value in X will be last in every DAT Group, and the data step selects that one.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎06-01-2016 12:51 AM
Super User
Posts: 9,691

Re: PROC SORT with NODUPKEY

 DATA LAB1;                                                       
    INPUT  OBS  DAT  TIM  X Y Z  ;                                 
 DATALINES;                                                        
 1    1111     1001      .  .  .                                   
 2    1111     1002      4  .  .                                   
 3   1111     1001      .  2  .                                   
 4   1111     1001      .  .  3 
  1    11112     1001      .  .  .                                   
 2    11112     1002      .  .  .                                   
 3   11112     1001      .  2  .                                   
 4   11112     1001      .  .  3
;                                  
 RUN;    
data want;
idx=1;
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if not missing(X) and not found then do;
   idx=i;found=1;
  end;
 end;
 
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if idx=i then output;
 end;
 
drop i found idx;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 313 views
  • 0 likes
  • 4 in conversation