BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rkumar23
Calcite | Level 5


  DATA LAB1;                                                      
    INPUT  OBS  DAT  TIM  X Y Z  ;                                
 DATALINES;                                                       
 1    1111     1001      .  .  .                                  
 2    1111     1002      4  .  .                                  
 3.   1111     1001      .  2  .                                  
 4.   1111     1001      .  .  3                                  
 RUN;                                                             
 PROC SORT DATA=LAB1 NODUPKEY;BY DAT; RUN;          
 PROC PRINT DATA=LAB1;RUN;                                        

So I have data something like given in the DATALINES now I have a SORT which does NODUPKEY on
say DAT variable and then pick only first record and discard 2,3,4 as they are duplicates..so
output is as below:
Obs    OBS     DAT     TIM    X    Y    Z         
                                                  
 1      1     1111    1001    .    .    .         


This is ok however now what happens is I have a requirement where out of these duplicates
if Variable "X" contains value this should be the record which should be picked so I used
below code :
  DATA LAB1;                                                      
    INPUT  OBS  DAT  TIM  X Y Z  ;                                
 DATALINES;                                                       
 1    1111     1001      .  .  .                                  
 2    1111     1002      4  .  .                                  
 3.   1111     1001      .  2  .                                  
 4.   1111     1001      .  .  3                                  
 RUN;                                                             
 PROC SORT DATA=LAB1 NODUPKEY;WHERE X <> . ;BY DAT; RUN; 

Which gives me the record "2" as below:

Obs    OBS     DAT     TIM    X    Y    Z    
                                             
 1      2     1111    1002    4    .    .    


Now with this WHERE statement issue comes if all values for Variable X are missing I require that
SORT behave like normal NODUPKEY and gives me record "1" as in example 1 above.

Any thoughts please?

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
 DATA LAB1;                                                       
    INPUT  OBS  DAT  TIM  X Y Z  ;                                 
 DATALINES;                                                        
 1    1111     1001      .  .  .                                   
 2    1111     1002      4  .  .                                   
 3   1111     1001      .  2  .                                   
 4   1111     1001      .  .  3 
  1    11112     1001      .  .  .                                   
 2    11112     1002      .  .  .                                   
 3   11112     1001      .  2  .                                   
 4   11112     1001      .  .  3
;                                  
 RUN;    
data want;
idx=1;
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if not missing(X) and not found then do;
   idx=i;found=1;
  end;
 end;
 
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if idx=i then output;
 end;
 
drop i found idx;
run;

View solution in original post

3 REPLIES 3
LinusH
Tourmaline | Level 20
The correct answer depend on if you have any other variation in your data.
But for the given sample data you could try either:
- pre sort the data by SAT and X descending, then proc sort NODUPKEY bi DAY.
- Use SQL GROUP BY DAY, and do max(x). But it's unclear how you wish to handle the values of the other variables. Is the requirement to keep the whole observation for which X has a value, otherwise the first? What process control the original sort order of the data?
Data never sleeps
Kurt_Bremser
Super User
proc dort data=lab1;
by DAT X;
run;

data lab1;
set lab1;
by DAT;
if last.DAT;
run;

The sort will make sure that a non-missing value in X will be last in every DAT Group, and the data step selects that one.

Ksharp
Super User
 DATA LAB1;                                                       
    INPUT  OBS  DAT  TIM  X Y Z  ;                                 
 DATALINES;                                                        
 1    1111     1001      .  .  .                                   
 2    1111     1002      4  .  .                                   
 3   1111     1001      .  2  .                                   
 4   1111     1001      .  .  3 
  1    11112     1001      .  .  .                                   
 2    11112     1002      .  .  .                                   
 3   11112     1001      .  2  .                                   
 4   11112     1001      .  .  3
;                                  
 RUN;    
data want;
idx=1;
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if not missing(X) and not found then do;
   idx=i;found=1;
  end;
 end;
 
 do i=1 by 1 until(last.DAT);
  set LAB1;
  by DAT;
  if idx=i then output;
 end;
 
drop i found idx;
run;

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2769 views
  • 0 likes
  • 4 in conversation