Hi all, I have fields like lnno, cycdt, corrnind and dttm. If there are duplicate records based on lnno & cycdt then i want to sort based on corrnind and dttm and corrnind (usually space or Y) if there are any then that record(corrind record because it takes precedence over dt_tm) should be in want records if it not there then the latest dt_tm should be in the want records. Other records should be eliminated and should go in duplicate dataset data have: @001 lnno $09. @010 cycno yymmdd6. @017 corrind $1. @018 dt_tm $10. 367940841170415Y1704181430 373065728170415 1704181430 175597502170415Y1704181431 175597502170415Y1704181430 199994811170415Y1704181430 019354533170415Y1704181430 180878441170415 1704181430 180878441170515 1704181431 158072200170415 1704181430 158072200170515Y1704181430 158072200170515 1704181430 122492196170415 1704181430 261454099170415 1704181430 261454099170415Y1704181430 143798189170415 1704181430 143798189170415 1704181430 367940841170415 1704181430 373065728170415Y1704181430 430912595170415Y1704181430 000808229170415 1704181430 430912595170415 1704181430 199994811170415 1704181430 199994811170415 1704181430 019354533170415 1704181430 019354533170415Y1704181430 029267455170415Y1704181430 122381807170415 1704181430 029267455170415 1704172150 122382609170415Y1704180100 122382609170415 0000000000 data want: 367940841170415Y1704181430 373065728170415Y1704181430 175597502170415Y1704181431 199994811170415Y1704181430 019354533170415Y1704181430 180878441170515 1704181431 158072200170515Y1704181430 122492196170415 1704181430 261454099170415Y1704181430 143798189170415 1704181430 000808229170415 1704181430 430912595170415Y1704181430 029267455170415Y1704181430 122382609170415Y1704180100 data dup: 367940841170415 1704181430 373065728170415 1704181430 175597502170415Y1704181430 199994811170415 1704181430 199994811170415 1704181430 019354533170415Y1704181430 180878441170415 1704181430 158072200170415 1704181430 158072200170515 1704181430 261454099170415 1704181430 143798189170415 1704181430 430912595170415 1704181430 019354533170415 1704181430 029267455170415 1704172150 122382609170415 0000000000 I may miss a record in the dup but the no of records are not matching. I really appreciate your help on this. Thanks Neal.
First, the answer to your question is 'Yes". Secondly, it's more like COBOL than you think, but with some extras.
To answer your questions, though, think about my suggested code:
proc sort data=have out=want; by lnno cycno descending corrind descending dt_tm; run;
That first sort puts the records in order of inno, cycno, descending corrind (thus Y comes before missing) and finally, the most recent dt_tm records first.
The second sort:
proc sort data=want dupout=dups nodupkey; by lnno cycno; run;
doesn't change the order of the first sort, but it's only looking at inno and cycno, accepts the first record as a non-duplicate, then all ofthers as duplicates.
Art, CEO, AnalystFinder.com
proc sql;
create table foobar as
select distinct
lnno,cycno,corrind,dt_tm
from table1;
quit;
PROC SORT with the NODUPKEY option
data have;
input
@001 lnno $09.
@010 cycno yymmdd6.
@017 corrind $1.
@018 dt_tm $10.
;
datalines;
367940841170415Y1704181430
373065728170415 1704181430
175597502170415Y1704181431
175597502170415Y1704181430
199994811170415Y1704181430
019354533170415Y1704181430
180878441170415 1704181430
180878441170515 1704181431
158072200170415 1704181430
158072200170515Y1704181430
158072200170515 1704181430
122492196170415 1704181430
261454099170415 1704181430
261454099170415Y1704181430
143798189170415 1704181430
143798189170415 1704181430
367940841170415 1704181430
373065728170415Y1704181430
430912595170415Y1704181430
000808229170415 1704181430
430912595170415 1704181430
199994811170415 1704181430
199994811170415 1704181430
019354533170415 1704181430
019354533170415Y1704181430
029267455170415Y1704181430
122381807170415 1704181430
029267455170415 1704172150
122382609170415Y1704180100
122382609170415 0000000000
; run;
proc sort data=have NODUPRECS
out=distinct dupout=dups;
by lnno;
run;
LOG shows: 30 obs in HAVE, 10 duplicates deleted and saved in DUPS dataset, the rest 20 DISTINCT obs in out.
@Shmuel: Are you SURE you want to recommend using NODUPRECS in the proc sort statement? In the last 43 years that I've used SAS I've NEVER found a case where that was a useful option to use. It is quite different than NODUPKEY.
Art, CEO, AnalystFinder.com
@art297 wrote:
@Shmuel: Are you SURE you want to recommend using NODUPRECS in the proc sort statement? In the last 43 years that I've used SAS I've NEVER found a case where that was a useful option to use. It is quite different than NODUPKEY.
Art, CEO, AnalystFinder.com
@art297, pay attention - @saslovethemost talked about DUPLICATE RECORDS, without mentioning any keys or ids.
Sometimes I had to use NUDUPRECS instead NODUPKEY.
In this case, as keys were not defined, I used sort by just one variable but differed duplicates by all of them.
@Shmuel, the key is lnno and cycdt, if there are any two records with this key then further look for corrnind and dt_tm. If any record has corrind as 'Y' then that should go into want dataset and remaining should go into dup dataset. Further, if there are two records with corrnind as 'Y' then the latest dt_tm should be the one should go to want dataset and remaining should go to dup datset. Hope I have explained it right.
First, some problems with your data and how you read it. I think you want to use:
data have; input @001 lnno $09. @010 cycno yymmdd6. @016 corrind $1. @017 dt_tm $10.;
The way to get what you want is to do two sorts: one to order the records, the other the create the two files (want and dups). e.g.:
proc sort data=have out=want; by lnno cycno descending corrind descending dt_tm; run; proc sort data=want dupout=dups nodupkey; by lnno cycno; run;
However, your desired output doesn't reflect your data. Some examples are:
lnno: 180878441 where it has two different cycno values
lnno: 158072200 where it has two different cycno values
lnno: 122381807 which you don't include in your want file
Art, CEO, AnalystFinder.com
Neal
data one; input
@001 lnno $09.
@010 cycno yymmdd6.
@016 corrind $1.
@017 dt_tm $10.;
cards;
367940841170415Y1704181430
373065728170415 1704181430
175597502170415Y1704181431
175597502170415Y1704181430
199994811170415Y1704181430
019354533170415Y1704181430
180878441170415 1704181430
180878441170515 1704181431
158072200170415 1704181430
158072200170515Y1704181430
158072200170515 1704181430
122492196170415 1704181430
261454099170415 1704181430
261454099170415Y1704181430
143798189170415 1704181430
143798189170415 1704181430
367940841170415 1704181430
373065728170415Y1704181430
430912595170415Y1704181430
000808229170415 1704181430
430912595170415 1704181430
199994811170415 1704181430
199994811170415 1704181430
019354533170415 1704181430
019354533170415Y1704181430
029267455170415Y1704181430
122381807170415 1704181430
029267455170415 1704172150
122382609170415Y1704180100
122382609170415 0000000000
;
proc sort out=two ; by lnno cycno dt_tm;
proc print; title "two"; run;
data want; set two; by lnno cycno dt_tm;
if first.dt_tm and last.dt_tm then output;
proc print; title "unique";
data dups; set two; by lnno cycno dt_tm;
if first.dt_tm and not last.dt_tm;
proc print; title "dups"; run;
Art, thanks for the code and looks great.
I have a question though, will the code take corrnind into account? If the inpttxn is having corrnind as 'Y' even though that wont have the latest dt_tm the record with corrnind with 'Y' should take precedence and if two records(lnno and cycno) are having corrnind s and dt_tm s then the one with the latest dt_tm should take precedence. I dont see in the code you mentoning something like if corrnind = 'Y' etc..SAS is totally different than legacy COBOL i think.
What you said about the follwoing is correct, so it should be in the want dataset if it is a different cycno. If the lnno and cycno is same then only will further processed to look for corrnind and dt_tm.
lnno: 180878441 where it has two different cycno values
lnno: 158072200 where it has two different cycno values
lnno: 122381807 which you don't include in your want file
I really appreciate your time on this. Cheers.
First, the answer to your question is 'Yes". Secondly, it's more like COBOL than you think, but with some extras.
To answer your questions, though, think about my suggested code:
proc sort data=have out=want; by lnno cycno descending corrind descending dt_tm; run;
That first sort puts the records in order of inno, cycno, descending corrind (thus Y comes before missing) and finally, the most recent dt_tm records first.
The second sort:
proc sort data=want dupout=dups nodupkey; by lnno cycno; run;
doesn't change the order of the first sort, but it's only looking at inno and cycno, accepts the first record as a non-duplicate, then all ofthers as duplicates.
Art, CEO, AnalystFinder.com
The SAS data step will stop iteration immediately at the line where the SET statement tries to read from an empty SAS data set. Therefore the SET statement needs to come after your test IF RECS=0 as else this check will never get executed in case of an empty Have table.
I'd also add a STOP statement so that in case there is data in your HAVE table you're not iterating through the whole dataset for no reason.
Below code should do the job:
data have;
stop;
set sashelp.class;
run;
data _null_;
if recs=0 then abort;
stop;
set have nobs=recs;
run;
Instead of STOP; you could also use DS option OBS=0 to not read actual data.
You can also create a specific return code in case you abort your program which in a production environment can support job monitoring (the return code can be used to tell you why the job has been aborted).
/* case: Have with 0 records */
data have;
stop;
set sashelp.class;
run;
data _null_;
if recs=0 then abort return 10;
set have(obs=0) nobs=recs;
run;
In either case, I am getting the follwoing error with STAT as U999
50 data _null_; 51 if recs=0 then abort return 10; 52 set have(obs=0) nobs=recs; 53 run; ERROR: Execution terminated by an ABORT statement at line 51 column 19, it specified the RETURN option. NOTE: Limit set by ERRORS= option reached. Further errors of this type will not be printed. NOTE: The SAS System stopped processing this step because of errors. NOTE: The DATA statement used 0.01 CPU seconds and 13062K. ERROR: SAS ended due to errors. You specified: OPTIONS ERRORABEND;. ERROR: Errors printed on page 3.
when I used 'call execute ('endsas;'); then I get the following SYSOUT messages. 50 data _null_; 51 if recs=0 then call execute('endsas;'); 52 set have(obs=0) nobs=recs; 53 run; NOTE: There were 0 observations read from the data set WORK.HAVE. NOTE: The DATA statement used 0.01 CPU seconds and 13206K. NOTE: The address space has used a maximum of 876K below the line and 17084K above the line. NOTE: CALL EXECUTE generated line. 53 ! 1 + endsas;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.