Hello,
is there anybody that knows if it's possible to take this sorted data and move duplicate records to another dataset in a way that I didn't need to hand code it (with PROC SQL, PROC SORT, ...).
If it's not possible, I'll probably be able to move the records using last. and first. statements.
// INPUT
PROC DATA=AA out=messages;
BY DURAO ORIGINE DESTINATION CHARGE;
RUN;
PROC PRINT DATA=MESSAGES;
RUN;
Obs DURAO ORIGINE DESTINATION CHARGE
1 0 2 2 3.89
2 0 2 6 563.56
3 0 2 6 563.56
4 0 4 2 12.18
5 0 4 2 45.23
6 0 4 2 45.23
7 0 9 2 145.32
8 1 1 1 1103.32
9 1 1 9 714.32
10 1 3 9 5623.02
11 1 4 2 45.23
// OUTPUT
PROC PRINT DATA=MESSAGES;
RUN;
Obs DURAO ORIGINE DESTINATION CHARGE
1 0 2 2 3.89
2 0 2 6 563.56
3 0 4 2 12.18
4 0 4 2 45.23
5 0 9 2 145.32
6 1 1 1 1103.32
7 1 1 9 714.32
8 1 3 9 5623.02
9 1 4 2 45.23
PROC PRINT DATA=DUPLICATE;
RUN;
Obs DURAO ORIGINE DESTINATION CHARGE
1 0 2 6 563.56
2 0 4 2 45.23
To achieve the same result as the proc sort code I suggested you could use something like the following with older versions of SAS:
data test duplicate;
set aa;
by DURAO ORIGINE DESTINATION CHARGE;
if first.charge then output test;
else do;
output duplicate;
end;
run;
Couldn't you do what you want using the dupout option in a proc sort nodupkey run?
for example:
proc sort data=aa out=test dupout=duplicate nodupkey;
by DURAO ORIGINE DESTINATION CHARGE;
run;
I'm not sure if you want to remove the duplicates from the original file, so in the above code, the file with the duplicates removed was written to a file called test.
For a duplicate (2 times the same record), I move 1 record to the new file test and I keep the other in the original data. Your suggestion seems good, but I use SAS 8.2 I've read that the Dupout option is new with SAS 9. Is it the case?
Since you're on 8.2, I believe you're correct that you'll have to "Use FIRST.variable and LAST.variable (made available with BY-Group processing) to determine if a BY-Group contains unique or duplicate observations."
Look under the "Full Code" tab at http://support.sas.com/kb/24/626.html for specific details.
Just to add onto what Art said, SAS notes the following at http://support.sas.com/kb/24/626.html:
"Note: Beginning with SAS 9.1, the DUPOUT= option on a PROC SORT statement specifies an output data set that contains duplicate observations. The new data set will not contain *all* the duplicates, but one record for each duplicate BY value. See the Base SAS(R) 9.1 Procedures Guide for more details. proc sort data=dsn dupout=new_dsn nodupkey; by var; run;"
To achieve the same result as the proc sort code I suggested you could use something like the following with older versions of SAS:
data test duplicate;
set aa;
by DURAO ORIGINE DESTINATION CHARGE;
if first.charge then output test;
else do;
output duplicate;
end;
run;
@ robby_beum
For some reason, the code they give didn't seem to work for me.
@ art297
Thanks!
The code you gave me seems to works well.
The code that robby pointed to will put ALL dups in the dups file (including the first one) and only put records in the nodups file that don't contain ANY duplicates.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.