Hello
I have a number of datasets with different variables; I need to return the duplicate records for each dataset (if they exist).
I'd like to do this by not having to specify all the variables (so I can wrap it in a macro which will go through all the datasets).
My example below: in the dataset TEST, Nick's row is entered twice.
I can use PROC SORT nouniquekeys to identify that, but in the BY I need to specify all the variables ( Name Age Car).
I cannot find how to use, say, *, without having to list all the variables?
data TEST; input Name $ Age Car $; datalines; Mike 40 Volvo Nick 35 Nissan Susan 51 BMW Bill 60 Volvo Tom 35 Ford Nick 35 Nissan Nadia 49 Nissan ; run; proc sort data=WORK.TEST nouniquekeys out=duplicates; by Name Age Car; run;
many thanks
Use the keyword _ALL_ instead which is a shortcut for all variables.
proc sort data=WORK.TEST nouniquekeys out=duplicates;
by _all_;
run;
Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
@MART1 wrote:
Hello
I have a number of datasets with different variables; I need to return the duplicate records for each dataset (if they exist).
I'd like to do this by not having to specify all the variables (so I can wrap it in a macro which will go through all the datasets).
My example below: in the dataset TEST, Nick's row is entered twice.
I can use PROC SORT nouniquekeys to identify that, but in the BY I need to specify all the variables ( Name Age Car).
I cannot find how to use, say, *, without having to list all the variables?
data TEST; input Name $ Age Car $; datalines; Mike 40 Volvo Nick 35 Nissan Susan 51 BMW Bill 60 Volvo Tom 35 Ford Nick 35 Nissan Nadia 49 Nissan ; run; proc sort data=WORK.TEST nouniquekeys out=duplicates; by Name Age Car; run;many thanks
Use the keyword _ALL_ instead which is a shortcut for all variables.
proc sort data=WORK.TEST nouniquekeys out=duplicates;
by _all_;
run;
Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
@MART1 wrote:
Hello
I have a number of datasets with different variables; I need to return the duplicate records for each dataset (if they exist).
I'd like to do this by not having to specify all the variables (so I can wrap it in a macro which will go through all the datasets).
My example below: in the dataset TEST, Nick's row is entered twice.
I can use PROC SORT nouniquekeys to identify that, but in the BY I need to specify all the variables ( Name Age Car).
I cannot find how to use, say, *, without having to list all the variables?
data TEST; input Name $ Age Car $; datalines; Mike 40 Volvo Nick 35 Nissan Susan 51 BMW Bill 60 Volvo Tom 35 Ford Nick 35 Nissan Nadia 49 Nissan ; run; proc sort data=WORK.TEST nouniquekeys out=duplicates; by Name Age Car; run;many thanks
That's great thanks so much @Reeza
Very interesting paper - just one question (not strictly related to the topic I appreciate) ; the _ALL_ cannot be used in PROC SQL, for example
PROC SQL; CREATE TABLE DUPS AS SELECT COUNT(DISTINCT _ALL_) FROM TEST; QUIT;
Do you know if there's an _ALL_ equivalent for SQL? (* can be used, but not with DISTINCT, or in a GROUP BY)
Thanks
The SQL equivalent of _ALL_ is *.
So this code will reduce HAVE from 19 observations to 11 distinct observations.
data have;
set sashelp.class;
keep age sex ;
run;
proc sql;
create table distinct as select distinct * from have ;
quit;
69 proc sql; 70 create table test as select count(distinct *) from sashelp.class ; ERROR: * used in an illegal position. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 71 quit;
Cannot be used for counts, but could be used to de-duplicate
But that is just the same invalid syntax you sent before.
If you want to count the number of distinct records use:
select count(*) as number_distinct from (select distinct * from have);
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.