There are A, B, C libraries.
In each library x, y, z number of data sets.
In A library the data sets range from a1,a2,.....,ax
In B library the data sets range from b1,b2,b3,...,by
In C library also the data sets range from c1,c2,c3,....cz
Now we need to check each data set and need to see are there any duplicates present?
and need to write the required information into final dataset. It should contain these columns.
library datasetname totalobservations duplicatescount
Hi,
How about the following one...it will be good if you post some sample data and output you want...
However,here is my try based on my understanding of your requirement...
proc sql;
create table test as
select libname as lib_name,memname as dsn,
nobs as tot_obs
from sashelp.vtable
where libname ^in ("SASHELP","SASUSER","WORK")
order by libname;
quit;
proc sort nodupkey data = test dupout = test1;
by dsn;
run;
proc sql noprint;
select distinct compress(quote(dsn)) into :duplicates_dsn separated by ","
from test1;
create table test1 as
select lib_name,dsn,tot_obs,count(dsn) as tot_dup
from test1
group by dsn;
quit;
%put &duplicates_dsn. &tot_dup.;
data want;
set test test1;
by dsn;
if tot_dup = . then tot_dup = 0;
run;
-Urvish
Homework?
So what's the question? What is considered a duplicate?
Have you tried querying the sashelp.vtable or dictionary.tables? It will give you columns 1-3, so you really only need column 4.
Hi,
How about the following one...it will be good if you post some sample data and output you want...
However,here is my try based on my understanding of your requirement...
proc sql;
create table test as
select libname as lib_name,memname as dsn,
nobs as tot_obs
from sashelp.vtable
where libname ^in ("SASHELP","SASUSER","WORK")
order by libname;
quit;
proc sort nodupkey data = test dupout = test1;
by dsn;
run;
proc sql noprint;
select distinct compress(quote(dsn)) into :duplicates_dsn separated by ","
from test1;
create table test1 as
select lib_name,dsn,tot_obs,count(dsn) as tot_dup
from test1
group by dsn;
quit;
%put &duplicates_dsn. &tot_dup.;
data want;
set test test1;
by dsn;
if tot_dup = . then tot_dup = 0;
run;
-Urvish
you have it right !!! you could include filesize as well ...
Helpfull Answer.
proc compare?
Reeza is right, you need to specify exactly what you consider a duplicate.
Is it a duplicate observation? Or a duplicate value in each of the variables? Even duplicate values for a single observation across multiple variables.
You said Urvish's answer was helpful, so what do you actually need?
From Urvish's answer I fullfilled my requirement. Looked for duplicate value in key variable.
We can mark this one as answered. But, I dont see correct answer, helpful answer buttons in this thread.
Thanks all.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.