From now on, I request that the LOG should be copied as text and pasted into the window that appears when you click on the </> icon. And please show us the entire log, not selected portions.
Was there an existing data set named ALL before you ran this code? If so, change the name ALL in the PROC APPEND code to some data set name that doesn't exist yet.
Probably best to add the FORCE option to PROC APPEND. Otherwise PROC APPEND will expect the ALL dataset to already exist. You will probable also want to remove any existing ALL dataset before starting so avoid confusion.
First question: Are the ID variables of the same type with the same name and length in each data set?
Second: How many data sets are we discussing?
Third: By "distinct" do you really mean "distinct across all data sets" , i.e. if an ID exists in more than one set then only count it once? Which would bring up the question of which data set to count it in.
One approach is to combine all the datasets keeping only the id variable and adding the source data set name using the Indsname option. But if the variables are of different lengths or types this won't work as a simple statement.
The summarize.
Something like this:
data work.ids; set dataone (keep=id) datatwo (keep=id) datathree (keep=id) indsname=dsname ; source=dsname; run; proc sql; select source, count(*) as uniqueids from (select distinct source,id from work.ids) group by source ; run;
You should be able to test if the logic will work with two or three sets. Obviously your library should be part of the data set name.
If you have have a LOT of data sets let us know.
You didn't answer most of the questions asked by @ballardw
Those are what drive the solution.
Show us an example of what you have and what you expect.
Here's an example of generating a HAVE data set.
data lib1.demo1;
do ID=1 to 20;
x = rand('integer', 12, 45);
output;
end;
run;
data lib1.random1;
do ID=1 to 20;
x = rand('integer', 12, 45);
output;
end;
run;
Would the expected results from those tables be 20 or 40?
What would you expect as the output?
You can get a list of the data sets in a library using the sashelp.vtable view or dictionary.table view.
From there:
Either way you need to process each data set once. If each data set is being handled uniquely and you can guarantee that each data set will have an ID column that's pretty straightforward. You always start a problem like this by getting it working for one data set and then scaling that to all data sets.
Here are some tutorials, though I suspect someone will give you an answer with actual code. I'm only providing idea's as per your initial question.
UCLA introductory tutorial on macro variables and macros
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
Tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Examples of common macro usage
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...
@shasank wrote:
Hi Community,
I wanted to explore a way to get a # of all the Distinct ID's in all the datasets of a library. The final output should have the list of all the datasets in the library under column A and # of Distinct ID's in Column B. Please share possibilities or approachable logics.
Thank you for you time.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.