Solved: Re: How to get # of Distinct ID's in all datasets of a library - Page 2

PaigeMiller · Posted 01-25-2021 12:23 PM

From now on, I request that the LOG should be copied as text and pasted into the window that appears when you click on the </> icon. And please show us the entire log, not selected portions.

Was there an existing data set named ALL before you ran this code? If so, change the name ALL in the PROC APPEND code to some data set name that doesn't exist yet.

--
Paige Miller

shasank · Posted 01-25-2021 12:32 PM

Thank you. I got the final results. I appreciate your help.

Tom · Posted 01-25-2021 12:27 PM

Probably best to add the FORCE option to PROC APPEND. Otherwise PROC APPEND will expect the ALL dataset to already exist. You will probable also want to remove any existing ALL dataset before starting so avoid confusion.

ballardw · Posted 01-25-2021 11:35 AM

First question: Are the ID variables of the same type with the same name and length in each data set?

Second: How many data sets are we discussing?

Third: By "distinct" do you really mean "distinct across all data sets" , i.e. if an ID exists in more than one set then only count it once? Which would bring up the question of which data set to count it in.

One approach is to combine all the datasets keeping only the id variable and adding the source data set name using the Indsname option. But if the variables are of different lengths or types this won't work as a simple statement.

The summarize.

Something like this:

data work.ids;
   set 
       dataone (keep=id)
       datatwo (keep=id)
       datathree (keep=id)
       indsname=dsname
   ;
   source=dsname;
run;

proc sql;
    
    select source, count(*) as uniqueids
    from (select distinct source,id from work.ids)
    group by source
    ;
run;

You should be able to test if the logic will work with two or three sets. Obviously your library should be part of the data set name.

If you have have a LOT of data sets let us know.

shasank · Posted 01-25-2021 11:45 AM

Thank you for your reply. The library has around 50 datasets and many more will be added on a weekly basis. So, I was try to see if we could have a way for SAS to get the names of the datasets.

Reeza · Posted 01-25-2021 12:08 PM

You didn't answer most of the questions asked by @ballardw

Those are what drive the solution.

Show us an example of what you have and what you expect.

Here's an example of generating a HAVE data set.

data lib1.demo1;
do ID=1 to 20;
x = rand('integer', 12, 45);
output;
end;
run;

data lib1.random1;
do ID=1 to 20;
x = rand('integer', 12, 45);
output;
end;
run;

Would the expected results from those tables be 20 or 40?

What would you expect as the output?

Reeza · Posted 01-25-2021 12:28 PM

You can get a list of the data sets in a library using the sashelp.vtable view or dictionary.table view.

From there:

Write a macro that counts for one data set and then loop through and do that for all data sets.
Combine all data sets into a view and do a single count
Write a macro and use call execute

Either way you need to process each data set once. If each data set is being handled uniquely and you can guarantee that each data set will have an ID column that's pretty straightforward. You always start a problem like this by getting it working for one data set and then scaling that to all data sets.

Here are some tutorials, though I suspect someone will give you an answer with actual code. I'm only providing idea's as per your initial question.

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

@shasank wrote:

Hi Community,

I wanted to explore a way to get a # of all the Distinct ID's in all the datasets of a library. The final output should have the list of all the datasets in the library under column A and # of Distinct ID's in Column B. Please share possibilities or approachable logics.

Thank you for you time.

Re: How to get # of Distinct ID's in all datasets of a library

Re: How to get # of Distinct ID's in all datasets of a library

Re: How to get # of Distinct ID's in all datasets of a library

Re: How to get # of Distinct ID's in all datasets of a library

Re: How to get # of Distinct ID's in all datasets of a library

Re: How to get # of Distinct ID's in all datasets of a library

Re: How to get # of Distinct ID's in all datasets of a library

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away