BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PaigeMiller
Diamond | Level 26

From now on, I request that the LOG should be copied as text and pasted into the window that appears when you click on the </> icon. And please show us the entire log, not selected portions.

 

Was there an existing data set named ALL before you ran this code? If so, change the name ALL in the PROC APPEND code to some data set name that doesn't exist yet.

--
Paige Miller
shasank
Quartz | Level 8
Thank you. I got the final results. I appreciate your help.
Tom
Super User Tom
Super User

Probably best to add the FORCE option to PROC APPEND.  Otherwise PROC APPEND will expect the ALL dataset to already exist.  You will probable also want to remove any existing ALL dataset before starting so avoid confusion.

ballardw
Super User

First question: Are the ID variables of the same type with  the same name and length in each data set?

Second: How many data sets are we discussing?

Third: By "distinct" do you really mean "distinct across all data sets" , i.e. if an ID exists in more than one set then only count it once? Which would bring up the question of which data set to count it in.

 

One approach is to combine all the datasets keeping only the id variable and adding the source data set name using the Indsname option. But if the variables are of different lengths or types this won't work as a simple statement.

The summarize.

Something like this:

data work.ids;
   set 
       dataone (keep=id)
       datatwo (keep=id)
       datathree (keep=id)
       indsname=dsname
   ;
   source=dsname;
run;

proc sql;
    
    select source, count(*) as uniqueids
    from (select distinct source,id from work.ids)
    group by source
    ;
run;

You should be able to test if the logic will work with two or three sets. Obviously your library should be part of the data set name.

 

If you have have a LOT of data sets let us know.

shasank
Quartz | Level 8
Thank you for your reply. The library has around 50 datasets and many more will be added on a weekly basis. So, I was try to see if we could have a way for SAS to get the names of the datasets.
Reeza
Super User

You didn't answer most of the questions asked by @ballardw 

Those are what drive the solution. 

 

Show us an example of what you have and what you expect.

 

Here's an example of generating a HAVE data set.

 

data lib1.demo1;
do ID=1 to 20;
x = rand('integer', 12, 45);
output;
end;
run;

data lib1.random1;
do ID=1 to 20;
x = rand('integer', 12, 45);
output;
end;
run;

Would the expected results from those tables be 20 or 40?

What would you expect as the output?

 

 

Reeza
Super User

You can get a list of the data sets in a library using the sashelp.vtable view or dictionary.table view.

From there:

 

  1. Write a macro that counts for one data set and then loop through and do that for all data sets. 
  2. Combine all data sets into a view and do a single count
  3. Write a macro and use call execute

Either way you need to process each data set once. If each data set is being handled uniquely and you can guarantee that each data set will have an ID column that's pretty straightforward. You always start a problem like this by getting it working for one data set and then scaling that to all data sets.

Here are some tutorials, though I suspect someone will give you an answer with actual code. I'm only providing idea's as per your initial question.

 

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

 


@shasank wrote:

Hi Community, 

 

I wanted to explore a way to get a # of all the Distinct ID's in all the datasets of a library. The final output should have the list of all the datasets in the library under column A and # of Distinct ID's in Column B. Please share possibilities or approachable logics. 

 

Thank you for you time.


 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 21 replies
  • 1696 views
  • 4 likes
  • 6 in conversation