- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
From now on, I request that the LOG should be copied as text and pasted into the window that appears when you click on the </> icon. And please show us the entire log, not selected portions.
Was there an existing data set named ALL before you ran this code? If so, change the name ALL in the PROC APPEND code to some data set name that doesn't exist yet.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Probably best to add the FORCE option to PROC APPEND. Otherwise PROC APPEND will expect the ALL dataset to already exist. You will probable also want to remove any existing ALL dataset before starting so avoid confusion.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
First question: Are the ID variables of the same type with the same name and length in each data set?
Second: How many data sets are we discussing?
Third: By "distinct" do you really mean "distinct across all data sets" , i.e. if an ID exists in more than one set then only count it once? Which would bring up the question of which data set to count it in.
One approach is to combine all the datasets keeping only the id variable and adding the source data set name using the Indsname option. But if the variables are of different lengths or types this won't work as a simple statement.
The summarize.
Something like this:
data work.ids; set dataone (keep=id) datatwo (keep=id) datathree (keep=id) indsname=dsname ; source=dsname; run; proc sql; select source, count(*) as uniqueids from (select distinct source,id from work.ids) group by source ; run;
You should be able to test if the logic will work with two or three sets. Obviously your library should be part of the data set name.
If you have have a LOT of data sets let us know.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You didn't answer most of the questions asked by @ballardw
Those are what drive the solution.
Show us an example of what you have and what you expect.
Here's an example of generating a HAVE data set.
data lib1.demo1;
do ID=1 to 20;
x = rand('integer', 12, 45);
output;
end;
run;
data lib1.random1;
do ID=1 to 20;
x = rand('integer', 12, 45);
output;
end;
run;
Would the expected results from those tables be 20 or 40?
What would you expect as the output?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can get a list of the data sets in a library using the sashelp.vtable view or dictionary.table view.
From there:
- Write a macro that counts for one data set and then loop through and do that for all data sets.
- Combine all data sets into a view and do a single count
- Write a macro and use call execute
Either way you need to process each data set once. If each data set is being handled uniquely and you can guarantee that each data set will have an ID column that's pretty straightforward. You always start a problem like this by getting it working for one data set and then scaling that to all data sets.
Here are some tutorials, though I suspect someone will give you an answer with actual code. I'm only providing idea's as per your initial question.
UCLA introductory tutorial on macro variables and macros
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
Tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Examples of common macro usage
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...
@shasank wrote:
Hi Community,
I wanted to explore a way to get a # of all the Distinct ID's in all the datasets of a library. The final output should have the list of all the datasets in the library under column A and # of Distinct ID's in Column B. Please share possibilities or approachable logics.
Thank you for you time.
- « Previous
-
- 1
- 2
- Next »