- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Community,
I wanted to explore a way to get a # of all the Distinct ID's in all the datasets of a library. The final output should have the list of all the datasets in the library under column A and # of Distinct ID's in Column B. Please share possibilities or approachable logics.
Thank you for you time.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
/* Find all data sets in a library named PAIGE */
proc sql;
create table dsnames as select distinct memname from dictionary.tables where libname='PAIGE';
quit;
/* Count distinct IDs in all of these data sets */
data _null_;
set dsnames;
length str $ 1024;
str=cats('proc sql; create table ids as select "' ,memname,'" as memname length=32,count(distinct id) as n from paige.',memname,';quit;');
str2='proc append base=all new=ids; run;';
str3='proc delete data=ids; run;';
call execute(cat(str,str2,str3));
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What if a data set in the library does not contain the variable ID?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok. Do the data sets share some naming convention?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes but do the actual data sets have similar names, like data1, data2 and so on?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok. Even though you got your answer, here is an alternative.
data one;
input id var1;
datalines;
1 10
1 20
3 40
3 50
3 60
;
data two;
input id var2;
datalines;
1 10
1 20
4 30
4 40
6 50
6 60
6 70
;
data three;
input id var3;
datalines;
7 10
7 20
7 30
7 40
;
data want(keep = memname distinct);
dcl hash hoh (ordered : "Y");
hoh.definekey("memname", "id");
hoh.definedata("h", "memname", "id");
hoh.definedone();
dcl hiter i ("hoh");
do until (z);
set sashelp.vcolumn(where=(libname = 'WORK' and upcase(name) = 'ID')) end = z;
id=.;
if hoh.find() ne 0 then do;
dcl hash h (dataset : memname, duplicate : 'r');
h.definekey ("id");
h.definedone();
hoh.add();
end;
end;
do while (i.next() = 0);
distinct = h.num_items;
output;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
/* Find all data sets in a library named PAIGE */
proc sql;
create table dsnames as select distinct memname from dictionary.tables where libname='PAIGE';
quit;
/* Count distinct IDs in all of these data sets */
data _null_;
set dsnames;
length str $ 1024;
str=cats('proc sql; create table ids as select "' ,memname,'" as memname length=32,count(distinct id) as n from paige.',memname,';quit;');
str2='proc append base=all new=ids; run;';
str3='proc delete data=ids; run;';
call execute(cat(str,str2,str3));
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@shasank wrote:
HI Paige, Thank you for your help. I tried your code and I the dsnames came out as an empty dataset. The log has no errors.
Show us. Show us the log, and show us the incorrect results.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Paige,
Here is the code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Library name must be capital letters:
LABOUR
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you. The all caps of Library worked. I am trouble shooting the rest of the code.
/* Count distinct IDs in all of these data sets */
data _null_;
set dsnames;
length str $ 1024;
str=cats('proc sql; create table ids as select "' ,memname,'" as memname length=32,count(distinct ptid) as n from Labor.',memname,';quit;');
str2='proc append base=all new=ids; run;';
str3='proc delete data=ids; run;';
call execute(cat(str,str2,str3));
run;