Folks,
I've a query which perhaps people could shed some light on.
I'm utilsing the following piece of code to to get a count of the number of observations from each of my datasets, for a number of different libs.
proc sql;
create table data as
select libname, memname, nvar from dictionary.tables
where libname in ('BUS2011','BUS2012','BUS2013','BUS2014','BUS2015');quit;
However, within these datasets there is a particular variable I would like to get a count of. Let us call it 'ID'. ID is common in all datasets, across all years.
Within ID there are two indicators to whether the variable is blank. This is either 'XXXX' or 'NONE'.
What I would like to do is tell SAS if ID has either value to delete it.
From here I would only be left with values with an actual ID. I would then like a count of all of these observations.
I understand how to this in a datastep, but the fact I'm dealing with 5 different libs, with 20 datasets in each one, would mean it would take some time to carry this out.
Thus, I've turned to SQL, which I'm sure could speed up the process, but I'm just not too sure how to carry this out.
Any help would be much appreciated.
SQL is unlikely to speed things up, if anything on large datasets, it may well slow things down. As for your question, the simplest method is to just generate the required code:
data _null_; /* Expand this as you need with libnames, it gets a list of all datasets in */ /* the given libnames */ set sashelp.vtable (where=(libname in "BUS2011","BUS2012")); /* Using this list create a datastep for each dataset in those libnames */ /* Note I am updating them to v_<libname> - as I assume you don't want */ /* to overwrite */ call execute('data v_'||catx('.',libname,memname)||'; set '||catx('.',libname,memname),'; where id not in ('XXXX','NONE'); run;'); run;
So for every dataset in th egiven libnames this datastep will generate a datastep which looks like:
data v_<libname>.<dataset>;
set <libname>.<dataset>;
where id not in ('XXXX','NONE');
run;
So in effect copy the dataset and dropping those records. You can then pull out the num obs from the created datasets.
RW9,
Thank you for this, but I'm a bit confused by the required code.
proc sql;
create table vtable as
select libname, memname, nvar from dictionary.tables
where libname in ('BUS2011');quit;
data _null_;
/* Expand this as you need with libnames, it gets a list of all datasets in */
/* the given libnames */
set sashelp.vtable (where=(libname in ('BUS2011')));
/* Using this list create a datastep for each dataset in those libnames */
/* Note I am updating them to v_<libname> - as I assume you don't want */
/* to overwrite */
call execute('data v_'||catx('.',libname,memname)||'; set '||catx('.',libname,memname),'; where id not in ('XXXX','NONE'); run;');
run;
I'm wondering is there a need for the quottation marks in the line with call excute.?
When I attempt to run it I get the following errors;
ERROR: Invalid hexadecimal constant string '; where id not in ('X.
ERROR 388-185: Expecting an arithmetic operator.
ERROR 76-322: Syntax error, statement will be ignored.
Firstly, you do not need the proc sql part. Next, yes quotes are needed. Call execute() function requires a string, which is injected into the Base SAS compiler after the datastep stops executing. This string therefore needs to resolve to valid SAS sysntax.
data _null_;
set sashelp.vtable (where=(libname in ('BUS2011')));
call execute('data v_'||catx('.',libname,memname)||';
set '||catx('.',libname,memname),';
where id not in ("XXXX","NONE");
run;');
run;
You will note I put double quotes around the in() now, that is why you were getting the error (if I had some test data as a datastep, required output, then I would be able to test it directly, but I don't).
Looking at these features:
Why not get a little extra information? Construct a program that looks like this:
proc format;
value $mygroup ' '='Blank 'XXXX', 'NONE' = 'Invalid' other='Valid';
run;
proc freq data=memname.libname;
title "memname.libname";
tables id / missing;
format id $mygroup.;
run;
The trick is getting SAS to construct the dozens of PROC FREQ steps. So hard-code the PROC FORMAT, then:
data _null_;
set data;
call execute(cats('proc freq data=', libname, '.', memname, ';'));
call execute(cats('title "', libname, '.', memname, '";'));
call execute ('tables id / missing; format id $mygroup.; run;');
run;
Folks,
Could I extend said code to only look at certain datasets within the quoted libs?
Something such as;
data _null_;
set sashelp.vtable
(where=(libname in ('P2005','P2006','P2007','P2008','P2009','P2010','P2011','P2013','P2014','P2015')));
(where=(memname in ('ACCOUNTS','PERSONAL','TRADE')));
call execute(cats('proc freq data=', libname, '.', memname, ';'));
call execute(cats('title "', libname, '.', memname, '";'));
call execute ('tables id / missing; format id $mygroup.; run;');
run;
Yes, but there would still be only one WHERE clause used. One possibility:
set sashelp.vtable
(where=(libname in ('P2005','P2006','P2007','P2008','P2009','P2010','P2011','P2013','P2014','P2015')
and memname in ('ACCOUNTS','PERSONAL','TRADE')));
Yes, as you have done, but your where is wrong:
(where=(libname in ('P2005','P2006','P2007','P2008','P2009','P2010','P2011','P2013','P2014','P2015') and memname in ('ACCOUNTS','PERSONAL','TRADE')));
If you want only certain datasets within the libraries, then you would need both levels:
(where=(catx('.',libname,memname) in ('P2005.ACCOUNTS')));
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.