Desktop productivity for business analysts and programmers

Drop certain criteria for a variable prior to producing summary stats

Reply
Contributor
Posts: 35

Drop certain criteria for a variable prior to producing summary stats

Folks,

 

I've a query which perhaps people could shed some light on. 

 

I'm utilsing the following piece of code to to get a count of the number of observations from each of my datasets, for a number of different libs.

 

proc sql; 
create table data as
	select libname, memname, nvar from dictionary.tables
where libname in ('BUS2011','BUS2012','BUS2013','BUS2014','BUS2015');quit;

However, within these datasets there is a particular variable I would like to get a count of. Let us call it 'ID'. ID is common in all datasets, across all years. 

 

Within ID there are two indicators to whether the variable is blank. This is either 'XXXX' or 'NONE'. 

 

What I would like to do is tell SAS if ID has either value to delete it. 

 

From here I would only be left with values with an actual ID. I would then like a count of all of these observations.

 

I understand how to this in a datastep, but the fact I'm dealing with 5 different libs, with 20 datasets in each one, would mean it would take some time to carry this out. 

 

Thus, I've turned to SQL, which I'm sure could speed up the process, but I'm just not too sure how to carry this out. 

 

Any help would be much appreciated. 

Esteemed Advisor
Esteemed Advisor
Posts: 7,245

Re: Drop certain criteria for a variable prior to producing summary stats

SQL is unlikely to speed things up, if anything on large datasets, it may well slow things down.  As for your question, the simplest method is to just generate the required code:

data _null_;
/* Expand this as you need with libnames, it gets a list of all datasets in */
/* the given libnames */

  set sashelp.vtable (where=(libname in "BUS2011","BUS2012"));

/* Using this list create a datastep for each dataset in those libnames */
/* Note I am updating them to v_<libname> - as I assume you don't want */
/* to overwrite */

  call execute('data v_'||catx('.',libname,memname)||'; set '||catx('.',libname,memname),'; where id not in ('XXXX','NONE'); run;');
run;   

So for every dataset in th egiven libnames this datastep will generate a datastep which looks like:

data v_<libname>.<dataset>;

  set <libname>.<dataset>;

  where id not in ('XXXX','NONE');

run;

 

So in effect copy the dataset and dropping those records.  You can then pull out the num obs from the created datasets.  

Contributor
Posts: 35

Re: Drop certain criteria for a variable prior to producing summary stats

RW9,

 

Thank you for this, but I'm a bit confused by the required code.

proc sql; 
create table vtable as
	select libname, memname, nvar from dictionary.tables
	
where libname in ('BUS2011');quit;


data _null_;
/* Expand this as you need with libnames, it gets a list of all datasets in */
/* the given libnames */

  set sashelp.vtable (where=(libname in ('BUS2011')));

/* Using this list create a datastep for each dataset in those libnames */
/* Note I am updating them to v_<libname> - as I assume you don't want */
/* to overwrite */

 call execute('data v_'||catx('.',libname,memname)||'; set '||catx('.',libname,memname),'; where id not in ('XXXX','NONE'); run;');
run; 

I'm wondering is there a need for the quottation marks in the line with call excute.?

 

When I attempt to run it I get the following errors;

 

ERROR: Invalid hexadecimal constant string '; where id not in ('X.
ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

 

Esteemed Advisor
Esteemed Advisor
Posts: 7,245

Re: Drop certain criteria for a variable prior to producing summary stats

Firstly, you do not need the proc sql part.  Next, yes quotes are needed.  Call execute() function requires a string, which is injected into the Base SAS compiler after the datastep stops executing.  This string therefore needs to resolve to valid SAS sysntax.

data _null_;
  set sashelp.vtable (where=(libname in ('BUS2011')));
  call execute('data v_'||catx('.',libname,memname)||'; 
set '
||catx('.',libname,memname),';
where id not in ("
XXXX","NONE");
run;'
); run;
 

You will note I put double quotes around the in() now, that is why you were getting the error (if I had some test data as a datastep, required output, then I would be able to test it directly, but I don't).  

Respected Advisor
Posts: 4,995

Re: Drop certain criteria for a variable prior to producing summary stats

Looking at these features:

 

  • You already have a table with all the data sets that need to be processed
  • You will need to go through each data set to count the valid ID values

Why not get a little extra information?  Construct a program that looks like this:

 

proc format;

value $mygroup ' '='Blank  'XXXX', 'NONE' = 'Invalid' other='Valid';

run;

 

proc freq data=memname.libname;

title "memname.libname";

tables id / missing;

format id $mygroup.;

run;

 

The trick is getting SAS to construct the dozens of PROC FREQ steps.  So hard-code the PROC FORMAT, then:

 

data _null_;

set data;

call execute(cats('proc freq data=', libname, '.', memname, ';'));

call execute(cats('title "', libname, '.', memname, '";'));

call execute ('tables id / missing; format id $mygroup.; run;');

run;

Contributor
Posts: 35

Re: Drop certain criteria for a variable prior to producing summary stats

Folks, 

 

Could I extend said code to only look at certain datasets within the quoted libs?

 

Something such as;

 

data _null_;

set sashelp.vtable 
(where=(libname in ('P2005','P2006','P2007','P2008','P2009','P2010','P2011','P2013','P2014','P2015')));
(where=(memname in ('ACCOUNTS','PERSONAL','TRADE'))); 

call execute(cats('proc freq data=', libname, '.', memname, ';'));

call execute(cats('title "', libname, '.', memname, '";'));

call execute ('tables id / missing; format id $mygroup.; run;');

run;
Respected Advisor
Posts: 4,995

Re: Drop certain criteria for a variable prior to producing summary stats

Yes, but there would still be only one WHERE clause used.  One possibility:

 

set sashelp.vtable 
(where=(libname in ('P2005','P2006','P2007','P2008','P2009','P2010','P2011','P2013','P2014','P2015')
 and memname in ('ACCOUNTS','PERSONAL','TRADE'))); 

 

Esteemed Advisor
Esteemed Advisor
Posts: 7,245

Re: Drop certain criteria for a variable prior to producing summary stats

Yes, as you have done, but your where is wrong:

(where=(libname in ('P2005','P2006','P2007','P2008','P2009','P2010','P2011','P2013','P2014','P2015') and memname in ('ACCOUNTS','PERSONAL','TRADE'))); 

If you want only certain datasets within the libraries, then you would need both levels:

(where=(catx('.',libname,memname) in ('P2005.ACCOUNTS')));
Ask a Question
Discussion stats
  • 7 replies
  • 155 views
  • 0 likes
  • 3 in conversation