How to summarize a bunch of queries

dstuder · Posted 04-24-2020 03:11 AM

Hi everybody

In order to fill out a questionnaire I need to do some (around 300) queries in our database and extract the number of cases where certain conditions are true. My SAS-Code is created dynamically by a PHP-Script and looks like this (it's actually a loop over all queries):

/******************************************
Question "Q1"
******************************************/
DATA want (KEEP = var1 var2 var3);
SET myDatabase;
RUN;

PROC SORT NODUPKEY;
BY var1, var2, var3;
RUN;

PROC SQL;
   SELECT "Q1" AS question, COUNT(*) AS count FROM want WHERE var1='xy';
QUIT;

/****************************************** 
Question "Q2" 
******************************************/ 
DATA want (KEEP = var4 var5 var6); 
SET myDatabase;
RUN;

PROC SORT NODUPKEY;
BY var4, var5, var6;
RUN;

PROC SQL;
SELECT "Q2" AS question, COUNT(*) AS count FROM want WHERE var4='y';
QUIT;


... and so forth

Now, I need to summarise these results somehow, so that I get a table like this:

Question | Result
---------------------- 
Q1         120
Q2         240
Q3         1

My idea is that at the beginning I create an empty table `results` with two columns (name of the query and result). Then in each code-block (Q1, Q2, ... Qx) I append one line to the results-table (name of the query and the result itself).

But how can I do this? How can I save the result of each PROC SQL in an existing table?

Do you see any better solution?

ed_sas_member · Posted 04-24-2020 03:55 AM

Hi @dstuder

I suggest that you adapt the code as below:

/******************************************
Question "Q1"
******************************************/
DATA want (KEEP = var1 var2 var3);
SET myDatabase;
RUN;

PROC SORT NODUPKEY;
BY var1 var2 var3; 	/* <---- REMOVE COMMAS */
RUN;

PROC SQL;
   CREATE TABLE Q1 AS /* <---- ADD THIS */
   SELECT "Q1" AS question, COUNT(*) AS count FROM want WHERE var1='xy';
QUIT;

/****************************************** 
Question "Q2" 
******************************************/ 
DATA want (KEEP = var4 var5 var6); 
SET myDatabase;
RUN;

PROC SORT NODUPKEY;
BY var4 var5 var6; /* <---- REMOVE COMMAS */
RUN;

PROC SQL;
	CREATE TABLE Q2 AS /* <---- ADD THIS */
	SELECT "Q2" AS question, COUNT(*) AS count FROM want WHERE var4='y';
QUIT;

/**
	....
**/

/* APPEND */

DATA FINAL;
	LENGTH question $ 100;
	SET Q:;
	RENAME count = result;
RUN;

Best,

dstuder · Posted 04-24-2020 04:17 AM

Sorry for the bad code. It was more of a pseudo-code in order to show the problem and I didn't actually run it.

DATA FINAL;
	LENGTH question $ 100;
	SET Q:;
	RENAME count = result;
RUN;

So does this code actually summarize all the tables Q1, Q2, ... Q300 (created by PROC SQL) and put the table names and number of cases in a table `final`? I am also asking myself whether it wouldn't be better to overwrite a single table "Q" for each question instead of creating hundreds of tables called Q1, Q2 etc.?

ed_sas_member · Posted 04-24-2020 04:31 AM

Hi @dstuder

The CREATE TABLE xx AS statement just before the SELECT clause in PROC SQL put the results of the query in a dataset xx.

-> So in the code provided, it will create datasets Q1, Q2, ...

In the FINAL dataset, SAS append all tables beginning with the letter Q (-> Q:).

Definitely, you can avoid creating hundreds of intermediate datasets.

Here is a way to proceed:

/******************************************
Question "Q1"
******************************************/
DATA want1 (KEEP = var1 var2 var3); /* <--- RENAME WANT*/
SET myDatabase;
RUN;

PROC SORT NODUPKEY;
BY var1 var2 var3; 	/* <---- REMOVE COMMAS */
RUN;

/****************************************** 
Question "Q2" 
******************************************/ 
DATA want2 (KEEP = var4 var5 var6);  /* <--- RENAME WANT*/
SET myDatabase;
RUN;

PROC SORT NODUPKEY;
BY var4 var5 var6; /* <---- REMOVE COMMAS */
RUN;


/* APPEND */

PROC SQL;
   CREATE TABLE FINAL AS /* <---- ADD THIS */
   SELECT "Q1" AS question, COUNT(*) AS count FROM want1 WHERE var1='xy' /* ADAPT WANT1*/
	UNION /* <---- ADD THIS */
   SELECT "Q2" AS question, COUNT(*) AS count FROM want2 WHERE var4='y' /* ADAPT WANT2*/
    UNION
   /*...*/
	;
QUIT;

Kurt_Bremser · Posted 04-24-2020 03:58 AM

Is the WHERE condition always based on the first variable in the sort nodupkey?

Anyway, your code can be optimized:

%let question=Q1;
%let varlist=var1 var2 var3;
%let value=xy;

proc sort
  data=mydatabase (
    keep=&varlist
    where=(%scan(&varlist,1) = "&value")
  )
  out=sorted
  nodupkey
;
by &varlist;
run;

data single_result;
question = "&question";
count = nobs;
set sorted nobs=nobs;
stop;
keep question count;
run;

proc append data=single_result base=want force;
run;

If my assumptions were correct, this can form the basis of a macro with three parameters. Create a dataset with those three parameters, so you can run the macro in an automated way. Before running that, initialize the want dataset.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

dstuder · Posted 04-24-2020 04:11 AM

Hello Kurt,
actually no. The where-condition is random and depends entirely on what the specific question in the questionnaire asks for. It can contain one or several variables.

Kurt_Bremser · Posted 04-24-2020 04:30 AM

"Any kind of code" parameters in macros are quite tricky to handle. In this case I suggest you use call execute (or dosubl) from within a data step to create the code.

But the basic structure of my code is still valid: use the where condition and keep= in the first step that reads data and does the sort, and then get the number of observations.

Note that my code overwrites the temporary dataset single_result in every iteration and appends it to the final table, so you'll only have two datasets in the end, one of which can safely be discarded.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

dstuder · Posted 04-24-2020 04:33 AM

Thanks! I'll have to go through your answer carfully as I am still a newbie in SAS.

A short follow-up question: Does it make any difference in terms of performance whether I use a macro or if I just repeat all the code for each question? I mean if it's just a question of readability I dont really care because I am creating the SAS-code automatically using a PHP-script.

Kurt_Bremser · Posted 04-24-2020 04:55 AM

Creating the code from a php script will probably mean you write a program, run it in batch, rewrite it, run it again, and so on. Why not use the tools that SAS provides for creating code dynamically?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RichardDeVen · Posted 04-24-2020 04:40 AM

The following SQL construct can be used to insert (i.e. append) a query result into an existing table

INSERT INTO <TABLE>
SELECT ... ;

Example:

* blank base table for results;
proc sql;
  create table work.results 
  (Question CHAR(20), Result NUM);


* Dynamic 1 .... ;

data dynamic1; set sashelp.class; run;

* Append query results for Q1 to results table;
proc sql;
  insert into results
  select 'Q1', count(*) from dynamic1;

* Dynamic 2 .... ;

data dynamic2; set sashelp.class; where age < 14; run;

* Append query results for Q2 to results table; 
proc sql;
  insert into results
  select 'Q2', count(*) from dynamic2;

%let syslast = results;

ballardw · Posted 04-24-2020 10:13 AM

This looks a lot like skip pattern verification. If your current data is one record per respondent I would use a data step instead of hundreds of queries in SQL.

My basic approach when I have done this is to create a bunch of indicator variables 1 (for correct/desired/expected value of the variable with conditions) and 0 else. Then use a summary or report step to sum the 1 values to get one output table.

More dummy /pseudo code:

data want;
   set have;
   Q1 = (var1 ='xy');
   Q2 = (var4 ='y');
   /* skip pattern check: expect some value of Var5 
      when var3='y'
   */
   Q3 = (not missing(var5) and var3='y');

run;

proc summary data=want ;
   var Q1 - Q3;
   output out=work.summary sum=;   /* a mean would be percent of records
                                      with the response*/
run;

/* or a report procedure*/
proc tabulate data=want;
   var Q1 -Q3;
   table Q1 -Q3,
         sum
   ;
run;

How to summarize a bunch of queries

Re: How to summarize a bunch of queries

Re: How to summarize a bunch of queries

Re: How to summarize a bunch of queries

Re: How to summarize a bunch of queries

Re: How to summarize a bunch of queries

Re: How to summarize a bunch of queries

Re: How to summarize a bunch of queries

Re: How to summarize a bunch of queries

Re: How to summarize a bunch of queries

Re: How to summarize a bunch of queries

SAS Innovate 2025: Call for Content

Classroom Training Available!