Hi everybody
In order to fill out a questionnaire I need to do some (around 300) queries in our database and extract the number of cases where certain conditions are true. My SAS-Code is created dynamically by a PHP-Script and looks like this (it's actually a loop over all queries):
/****************************************** Question "Q1" ******************************************/ DATA want (KEEP = var1 var2 var3); SET myDatabase;
RUN;
PROC SORT NODUPKEY;
BY var1, var2, var3;
RUN;
PROC SQL;
SELECT "Q1" AS question, COUNT(*) AS count FROM want WHERE var1='xy';
QUIT;
/******************************************
Question "Q2"
******************************************/
DATA want (KEEP = var4 var5 var6);
SET myDatabase;
RUN;
PROC SORT NODUPKEY;
BY var4, var5, var6;
RUN;
PROC SQL;
SELECT "Q2" AS question, COUNT(*) AS count FROM want WHERE var4='y';
QUIT;
... and so forth
Now, I need to summarise these results somehow, so that I get a table like this:
Question | Result ----------------------
Q1 120
Q2 240
Q3 1
My idea is that at the beginning I create an empty table `results` with two columns (name of the query and result). Then in each code-block (Q1, Q2, ... Qx) I append one line to the results-table (name of the query and the result itself).
But how can I do this? How can I save the result of each PROC SQL in an existing table?
Do you see any better solution?
Hi @dstuder
I suggest that you adapt the code as below:
/******************************************
Question "Q1"
******************************************/
DATA want (KEEP = var1 var2 var3);
SET myDatabase;
RUN;
PROC SORT NODUPKEY;
BY var1 var2 var3; /* <---- REMOVE COMMAS */
RUN;
PROC SQL;
CREATE TABLE Q1 AS /* <---- ADD THIS */
SELECT "Q1" AS question, COUNT(*) AS count FROM want WHERE var1='xy';
QUIT;
/******************************************
Question "Q2"
******************************************/
DATA want (KEEP = var4 var5 var6);
SET myDatabase;
RUN;
PROC SORT NODUPKEY;
BY var4 var5 var6; /* <---- REMOVE COMMAS */
RUN;
PROC SQL;
CREATE TABLE Q2 AS /* <---- ADD THIS */
SELECT "Q2" AS question, COUNT(*) AS count FROM want WHERE var4='y';
QUIT;
/**
....
**/
/* APPEND */
DATA FINAL;
LENGTH question $ 100;
SET Q:;
RENAME count = result;
RUN;
Best,
Sorry for the bad code. It was more of a pseudo-code in order to show the problem and I didn't actually run it.
DATA FINAL; LENGTH question $ 100; SET Q:; RENAME count = result; RUN;
So does this code actually summarize all the tables Q1, Q2, ... Q300 (created by PROC SQL) and put the table names and number of cases in a table `final`? I am also asking myself whether it wouldn't be better to overwrite a single table "Q" for each question instead of creating hundreds of tables called Q1, Q2 etc.?
Hi @dstuder
The CREATE TABLE xx AS statement just before the SELECT clause in PROC SQL put the results of the query in a dataset xx.
-> So in the code provided, it will create datasets Q1, Q2, ...
In the FINAL dataset, SAS append all tables beginning with the letter Q (-> Q:).
Definitely, you can avoid creating hundreds of intermediate datasets.
Here is a way to proceed:
/******************************************
Question "Q1"
******************************************/
DATA want1 (KEEP = var1 var2 var3); /* <--- RENAME WANT*/
SET myDatabase;
RUN;
PROC SORT NODUPKEY;
BY var1 var2 var3; /* <---- REMOVE COMMAS */
RUN;
/******************************************
Question "Q2"
******************************************/
DATA want2 (KEEP = var4 var5 var6); /* <--- RENAME WANT*/
SET myDatabase;
RUN;
PROC SORT NODUPKEY;
BY var4 var5 var6; /* <---- REMOVE COMMAS */
RUN;
/* APPEND */
PROC SQL;
CREATE TABLE FINAL AS /* <---- ADD THIS */
SELECT "Q1" AS question, COUNT(*) AS count FROM want1 WHERE var1='xy' /* ADAPT WANT1*/
UNION /* <---- ADD THIS */
SELECT "Q2" AS question, COUNT(*) AS count FROM want2 WHERE var4='y' /* ADAPT WANT2*/
UNION
/*...*/
;
QUIT;
Is the WHERE condition always based on the first variable in the sort nodupkey?
Anyway, your code can be optimized:
%let question=Q1;
%let varlist=var1 var2 var3;
%let value=xy;
proc sort
data=mydatabase (
keep=&varlist
where=(%scan(&varlist,1) = "&value")
)
out=sorted
nodupkey
;
by &varlist;
run;
data single_result;
question = "&question";
count = nobs;
set sorted nobs=nobs;
stop;
keep question count;
run;
proc append data=single_result base=want force;
run;
If my assumptions were correct, this can form the basis of a macro with three parameters. Create a dataset with those three parameters, so you can run the macro in an automated way. Before running that, initialize the want dataset.
Hello Kurt,
actually no. The where-condition is random and depends entirely on what the specific question in the questionnaire asks for. It can contain one or several variables.
"Any kind of code" parameters in macros are quite tricky to handle. In this case I suggest you use call execute (or dosubl) from within a data step to create the code.
But the basic structure of my code is still valid: use the where condition and keep= in the first step that reads data and does the sort, and then get the number of observations.
Note that my code overwrites the temporary dataset single_result in every iteration and appends it to the final table, so you'll only have two datasets in the end, one of which can safely be discarded.
Thanks! I'll have to go through your answer carfully as I am still a newbie in SAS.
A short follow-up question: Does it make any difference in terms of performance whether I use a macro or if I just repeat all the code for each question? I mean if it's just a question of readability I dont really care because I am creating the SAS-code automatically using a PHP-script.
Creating the code from a php script will probably mean you write a program, run it in batch, rewrite it, run it again, and so on. Why not use the tools that SAS provides for creating code dynamically?
The following SQL construct can be used to insert (i.e. append) a query result into an existing table
INSERT INTO <TABLE> SELECT ... ;
Example:
* blank base table for results;
proc sql; create table work.results (Question CHAR(20), Result NUM); * Dynamic 1 .... ; data dynamic1; set sashelp.class; run;
* Append query results for Q1 to results table; proc sql; insert into results select 'Q1', count(*) from dynamic1; * Dynamic 2 .... ; data dynamic2; set sashelp.class; where age < 14; run; * Append query results for Q2 to results table;
proc sql; insert into results select 'Q2', count(*) from dynamic2; %let syslast = results;
This looks a lot like skip pattern verification. If your current data is one record per respondent I would use a data step instead of hundreds of queries in SQL.
My basic approach when I have done this is to create a bunch of indicator variables 1 (for correct/desired/expected value of the variable with conditions) and 0 else. Then use a summary or report step to sum the 1 values to get one output table.
More dummy /pseudo code:
data want; set have; Q1 = (var1 ='xy'); Q2 = (var4 ='y'); /* skip pattern check: expect some value of Var5 when var3='y' */ Q3 = (not missing(var5) and var3='y'); run; proc summary data=want ; var Q1 - Q3; output out=work.summary sum=; /* a mean would be percent of records with the response*/ run; /* or a report procedure*/ proc tabulate data=want; var Q1 -Q3; table Q1 -Q3, sum ; run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.