Hi,
I am using SAS enterprise guide 4.1 which is running SAS 9.1.
I have the following SQL queries, which counts the number of duplicates for a specific variables within a specific table :
proc sql;
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test1 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test2 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT last_name, COUNT(last_name) AS NumOfccurrences FROM work.test3 GROUP BY last_name HAVING ( COUNT(last_name) > 1 );
SELECT first_name, COUNT(first_name) AS NumOfccurrences FROM work.test4 GROUP BY first_name HAVING ( COUNT(first_name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test5 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test6 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test7 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test8 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test9 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test10 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test11 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test12 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test13 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test14 GROUP BY name HAVING ( COUNT(name) > 1 );
SELECT name, COUNT(name) AS NumOfccurrences FROM work.test15 GROUP BY name HAVING ( COUNT(name) > 1 )
;quit;
Instead of having an HTML output with 15 different tables, is there an easy way to have an output (html or sas table) where all those queries would be grouped by table name and variable?
Output should look like table below,
Thank you for your help and time!
work.test1 | name | last_name | first_name |
work.test2 | |||
work.test3 | |||
work.test4 | |||
work.test5 | |||
work.test6 | |||
work.test7 | |||
work.test8 | |||
work.test9 | |||
work.test10 | |||
work.test11 | |||
work.test12 | |||
work.test13 | |||
work.test14 | |||
work.test15 |
I have not tested this... Note the use of double quotes around the value being put into the table_name variable. This is so that the macro variable &i can resolve.
%MACRO count_duplicates;
%do i = 1 %to 15;
%if &i = 1 %then
%do;
create table summary as
%end;
%else
%do;
insert into summary
%end;
SELECT
"work.test&i" as table_name,
name,
last_name,
first_name,
COUNT(name) AS NumOfccurrences
FROM
work.test&i
GROUP BY
name
HAVING
COUNT(name) > 1;
%end;
%MEND;
proc sql;
%count_duplicates;
select *
from summary
order by table_name;
quit;
Thank you for the quick reply.
The only thing is work.test1, work.test2, work.test3, etc... were example name. The real table have alphanumerical value only. How do I add the table name in some kind of array for the macro to loop through them?
Take as an example work.aaa work.bcd work.ddaa
Thank you for your help!
This is untested as well but might be helpful in looking for other options to accomplish the task:
%macro addname(name);
data work.want;
format table $12.;
set &name;
table=&name;
run;
%mend;
%addname(work.test1);
%addname(work.test2);
%addname(work.test3);
%addname(work.test4);
%addname(work.test5);
%addname(work.test6);
%addname(work.test7);
%addname(work.test8);
%addname(work.test9);
%addname(work.test10);
%addname(work.test11);
%addname(work.test12);
%addname(work.test13);
%addname(work.test14);
%addname(work.test15);
proc sql;
SELECT
table as table_name,
name,
last_name,
first_name,
COUNT(name) AS NumOfccurrences
FROM work.want
GROUP BY name
HAVING
COUNT(name) > 1;
quit;
Thank you for the quick reply.
My real table are located in a library named SASUSER
I receive the following error when I try to run your code accordingly. Can you please help me fix that error?
me of your options or statements may not be supported with the Activex or Java series of devices. Graph defaults for these
drivers may be different from other SAS/GRAPH device drivers. For further information, please contact Technical Support.
9 OPTIONS DEV=ACTIVEX;
10 FILENAME EGHTML TEMP;
NOTE: Writing HTML(EGHTML) Body file: EGHTML
11 ODS HTML(ID=EGHTML) FILE=EGHTML ENCODING='utf-8' STYLE=EGDefault
11 ! STYLESHEET=(URL="file:///C:/Program%20Files/SAS/Shared%20Files/BIClientStyles/EGDefault.css")
11 ! ATTRIBUTES=("CODEBASE"="http://www2.sas.com/codebase/graph/v91/sasgraph.exe") NOGTITLE NOGFOOTNOTE GPATH=&sasworklocation
11 ! ;
12
13 %gaccessible;
14 %macro addname(name);
15 data work.want;
16 format table $12.;
17 set &name;
18 table=&name;
19 run;
20 %mend;
21
22
23 %addname(sasuser.test1);
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: Line generated by the macro variable "NAME".
23 sasuser.test1
__________________
557
ERROR 557-185: Variable sasuser is not an object.
Try this:
%macro addname(name);
data sasuser.want;
format table $12.;
set sasuser.&name.;
table=&name;
run;
%mend;
%addname(test1);
%addname(test2);
%addname(test3);
%addname(test4);
...
Again, thank you very much for the quick reply.
Maybe I did not illustrate my problem properly, but I am getting this error on the following line :
proc sql;
SELECT
table as table_name,
name,
last_name,
first_name,
COUNT(name) AS NumOfccurrences
FROM work.want
GROUP BY name
HAVING
COUNT(name) > 1;
ERROR: The following columns were not found in the contributing tables: last_name, first_name
The thing is my tables will either have a column named "name" or "last_name" or "first_name", they won't have the 3 columns all at once.
If I am understanding the code correctly, the error is provocked because the code is trying to select a variable that does not exist in the table.
So how do you add a piece of code that would select the "name" column if it exist or the "last_name" column if it exist or "first_name" if it exist and then count the duplicates on the existing column?
Your help and time are really appreciated.
@ tish : I will test your code and provide you with feedback shortly. Also thanks for the explanation within the code, it really helps me understand the code.
Okay. Here is the macro, now slightly changed:
%MACRO count_duplicates;
%do i = 1 %to &table_number;
%let table_to_use = %scan(&table_list, &i);
%if &i = 1 %then
%do;
create table summary as
%end;
%else
%do;
insert into summary
%end;
SELECT
"&table_to_use" as table_name,
name,
last_name,
first_name,
COUNT(name) AS NumOfccurrences
FROM
work.&table_to_use
GROUP BY
name
HAVING
COUNT(name) > 1;
%end;
%MEND;
Here are two ways to create a list of tables and put it into the macro variable called table_list:
proc sql noprint;
select
memname into :table_list separated by ' '
from
dictionary.tables
where
libname = 'WORK' and
substr(memname, 1, 6) = "USE_ME";
quit;
%let table_list = aaa bcd daa;
Either way, you also need to know how many elements are in this list:
%let table_number = %eval(%sysfunc(count(&table_list, %str( )))+1);
Then, call the macro:
proc sql;
%count_duplicates;
select *
from summary
order by table_name;
quit;
I use this technique a bit. If you go the route of using the dictionary tables, I think you need to invoke proc sql twice: first to grab the list and then to use it.
See the reply above. I did not illustrate my problem correctly and I am really sorry for that. I get the same error :
ERROR: The following columns were not found in the contributing tables:
The thing is my tables will either have a column named "name" or "last_name" or "first_name", they won't have the 3 columns all at once. Can you please help me fix the code accordingly?
Oh, now I see. Some of these tables have name, others have last_name or first_name. If you are planning to hand build the macro variable table_list, I'd hand build a parallel list as another macro variable. Lets call it name_variable:
%let table_list = aaa bbb ccc ddd;
%let name_variable = name name name last_name;
Make sure that the name in name_variable at position 1 is really the name of the name variable in the table listed at position 1 in the table_list.
Then, in the macro loop, right after %let table_to_use = %scan(&table_list, &i); I'd add:
%let variable_to_use = %scan(&name_variable, &i);
And then for the body of that sql monster up there, I'd use:
SELECT
"&table_to_use" as table_name,
%if &variable_to_use = name %then
%do;
name as combined_name,
%end;
%else %if &variable_to_use = last_name %then
%do; last_name as combined_name, %end; %else %do; |
first_name as combined_name,
%end;
COUNT(calculated combined_name) AS NumOfccurrences
FROM
work.&table_to_use
GROUP BY
calculated combined_name
HAVING
COUNT(calculated combined_name) > 1;
I can't test this one out. If sql doesn't like the references to the calculated variable "combined_name" in the group or having clauses, I'd use another %if-%then-%else block of macro code to substitute the real variable names for sql to use.
I do hope this helps!
If you want to go the dictionary table route, you will need to build a variable_to_use list again that's parallel to the table_list. I'd look at the dictionary.columns table and maybe scan for column names using the like operator (is it an operator?) for the string name or NAME.
Hi,
I get the following errors when running your code. Can you please help me fix it? :
76 proc sql;
77
78 SELECT
79
80 "&table_to_use" as table_name,
WARNING: Apparent symbolic reference TABLE_TO_USE not resolved.
81
82 %if &variable_to_use = first_name %then
ERROR: The %IF statement is not valid in open code.
Thank you for your help!
You have to run this code in the context of the earlier macro. It loops through the table list to find the table to use. You cannot run it in open code.
Take what I wrote and substitute it into the earlier sql code, right after the part where you conditionally create or insert data into a table.
I hope this helps. This is one way to do it. Howles (response 11) has another.
Thank you for the reply.
I tried hard to make your code work but I am unable to. Could you please rewrite the entire macro and sql procedure?
Thank you for your help and time
I do this sort of thing with proc append in the macro loop
%macro test;
%do i=1 %to &sqlobs;
PROC IMPORT
datafile="C:\_2010_11_AAH_Q4\%scan(&fn.,&i,'+')"
OUT= work.test&i
DBMS=EXCEL REPLACE;
SHEET="'Form4$'";
range='a5:u120';
GETNAMES=NO;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
run;
/*Grab the source file name and insert into data table*/
data work.vv&i; set test&i;
informat prj $20.;format prj $20.;
prj = "%scan(&f.,&i,'+')";
run;
/*put all the data sets together */
proc append base=all_combined data=WORK.vv&i force;
run;
%end;
%mend;
%test;
run;
%end;
%mend;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.