BookmarkSubscribeRSS Feed
nicnad
Fluorite | Level 6

Hi,

I am using SAS enterprise guide 4.1 which is running SAS 9.1.

I have the following SQL queries, which counts the number of duplicates for a specific variables within a specific table :

proc sql;

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test1 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test2 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT last_name, COUNT(last_name) AS NumOfccurrences FROM work.test3 GROUP BY last_name HAVING ( COUNT(last_name) > 1 );

SELECT first_name, COUNT(first_name) AS NumOfccurrences FROM work.test4 GROUP BY first_name HAVING ( COUNT(first_name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test5 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test6 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test7 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test8 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test9 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test10 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test11 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test12 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test13 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test14 GROUP BY name HAVING ( COUNT(name) > 1 );

SELECT name, COUNT(name) AS NumOfccurrences FROM work.test15 GROUP BY name HAVING ( COUNT(name) > 1 )

;quit;

Instead of having an HTML output with 15 different tables, is there an easy way to have an output (html or sas table) where all  those queries would be grouped by table name and variable?

Output should look like table below,

Thank you for your help and time!

work.test1namelast_namefirst_name
work.test2
work.test3
work.test4
work.test5
work.test6
work.test7
work.test8
work.test9
work.test10
work.test11
work.test12
work.test13
work.test14
work.test15
16 REPLIES 16
tish
Calcite | Level 5

I have not tested this... Note the use of double quotes around the value being put into the table_name variable. This is so that the macro variable &i can resolve.

%MACRO count_duplicates;

   %do i = 1 %to 15;

  

      %if &i = 1 %then

         %do;

            create table summary as

         %end;

      %else

         %do;

            insert into summary

         %end;

        

      SELECT

         "work.test&i" as table_name,

         name,

         last_name,

         first_name,

         COUNT(name) AS NumOfccurrences

      FROM

         work.test&i

      GROUP BY

         name

      HAVING

         COUNT(name) > 1;

   %end;

%MEND;

proc sql;

   %count_duplicates;

  

   select *

   from summary

   order by table_name;

quit;

nicnad
Fluorite | Level 6

Thank you for the quick reply.

The only thing is work.test1, work.test2, work.test3, etc... were example name. The real table have alphanumerical value only. How do I add the table name in some kind of array for the macro to loop through them?

Take as an example work.aaa work.bcd work.ddaa

Thank you for your help!

robby_beum
Quartz | Level 8

This is untested as well but might be helpful in looking for other options to accomplish the task:

%macro addname(name);
   data work.want;
      format table $12.;
      set &name;
      table=&name;
   run;
%mend;

%addname(work.test1);
%addname(work.test2);   
%addname(work.test3);   
%addname(work.test4);   
%addname(work.test5);   
%addname(work.test6);   
%addname(work.test7);   
%addname(work.test8);   
%addname(work.test9);   
%addname(work.test10);   
%addname(work.test11);   
%addname(work.test12);   
%addname(work.test13);   
%addname(work.test14);   
%addname(work.test15);

proc sql;
  SELECT
     table as table_name,
     name,
     last_name,
     first_name,
     COUNT(name) AS NumOfccurrences
  FROM work.want
  GROUP BY name
  HAVING
  COUNT(name) > 1;
quit;

nicnad
Fluorite | Level 6

Thank you for the quick reply.

My real table are located in a library named SASUSER

I receive the following error when I try to run your code accordingly. Can you please help me fix that error?

me of your options or statements may not be supported with the Activex or Java series of devices.  Graph defaults for these

      drivers may be different from other SAS/GRAPH device drivers.  For further information, please contact Technical Support.

9          OPTIONS DEV=ACTIVEX;

10         FILENAME EGHTML TEMP;

NOTE: Writing HTML(EGHTML) Body file: EGHTML

11         ODS HTML(ID=EGHTML) FILE=EGHTML ENCODING='utf-8' STYLE=EGDefault

11       ! STYLESHEET=(URL="file:///C:/Program%20Files/SAS/Shared%20Files/BIClientStyles/EGDefault.css")

11       ! ATTRIBUTES=("CODEBASE"="http://www2.sas.com/codebase/graph/v91/sasgraph.exe") NOGTITLE NOGFOOTNOTE GPATH=&sasworklocation

11       ! ;

12        

13         %gaccessible;

14         %macro addname(name);

15            data work.want;

16               format table $12.;

17               set &name;

18               table=&name;

19            run;

20         %mend;

21        

22        

23         %addname(sasuser.test1);

NOTE: DATA statement used (Total process time):

      real time           0.03 seconds

      cpu time            0.00 seconds

     

ERROR: DATA STEP Component Object failure.  Aborted during the COMPILATION phase.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: Line generated by the macro variable "NAME".

23         sasuser.test1

           __________________

           557

ERROR 557-185: Variable sasuser is not an object.


robby_beum
Quartz | Level 8

Try this:

%macro addname(name);
   data sasuser.want;
      format table $12.;
      set sasuser.&name.;
      table=&name;
   run;
%mend;

%addname(test1);
%addname(test2);   
%addname(test3);   
%addname(test4);

...   

nicnad
Fluorite | Level 6

Again, thank you very much for the quick reply.

Maybe I did not illustrate my problem properly, but I am getting this error on the following line :

proc sql;
  SELECT
     table as table_name,
     name,
     last_name,
     first_name,
     COUNT(name) AS NumOfccurrences
  FROM work.want
  GROUP BY name
  HAVING
  COUNT(name) > 1;

ERROR: The following columns were not found in the contributing tables: last_name, first_name

The thing is my tables will either have a column named "name" or "last_name" or "first_name", they won't have the 3 columns all at once.

If I am understanding the code correctly, the error is provocked because the code is trying to select a variable that does not exist in the table.

So how do you add a piece of code that would select the "name" column if it exist or the "last_name" column if it exist or "first_name" if it exist and then count the duplicates on the existing column?

Your help and time are really appreciated.

@ tish : I will test your code and provide you with feedback shortly. Also thanks for the explanation within the code, it really helps me understand the code.

tish
Calcite | Level 5

Okay. Here is the macro, now slightly changed:

%MACRO count_duplicates;

   %do i = 1 %to &table_number;

      %let table_to_use = %scan(&table_list, &i);

      %if &i = 1 %then

         %do;

            create table summary as

         %end;

      %else

         %do;

            insert into summary

         %end;

      SELECT

         "&table_to_use" as table_name,

         name,

         last_name,

         first_name,

         COUNT(name) AS NumOfccurrences

      FROM

         work.&table_to_use

      GROUP BY

         name

      HAVING

         COUNT(name) > 1;

   %end;

%MEND;

Here are two ways to create a list of tables and put it into the macro variable called table_list:

proc sql noprint;

   select

      memname into :table_list separated by ' '

   from

      dictionary.tables

   where

      libname = 'WORK' and

      substr(memname, 1, 6) = "USE_ME";

quit;

        

%let table_list = aaa bcd daa;

Either way, you also need to know how many elements are in this list:

%let table_number = %eval(%sysfunc(count(&table_list, %str( )))+1);

Then, call the macro:

proc sql;

   %count_duplicates;

   select *

   from summary

   order by table_name;

quit;

I use this technique a bit. If you go the route of using the dictionary tables, I think you need to invoke proc sql twice: first to grab the list and then to use it.

nicnad
Fluorite | Level 6

See the reply above. I did not illustrate my problem correctly and I am really sorry for that. I get the same error :

 

ERROR: The following columns were not found in the contributing tables:

The thing is my tables will either have a column named "name" or "last_name" or "first_name", they won't have the 3 columns all at once. Can you please help me fix the code accordingly?

tish
Calcite | Level 5

Oh, now I see. Some of these tables have name, others have last_name or first_name. If you are planning to hand build the macro variable table_list, I'd hand build a parallel list as another macro variable. Lets call it name_variable:

%let table_list = aaa bbb ccc ddd;

%let name_variable = name name name last_name;

Make sure that the name in name_variable at position 1 is really the name of the name variable in the table listed at position 1 in the table_list.

Then, in the macro loop, right after %let table_to_use = %scan(&table_list, &i); I'd add:

%let variable_to_use = %scan(&name_variable, &i);

And then for the body of that sql monster up there, I'd use:

      SELECT

         "&table_to_use" as table_name,

          %if &variable_to_use = name %then

               %do;

         name as combined_name,

               %end;

          %else %if &variable_to_use = last_name %then

          

%do;

     last_name as combined_name,

%end;

%else

   %do;

                  first_name as combined_name,

               %end;

         COUNT(calculated combined_name) AS NumOfccurrences

      FROM

         work.&table_to_use

      GROUP BY

         calculated combined_name

      HAVING

         COUNT(calculated combined_name) > 1;

I can't test this one out. If sql doesn't like the references to the calculated variable "combined_name" in the group or having clauses, I'd use another %if-%then-%else block of macro code to substitute the real variable names for sql to use.

I do hope this helps!

tish
Calcite | Level 5

If you want to go the dictionary table route, you will need to build a variable_to_use list again that's parallel to the table_list. I'd look at the dictionary.columns table and maybe scan for column names using the like operator (is it an operator?) for the string name or NAME.

nicnad
Fluorite | Level 6

Hi,

I get the following errors when running your code. Can you please help me fix it? :

76          proc sql;

77        

78               SELECT

79        

80                  "&table_to_use" as table_name,

WARNING: Apparent symbolic reference TABLE_TO_USE not resolved.

81        

82                   %if &variable_to_use = first_name  %then

ERROR: The %IF statement is not valid in open code.

Thank you for your help!

tish
Calcite | Level 5

You have to run this code in the context of the earlier macro. It loops through the table list to find the table to use. You cannot run it in open code.

Take what I wrote and substitute it into the earlier sql code, right after the part where you conditionally create or insert data into a table.

I hope this helps. This is one way to do it. Howles (response 11) has another.

nicnad
Fluorite | Level 6

Thank you for the reply.

I tried hard to make your code work but I am unable to. Could you please rewrite the entire macro and sql procedure?

Thank you for your help and time

BigD
Calcite | Level 5

I do this sort of thing with proc append in the macro loop

%macro test;

  %do i=1 %to &sqlobs;


PROC IMPORT

     datafile="C:\_2010_11_AAH_Q4\%scan(&fn.,&i,'+')"

     OUT= work.test&i

     DBMS=EXCEL REPLACE;

     SHEET="'Form4$'";

     range='a5:u120';

     GETNAMES=NO;

     MIXED=NO;

     SCANTEXT=YES;

     USEDATE=YES;

     SCANTIME=YES;

    run;    

/*Grab the source file name and insert into data table*/

data work.vv&i; set test&i;

informat prj $20.;format prj $20.;

prj  = "%scan(&f.,&i,'+')";

run;


/*put all the data sets together */


proc append base=all_combined data=WORK.vv&i force;


run;


%end;


%mend;


%test;


run;


%end;



%mend;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1267 views
  • 0 likes
  • 5 in conversation