Solved: Dynamic process to union all data sets with distinct values

Ronein · Posted 07-17-2020 02:38 AM

Hello

There is a program that I need to run every few weeks.

This program need to use different source data sets (By business decision).

For example:

Let's say that If I run the program today then I need to use data sets: t2001,t2002,t2003.

In the program there is a step that check distinct values for each data source and then stack the values (Union) and delete duplicates values (Union all).

My question:

Is there a shorter way to perform the dynamic process of creating data set Newtbl?

Data t2001;
input ToCustomer FromCustomer Y
cards;
111 999 10
222 888 20
333 888 30
;
Run;

Data t2002;
input ToCustomer FromCustomer Y
cards;
111 999 10
333 888 30
444 767 40
;
Run;

Data t2003;
input ToCustomer FromCustomer Y
cards;
111 999 10
555 345 30
;
Run;

/*Non dynamic way to create data set NewTbl*/
PROC SQL;
Create table NewTbl
 SELECT distinct ToCustomer, FromCustomer
 FROM t2001
Union All
 SELECT distinct ToCustomer, FromCustomer
 FROM t2002
Union All
 SELECT distinct ToCustomer, FromCustomer
 FROM t2003
;
QUIT;


/*dynamic way to create data set NewTbl*/


%macro RRR1 (YYMM);
create table distinct_t_&YYMM. as
select distinct ToCustomer, FromCustomer
 FROM t&YYMM.
;
quit;
%mend RRR1;

%let vector=2001+2002+2003;
%let k = %sysfunc(countw(&vector));

%macro RUN_RRR; 
%do j=1 %to &k.;
%let YYMM=%scan(&vector.,&j.,+);
%RRR(&YYMM.);
%end;
%mend RUN_RRR;
%RUN_RRR;

data NewTbl_;
SET distinct_t_:;
Run;

proc sort data=NewTbl_  nodupkey;
by ToCustomer FromCustomer;
Run;

Shmuel · Posted 07-17-2020 04:22 AM

Try next code:

data all_t / view=all_t;
   set t2001 t2003 t2003;
  /* OR set t20: ;  <<< for concatenating all t20xx datasets */
run;
proc sort data=all_t out=final nodupkey;
  by ToCustomer, FromCustomer;
run;

View solution in original post

Shmuel · Posted 07-17-2020 04:22 AM

Try next code:

data all_t / view=all_t;
   set t2001 t2003 t2003;
  /* OR set t20: ;  <<< for concatenating all t20xx datasets */
run;
proc sort data=all_t out=final nodupkey;
  by ToCustomer, FromCustomer;
run;

Ronein · Posted 07-17-2020 03:49 PM

Thanks,

What is the meaning of this code that you wrote please?

/ view=akk_t;

Shmuel · Posted 07-17-2020 04:24 PM

@Ronein wrote:

Thanks,

What is the meaning of this code that you wrote please?
/ view=akk_t;

A view is a sas object that tells sas what to do without doing it immediately.

In this case it tells sas to concatenate the input datasets to create one input stream to the sort procedure just before running the sort. Mainly, it saves run time ans disk space.

Note: Concatenation of datasets means read all observation of the n datasets and write them to one output dataset. By using a view, the observation are read once directly into the sort procedure.

You can create a view by sas step as shown up or even by a sql by:

proc sql ...;
       create VIEW <view_name>  as
        ......
; quit;

Ronein · Posted 07-18-2020 03:19 PM

Thank you!

So as I understand this statement create a view instead of data set and advantage is program efficiency

data all_t / view=all_t;

Thank you

Patrick · Posted 07-17-2020 06:28 AM

If you want to stick with a SQL Union then below should do the job. Using a SQL Union would be advantageous if your source data resides in a data base; or also if you can't be certain that the variable lengths in your source tables are always the same.

In regards of the UNION operator: Without the keyword ALL a Union will dedup the result set so for what you want don't use a combination of Distinct/Union All but just use a Union. That should give you the same result.

And here the code:

/* source data */
Data t2001;
  input ToCustomer FromCustomer Y;
  cards;
111 999 10
222 888 20
333 888 30
;

Data t2002;
  input ToCustomer FromCustomer Y;
  cards;
111 999 10
333 888 30
444 767 40
;

Data t2003;
  input ToCustomer FromCustomer Y;
  cards;
111 999 10
555 345 30
;

/* macro definition */
%macro createWant(sources, target);
  proc sql;
    create table &target as
      select *
        from %scan(&sources,1)

      %do i=2 %to %sysfunc(countw(&sources,%str( )));
        union
        select *
          from %scan(&sources,&i,%str( ))
      %end;
    ;
  quit;
%mend;

/* call macro to create output table using list of defined source tables */
options mprint;
%createWant(t2001 work.t2002 t2003,want);
options nomprint;

proc print data=want;
run;

Ronein · Posted 07-17-2020 03:52 PM

Thanks, you didn't use distinct....

Shmuel · Posted 07-17-2020 04:27 PM

I have used SORT ... NODUPKEY which is equivalent to SQL .. DISTINCT

in your case.

Patrick · Posted 07-17-2020 09:57 PM

@Ronein wrote:

Thanks, you didn't use distinct....

That's because a Distinct is not required. The UNION operator without the ALL keyword will only return unique rows. May be have a read here.

A Select DISTINCT * together with UNION ALL would return distinct rows per source table but would not dedup rows from different source tables. Looking at your sample source data I felt that's not what you want.

The Proc Sort Nodupkey approach will also only return unique rows and though give the same result than the UNION (without Distinct and ALL keywords).

Ronein · Posted 07-18-2020 03:43 PM

What is the difference between running the code:
options mprint;
%createWant(t2001 work.t2002 t2003,want);
options nomprint;

and running only the code:
%createWant(t2001 work.t2002 t2003,want);

Shmuel · Posted 07-18-2020 04:25 PM

@Ronein wrote:
What is the difference between running the code:
options mprint;
%createWant(t2001 work.t2002 t2003,want);
options nomprint;

and running only the code:
%createWant(t2001 work.t2002 t2003,want);

Best answer you will got if you run:

First:

options mprint;
%createWant(t2001 work.t2002 t2003,want);

Then run and compare logs:

options nomprint;
%createWant(t2001 work.t2002 t2003,want);

Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

Re: Dynamic process to union all data sets with distinct values

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away