Hello
There is a program that I need to run every few weeks.
This program need to use different source data sets (By business decision).
For example:
Let's say that If I run the program today then I need to use data sets: t2001,t2002,t2003.
In the program there is a step that check distinct values for each data source and then stack the values (Union) and delete duplicates values (Union all).
My question:
Is there a shorter way to perform the dynamic process of creating data set Newtbl?
Data t2001;
input ToCustomer FromCustomer Y
cards;
111 999 10
222 888 20
333 888 30
;
Run;
Data t2002;
input ToCustomer FromCustomer Y
cards;
111 999 10
333 888 30
444 767 40
;
Run;
Data t2003;
input ToCustomer FromCustomer Y
cards;
111 999 10
555 345 30
;
Run;
/*Non dynamic way to create data set NewTbl*/
PROC SQL;
Create table NewTbl
SELECT distinct ToCustomer, FromCustomer
FROM t2001
Union All
SELECT distinct ToCustomer, FromCustomer
FROM t2002
Union All
SELECT distinct ToCustomer, FromCustomer
FROM t2003
;
QUIT;
/*dynamic way to create data set NewTbl*/
%macro RRR1 (YYMM);
create table distinct_t_&YYMM. as
select distinct ToCustomer, FromCustomer
FROM t&YYMM.
;
quit;
%mend RRR1;
%let vector=2001+2002+2003;
%let k = %sysfunc(countw(&vector));
%macro RUN_RRR;
%do j=1 %to &k.;
%let YYMM=%scan(&vector.,&j.,+);
%RRR(&YYMM.);
%end;
%mend RUN_RRR;
%RUN_RRR;
data NewTbl_;
SET distinct_t_:;
Run;
proc sort data=NewTbl_ nodupkey;
by ToCustomer FromCustomer;
Run;
Try next code:
data all_t / view=all_t;
set t2001 t2003 t2003;
/* OR set t20: ; <<< for concatenating all t20xx datasets */
run;
proc sort data=all_t out=final nodupkey;
by ToCustomer, FromCustomer;
run;
Try next code:
data all_t / view=all_t;
set t2001 t2003 t2003;
/* OR set t20: ; <<< for concatenating all t20xx datasets */
run;
proc sort data=all_t out=final nodupkey;
by ToCustomer, FromCustomer;
run;
Thanks,
What is the meaning of this code that you wrote please?
/ view=akk_t;
@Ronein wrote:
Thanks,
What is the meaning of this code that you wrote please?
/ view=akk_t;
A view is a sas object that tells sas what to do without doing it immediately.
In this case it tells sas to concatenate the input datasets to create one input stream to the sort procedure just before running the sort. Mainly, it saves run time ans disk space.
Note: Concatenation of datasets means read all observation of the n datasets and write them to one output dataset. By using a view, the observation are read once directly into the sort procedure.
You can create a view by sas step as shown up or even by a sql by:
proc sql ...;
create VIEW <view_name> as
......
; quit;
Thank you!
So as I understand this statement create a view instead of data set and advantage is program efficiency
data all_t / view=all_t;
Thank you
If you want to stick with a SQL Union then below should do the job. Using a SQL Union would be advantageous if your source data resides in a data base; or also if you can't be certain that the variable lengths in your source tables are always the same.
In regards of the UNION operator: Without the keyword ALL a Union will dedup the result set so for what you want don't use a combination of Distinct/Union All but just use a Union. That should give you the same result.
And here the code:
/* source data */
Data t2001;
input ToCustomer FromCustomer Y;
cards;
111 999 10
222 888 20
333 888 30
;
Data t2002;
input ToCustomer FromCustomer Y;
cards;
111 999 10
333 888 30
444 767 40
;
Data t2003;
input ToCustomer FromCustomer Y;
cards;
111 999 10
555 345 30
;
/* macro definition */
%macro createWant(sources, target);
proc sql;
create table &target as
select *
from %scan(&sources,1)
%do i=2 %to %sysfunc(countw(&sources,%str( )));
union
select *
from %scan(&sources,&i,%str( ))
%end;
;
quit;
%mend;
/* call macro to create output table using list of defined source tables */
options mprint;
%createWant(t2001 work.t2002 t2003,want);
options nomprint;
proc print data=want;
run;
Thanks, you didn't use distinct....
I have used SORT ... NODUPKEY which is equivalent to SQL .. DISTINCT
in your case.
@Ronein wrote:
Thanks, you didn't use distinct....
That's because a Distinct is not required. The UNION operator without the ALL keyword will only return unique rows. May be have a read here.
A Select DISTINCT * together with UNION ALL would return distinct rows per source table but would not dedup rows from different source tables. Looking at your sample source data I felt that's not what you want.
The Proc Sort Nodupkey approach will also only return unique rows and though give the same result than the UNION (without Distinct and ALL keywords).
@Ronein wrote:
What is the difference between running the code:
options mprint;
%createWant(t2001 work.t2002 t2003,want);
options nomprint;
and running only the code:
%createWant(t2001 work.t2002 t2003,want);
Best answer you will got if you run:
First:
options mprint;
%createWant(t2001 work.t2002 t2003,want);
Then run and compare logs:
options nomprint;
%createWant(t2001 work.t2002 t2003,want);
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.