BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

There is a program that I need to run every few weeks.

This program need to use different source data sets (By business decision).

For example:

Let's say that If I run the program today then I need to use data sets: t2001,t2002,t2003.

In the program there is a step that check distinct values for each data source and then stack the values (Union) and delete duplicates values (Union all).

My question:

Is there a shorter way to perform the dynamic process of creating data set Newtbl?

 

 

 

Data t2001;
input ToCustomer FromCustomer Y
cards;
111 999 10
222 888 20
333 888 30
;
Run;

Data t2002;
input ToCustomer FromCustomer Y
cards;
111 999 10
333 888 30
444 767 40
;
Run;

Data t2003;
input ToCustomer FromCustomer Y
cards;
111 999 10
555 345 30
;
Run;

/*Non dynamic way to create data set NewTbl*/ PROC SQL; Create table NewTbl SELECT distinct ToCustomer, FromCustomer FROM t2001 Union All SELECT distinct ToCustomer, FromCustomer FROM t2002 Union All SELECT distinct ToCustomer, FromCustomer FROM t2003 ; QUIT;


/*dynamic way to create data set NewTbl*/


%macro RRR1 (YYMM);
create table distinct_t_&YYMM. as
select distinct ToCustomer, FromCustomer
FROM t&YYMM.
;
quit;
%mend RRR1;

%let vector=2001+2002+2003;
%let k = %sysfunc(countw(&vector));

%macro RUN_RRR;
%do j=1 %to &k.;
%let YYMM=%scan(&vector.,&j.,+);
%RRR(&YYMM.);
%end;
%mend RUN_RRR;
%RUN_RRR;

data NewTbl_;
SET distinct_t_:;
Run;

proc sort data=NewTbl_ nodupkey;
by ToCustomer FromCustomer;
Run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Try next code:

data all_t / view=all_t;
   set t2001 t2003 t2003;
  /* OR set t20: ;  <<< for concatenating all t20xx datasets */
run;
proc sort data=all_t out=final nodupkey;
  by ToCustomer, FromCustomer;
run;

View solution in original post

10 REPLIES 10
Shmuel
Garnet | Level 18

Try next code:

data all_t / view=all_t;
   set t2001 t2003 t2003;
  /* OR set t20: ;  <<< for concatenating all t20xx datasets */
run;
proc sort data=all_t out=final nodupkey;
  by ToCustomer, FromCustomer;
run;
Ronein
Meteorite | Level 14

Thanks,

What is the meaning of this code that you wrote please?

/ view=akk_t;
Shmuel
Garnet | Level 18

@Ronein wrote:

Thanks,

What is the meaning of this code that you wrote please?

/ view=akk_t;

A view is a sas object that tells sas what to do without doing it immediately.

In this case it tells sas to concatenate the input datasets to create one input stream to the sort procedure just before running the sort. Mainly, it saves run time ans disk space.

Note: Concatenation of datasets means read all observation of the n datasets and write them to one output dataset. By using a view, the observation are read once directly into the sort procedure.

 

You can create a view by sas step as shown up or even by a sql by:

     

proc sql ...;
       create VIEW <view_name>  as
        ......
; quit;

 

Ronein
Meteorite | Level 14

Thank you!

So as I understand this statement   create a view instead of data set and advantage is program efficiency

data all_t / view=all_t;

 

Thank you

Patrick
Opal | Level 21

If you want to stick with a SQL Union then below should do the job. Using a SQL Union would be advantageous if your source data resides in a data base; or also if you can't be certain that the variable lengths in your source tables are always the same.

In regards of the UNION operator: Without the keyword ALL a Union will dedup the result set so for what you want don't use a combination of Distinct/Union All but just use a Union. That should give you the same result.

 

And here the code:

/* source data */
Data t2001;
  input ToCustomer FromCustomer Y;
  cards;
111 999 10
222 888 20
333 888 30
;

Data t2002;
  input ToCustomer FromCustomer Y;
  cards;
111 999 10
333 888 30
444 767 40
;

Data t2003;
  input ToCustomer FromCustomer Y;
  cards;
111 999 10
555 345 30
;

/* macro definition */
%macro createWant(sources, target);
  proc sql;
    create table &target as
      select *
        from %scan(&sources,1)

      %do i=2 %to %sysfunc(countw(&sources,%str( )));
        union
        select *
          from %scan(&sources,&i,%str( ))
      %end;
    ;
  quit;
%mend;

/* call macro to create output table using list of defined source tables */
options mprint;
%createWant(t2001 work.t2002 t2003,want);
options nomprint;

proc print data=want;
run;
Ronein
Meteorite | Level 14

Thanks, you didn't use distinct....

Shmuel
Garnet | Level 18

I have used SORT ... NODUPKEY which is equivalent to SQL .. DISTINCT 

in your case.

Patrick
Opal | Level 21

@Ronein wrote:

Thanks, you didn't use distinct....


That's because a Distinct is not required. The UNION operator without the ALL keyword will only return unique rows. May be have a read here.

A Select DISTINCT * together with UNION ALL would return distinct rows per source table but would not dedup rows from different source tables. Looking at your sample source data I felt that's not what you want.

The Proc Sort Nodupkey approach will also only return unique rows and though give the same result than the UNION (without Distinct and ALL keywords).

 

 

Ronein
Meteorite | Level 14
What is the difference between running the code:
options mprint;
%createWant(t2001 work.t2002 t2003,want);
options nomprint;

and running only the code:
%createWant(t2001 work.t2002 t2003,want);

Shmuel
Garnet | Level 18

@Ronein wrote:
What is the difference between running the code:
options mprint;
%createWant(t2001 work.t2002 t2003,want);
options nomprint;

and running only the code:
%createWant(t2001 work.t2002 t2003,want);


Best answer you will got if you run:

First:

options mprint;
%createWant(t2001 work.t2002 t2003,want);

Then run and compare logs:

options nomprint;
%createWant(t2001 work.t2002 t2003,want);

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1293 views
  • 2 likes
  • 3 in conversation