Merge with mutiple of data sets or merge with one stacked data set?

Ronein · Posted 06-06-2021 03:47 PM

Hello,

I have a data set called Transcation_tbl with following fields: Cusomer_ID, date,YYMM,transaction_ID,Amount.

Date is date of transaction (SAS date).

YYMM is a char in structure YYMM

There are series of data sets called tYYMM with following columns: Cusomer_ID,wealth,Obligation.

The task is to add :wealth,Obligation to data set Transcation_tbl .

There are 2 ways to do it and I want to ask which way is better:

Way1:

Create one data set by stacking (union) data sets tYYMM and add a column called YYMM(Take it from the source data set name).

Then Perform merge Transcation_tbl + Stcked_tbl (Left Join) By Cusomer_ID+YYMM.

Please note that in this way the stacked data set will contain many rows(maybe 200 million rows)

Way2:

Merge Transcation_tbl +multiple data sets tYYMM (Left Join)

Which way is better?

Is there another way more efficient?

Will it save time If I define index for data sets?

yabwon · Posted 06-06-2021 06:17 PM

Do we have any assumptions about Transcation_tbl? Is it sorted (if yes, how?), is it indexed? Is, and if yes how, the YYMM related to date?

How many datasets tYYMM do you have?

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Ronein · Posted 06-10-2021 06:56 AM

Is it sorted (if yes, how?), is it indexed?Yes, It is sorted, No, It has no Index

Is, and if yes how, the YYMM related to date?
How many datasets tYYMM do you have? In real there are 24 data sets of tYYM (each data set contain around 5 million rows)

Bart

Kurt_Bremser · Posted 06-06-2021 06:28 PM

Give us a few usable examples for the TYYMM and transaction_tbl datasets, covering all constellations appearing in your real data, and the result you expect from that.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Sajid01 · Posted 06-07-2021 07:49 AM

Hello @Ronein
1.If your data is in RBMS tables, as much as possible use the native SQL, So SQL Pass through an native SQL syntax.

2.The first approach looks more comprehensible an I would prefer it.
3.Both approaches come with a cost, but the first one will be easy to manage.

Ronein · Posted 06-10-2021 07:10 AM

Hello all, please see current code.

In real life there are many fields to tYYMM data sets .

In real life there are many data sets tYYMM(50 data sets that are produced every month).

Question:

May anyone show other attitudes that are more efficient and explain why they are more efficient?


Data Transcation_tbl;
format date date9.;
Input Cusomer_ID  date :date9. YYMM $  transaction_ID  Amount;
cards;
1 27Apr2021 2103 123456 1000
2 22Mar2021 2101 345677 2000
3 01Jan2021 2012 763372 3000
;
Run;

Data t2012;
input Cusomer_ID wealth Obligation;
cards;
1 100 10
2 200 20
3 100 0
4 400 30
5 200 20
6 500 50
;
run;

Data t2101;
input Cusomer_ID wealth Obligation;
cards;
1 200 20
2 100 30
3 300 10
4 200 0
5 500 0
6 400 25
;
run;


Data t2102;
input Cusomer_ID wealth Obligation;
cards;
1 300 10
3 200 20
6 300 30
;
run;



Data t2103;
input Cusomer_ID wealth Obligation;
cards;
1 200 40
3 200 20
4 500 50
6 300 30
;
run;

PROC SQL noprint;
	select  min(YYMM),max(YYMM) into: minYYMM , :maxYYMM	   
	from  Transcation_tbl
;
QUIT;
%put &minYYMM;
%put &maxYYMM;


data _null_;
  date_start=mdy(mod(&minYYMM,100),1,floor(&minYYMM/100));
  date_end=mdy(mod(&maxYYMM,100),1,floor(&maxYYMM/100));
  format date_start date_end  date9.;
  counter = intck('month',date_start,date_end);
  call symputx('n',put(counter,best.));
  call symputx('date_start',put(date_start,best.));
run;
%put &n.;

%macro months;
%do i=0 %to &n.;
     m&i.=put(intnx('month',&date_start.,&i.),yymmn4.);
     call symputx("m&i",trim(left(m&i.)));
%end;
%mend;

data _null_;
%months;
run;

%put &n;
%put &m0;
%PUT &&m&n..;



%macro ssort;
%do j=0 %to &n.;
proc sort data=t&&m&j..;by Cusomer_ID;Run;
%end;	
%mend ssort;
%ssort;


%macro mmerge;
%do j=0 %to &n.;
t&&m&j..(rename=(wealth=wealth&&m&j.. Obligation=Obligation&&m&j..))
%end;
%mend mmerge;
%put %mmerge;


%macro dropp;
%do j=0 %to &n.;
wealth&&m&j.. 
Obligation&&m&j..
%end;	
%mend;


%macro scorinit;
%do j=0 %to &n.;
if YYMM=&&m&j.. then do;
wealth_H=wealth&&m&j..;
Obligation_H=Obligation&&m&j..;
end;
%end;
%mend;

proc sort data=Transcation_tbl;by Cusomer_ID;Run;
data halvaot400d (drop=%dropp);
merge Transcation_tbl(in=a) %mmerge;
by Cusomer_ID;
%scorinit;
if a;
run;

Reeza · Posted 06-07-2021 11:45 AM

Neither IMO.

You should APPEND your data not join it in, ie use a UNION. If you have a column for every month, that means every month you're adding a new column to your DB not just extra rows which adds work to your process. And SQL and other languages can easily flip it. Also, you can then index your data by date making it easier to query as that's most likely the best way to filter your data in the long run. A wide format is used for reporting, but very, very rarely for data storage in my 20 years of experience.

Ronein · Posted 06-10-2021 07:23 AM

May you please also show code? thank you

Merge with mutiple of data sets or merge with one stacked data set?

Re: Merge with mutiple of data sets or merge with one stacked data set?

Re: Merge with mutiple of data sets or merge with one stacked data set?

Re: Merge with mutiple of data sets or merge with one stacked data set?

Re: Merge with mutiple of data sets or merge with one stacked data set?

Re: Merge with mutiple of data sets or merge with one stacked data set?

Re: Merge with mutiple of data sets or merge with one stacked data set?

Re: Merge with mutiple of data sets or merge with one stacked data set?

SAS Innovate 2025: Call for Content

Classroom Training Available!