Hello,
I have a data set called Transcation_tbl with following fields: Cusomer_ID, date,YYMM,transaction_ID,Amount.
Date is date of transaction (SAS date).
YYMM is a char in structure YYMM
There are series of data sets called tYYMM with following columns: Cusomer_ID,wealth,Obligation.
The task is to add :wealth,Obligation to data set Transcation_tbl .
There are 2 ways to do it and I want to ask which way is better:
Way1:
Create one data set by stacking (union) data sets tYYMM and add a column called YYMM(Take it from the source data set name).
Then Perform merge Transcation_tbl + Stcked_tbl (Left Join) By Cusomer_ID+YYMM.
Please note that in this way the stacked data set will contain many rows(maybe 200 million rows)
Way2:
Merge Transcation_tbl +multiple data sets tYYMM (Left Join)
Which way is better?
Is there another way more efficient?
Will it save time If I define index for data sets?
Do we have any assumptions about Transcation_tbl? Is it sorted (if yes, how?), is it indexed? Is, and if yes how, the YYMM related to date?
How many datasets tYYMM do you have?
Bart
Give us a few usable examples for the TYYMM and transaction_tbl datasets, covering all constellations appearing in your real data, and the result you expect from that.
Hello @Ronein
1.If your data is in RBMS tables, as much as possible use the native SQL, So SQL Pass through an native SQL syntax.
2.The first approach looks more comprehensible an I would prefer it.
3.Both approaches come with a cost, but the first one will be easy to manage.
Hello all, please see current code.
In real life there are many fields to tYYMM data sets .
In real life there are many data sets tYYMM(50 data sets that are produced every month).
Question:
May anyone show other attitudes that are more efficient and explain why they are more efficient?
Data Transcation_tbl;
format date date9.;
Input Cusomer_ID date :date9. YYMM $ transaction_ID Amount;
cards;
1 27Apr2021 2103 123456 1000
2 22Mar2021 2101 345677 2000
3 01Jan2021 2012 763372 3000
;
Run;
Data t2012;
input Cusomer_ID wealth Obligation;
cards;
1 100 10
2 200 20
3 100 0
4 400 30
5 200 20
6 500 50
;
run;
Data t2101;
input Cusomer_ID wealth Obligation;
cards;
1 200 20
2 100 30
3 300 10
4 200 0
5 500 0
6 400 25
;
run;
Data t2102;
input Cusomer_ID wealth Obligation;
cards;
1 300 10
3 200 20
6 300 30
;
run;
Data t2103;
input Cusomer_ID wealth Obligation;
cards;
1 200 40
3 200 20
4 500 50
6 300 30
;
run;
PROC SQL noprint;
select min(YYMM),max(YYMM) into: minYYMM , :maxYYMM
from Transcation_tbl
;
QUIT;
%put &minYYMM;
%put &maxYYMM;
data _null_;
date_start=mdy(mod(&minYYMM,100),1,floor(&minYYMM/100));
date_end=mdy(mod(&maxYYMM,100),1,floor(&maxYYMM/100));
format date_start date_end date9.;
counter = intck('month',date_start,date_end);
call symputx('n',put(counter,best.));
call symputx('date_start',put(date_start,best.));
run;
%put &n.;
%macro months;
%do i=0 %to &n.;
m&i.=put(intnx('month',&date_start.,&i.),yymmn4.);
call symputx("m&i",trim(left(m&i.)));
%end;
%mend;
data _null_;
%months;
run;
%put &n;
%put &m0;
%PUT &&m&n..;
%macro ssort;
%do j=0 %to &n.;
proc sort data=t&&m&j..;by Cusomer_ID;Run;
%end;
%mend ssort;
%ssort;
%macro mmerge;
%do j=0 %to &n.;
t&&m&j..(rename=(wealth=wealth&&m&j.. Obligation=Obligation&&m&j..))
%end;
%mend mmerge;
%put %mmerge;
%macro dropp;
%do j=0 %to &n.;
wealth&&m&j..
Obligation&&m&j..
%end;
%mend;
%macro scorinit;
%do j=0 %to &n.;
if YYMM=&&m&j.. then do;
wealth_H=wealth&&m&j..;
Obligation_H=Obligation&&m&j..;
end;
%end;
%mend;
proc sort data=Transcation_tbl;by Cusomer_ID;Run;
data halvaot400d (drop=%dropp);
merge Transcation_tbl(in=a) %mmerge;
by Cusomer_ID;
%scorinit;
if a;
run;
Neither IMO.
You should APPEND your data not join it in, ie use a UNION. If you have a column for every month, that means every month you're adding a new column to your DB not just extra rows which adds work to your process. And SQL and other languages can easily flip it. Also, you can then index your data by date making it easier to query as that's most likely the best way to filter your data in the long run. A wide format is used for reporting, but very, very rarely for data storage in my 20 years of experience.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.