BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello,

I have a data set called Transcation_tbl with following fields: Cusomer_ID, date,YYMM,transaction_ID,Amount.

Date is date of transaction (SAS date).

YYMM is a char in structure YYMM 

There are series of data sets called tYYMM with following columns: Cusomer_ID,wealth,Obligation.

The task is to add :wealth,Obligation to data set  Transcation_tbl .

There are 2 ways to do it and I want to ask which way is better:

Way1:

Create one data set by stacking (union) data sets tYYMM and add a column called YYMM(Take it from the source data set name).

Then Perform merge  Transcation_tbl + Stcked_tbl (Left Join) By Cusomer_ID+YYMM.

Please note that in this way the stacked data set will contain many rows(maybe 200 million rows)

Way2:

Merge Transcation_tbl +multiple data sets  tYYMM (Left Join)

 

Which way is better?

Is there another way more efficient?

Will it save time If I define index for data sets?

 

 

7 REPLIES 7
yabwon
Onyx | Level 15

Do we have any assumptions about Transcation_tbl? Is it sorted (if yes, how?), is it indexed? Is, and if yes how, the YYMM related to date?

How many datasets tYYMM do you have?

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Ronein
Meteorite | Level 14

Is it sorted (if yes, how?), is it indexed?Yes, It is sorted, No, It has no Index

Is, and if yes how, the YYMM related to date?
How many datasets tYYMM do you have? In real there are 24 data sets of tYYM (each data set contain around 5 million rows)

Bart

Kurt_Bremser
Super User

Give us a few usable examples for the TYYMM and transaction_tbl datasets, covering all constellations appearing in your real data, and the result you expect from that.

Sajid01
Meteorite | Level 14

Hello @Ronein 
1.If your data is in  RBMS tables, as much as possible use the native SQL, So SQL Pass through an native SQL syntax.

2.The first approach looks more comprehensible an I would prefer it. 
3.Both approaches come with a cost, but the first one will be easy to manage.

Ronein
Meteorite | Level 14

Hello all, please see current code.

In real life there are many fields to tYYMM data sets .

In real life there are many data sets tYYMM(50  data sets that are produced every month).

Question: 

May anyone show other attitudes that are more efficient and explain why they are more efficient?

 

 

 


Data Transcation_tbl;
format date date9.;
Input Cusomer_ID  date :date9. YYMM $  transaction_ID  Amount;
cards;
1 27Apr2021 2103 123456 1000
2 22Mar2021 2101 345677 2000
3 01Jan2021 2012 763372 3000
;
Run;

Data t2012;
input Cusomer_ID wealth Obligation;
cards;
1 100 10
2 200 20
3 100 0
4 400 30
5 200 20
6 500 50
;
run;

Data t2101;
input Cusomer_ID wealth Obligation;
cards;
1 200 20
2 100 30
3 300 10
4 200 0
5 500 0
6 400 25
;
run;


Data t2102;
input Cusomer_ID wealth Obligation;
cards;
1 300 10
3 200 20
6 300 30
;
run;



Data t2103;
input Cusomer_ID wealth Obligation;
cards;
1 200 40
3 200 20
4 500 50
6 300 30
;
run;

PROC SQL noprint;
	select  min(YYMM),max(YYMM) into: minYYMM , :maxYYMM	   
	from  Transcation_tbl
;
QUIT;
%put &minYYMM;
%put &maxYYMM;


data _null_;
  date_start=mdy(mod(&minYYMM,100),1,floor(&minYYMM/100));
  date_end=mdy(mod(&maxYYMM,100),1,floor(&maxYYMM/100));
  format date_start date_end  date9.;
  counter = intck('month',date_start,date_end);
  call symputx('n',put(counter,best.));
  call symputx('date_start',put(date_start,best.));
run;
%put &n.;

%macro months;
%do i=0 %to &n.;
     m&i.=put(intnx('month',&date_start.,&i.),yymmn4.);
     call symputx("m&i",trim(left(m&i.)));
%end;
%mend;

data _null_;
%months;
run;

%put &n;
%put &m0;
%PUT &&m&n..;



%macro ssort;
%do j=0 %to &n.;
proc sort data=t&&m&j..;by Cusomer_ID;Run;
%end;	
%mend ssort;
%ssort;


%macro mmerge;
%do j=0 %to &n.;
t&&m&j..(rename=(wealth=wealth&&m&j.. Obligation=Obligation&&m&j..))
%end;
%mend mmerge;
%put %mmerge;


%macro dropp;
%do j=0 %to &n.;
wealth&&m&j.. 
Obligation&&m&j..
%end;	
%mend;


%macro scorinit;
%do j=0 %to &n.;
if YYMM=&&m&j.. then do;
wealth_H=wealth&&m&j..;
Obligation_H=Obligation&&m&j..;
end;
%end;
%mend;

proc sort data=Transcation_tbl;by Cusomer_ID;Run;
data halvaot400d (drop=%dropp);
merge Transcation_tbl(in=a) %mmerge;
by Cusomer_ID;
%scorinit;
if a;
run; 
Reeza
Super User

Neither IMO.

 

You should APPEND your data not join it in, ie use a UNION. If you have a column for every month, that means every month you're adding a new column to your DB not just extra rows which adds work to your process. And SQL and other languages can easily flip it. Also, you can then index your data by date making it easier to query as that's most likely the best way to filter your data in the long run. A wide format is used for reporting, but very, very rarely for data storage in my 20 years of experience. 

Ronein
Meteorite | Level 14
May you please also show code? thank you

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 539 views
  • 0 likes
  • 5 in conversation