Dear SAS experts,
I need your help with my problem.
I've got a dataset which has an ID attribute (unique identifier) and a date attribute (different dates going from 01/Feb to 28/Feb). I also have 28 datasets (representing each day of Feb 2018) which has the ID (same as the first one), a date (for each day of Feb) and Balances for that day.
Now I would like to update my first dataset with the balances for each day. Example if the date is 01/Feb/2018 then look up the balance for that day in the 01/Feb/2018 dataset and so on.
I can do it manually but this will take ages so would like to macrofy this process if possible.
Files are attached. Date1 is the first dataset (which only contains 'ID' and 'Date') and Balance1, Balance2 etc are datasets generated for each day of Feb with the balances).
Thanks in advance for your help.
Tucson77
Hi Cynthia,
Thanks for your quick response ... I always enjoy your responses on proc tabulate, proc report etc (at my workplace I have adapted and used some of your codes 🙂
Not sure how I can look up the balances for each day so I thought of a macro or possibly a do loop.
Regards,
Tucson
Make it easy for us to help you by providing your data as data steps and clearly illustrating your desired output. You have 4 files below, but haven't indicated what's in them either so I could make a guess but its easier if you tell us. Instructions on how to provide data is here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Depending on the size of the data (important in this case) a macro is likely not the best option here.
I'd probably say you should create a view of the data for the ID stacked together and then use a JOIN to get all the information at once. How well that works depends on the size of your data. If the data is large, instead of a join, I'd probably say try a HASH table first.
Are your dates, ddmmyy or mmddyy? Do you have a naming conventions for your data sets as well? I'm assuming it's not balance1/balance2/balance3? That may needed to factor in a solution. Are your dates SAS dates (numeric with a date format or character?
Hi Reeza,
Glad to have another great contributor looking at my query!
I work for a big organization and unfortunately datasets are massive. Basically what i would like is a way of updating my 'Date1' dataset with the balance from the balance datasets. Say for all customers on 01st Feb 2018 I'd like to add the balances they had on that day and this can be found in the balance datasets. Dates are ddmmyy and you are correct it is not balanes1,2 etc but rather balance_01feb2018 etc.
Apologies if this is confusing but I work with very confidential data.
Thanks
@tucson77 wrote:
Hi Reeza,
Apologies if this is confusing but I work with very confidential data.
Thanks
Make fake data that reflects the real data. What's massive? Are we talking about 1 million or 1 billion? For even several million, I'd probably still join the tables.
And do you have indexes on these tables? If so, on what fields, ID and Date or just ID?
Something along the line of below should give you what you're asking for.
/* sort date data set */
proc sort data=data1 out=data1(sortedby=id date);
by id date;
run;
/* create list of required and available daily balance data sets */
%let dslist=;
proc sql noprint;
select cats('balance_',put(date,date9.)) into :dslist separated by ' '
from
(
select distinct(date)
from data1 as d inner join dictionary.tables as t
on
not missing(d.date)
and t.libname='WORK' and t.memname like 'BALANCE^_%' escape '^'
and d.date=input(scan(t.memname,-1,'_'),? date9.)
)
;
quit;
/* combine all required balance data sets into a view as preparation for sorting */
data _all_balances /view=_all_balances;
set &dslist:;
run;
/* sort combined balance data sets */
proc sort data=_all_balances out=all_balances;
by id date;
run;
/* merge date ds with balance ds */
data want;
merge data1(in=indata) all_balances;
by id date;
if indata;
run;
Depending on volumes the sorting of data will eventually require quite a bit of time. If performance is going to be a problem and you need performance optimized code then you will need to give us quite a bit more detail how your data really looks like (what you have and what you need).
Hi Patrick,
Thanks for taking your time to look into this.
Will use your code and keep you updated how it goes.
I'll keep an eye on performance especially when sorting the big dataset.
Regards,
Tucson77
A little more explanation is needed here. Why do you need the first data set at all? Why can't you combine the 28 balance data sets to get what you need?
Check these two approaches;
/* No Sorting is needed */
PROC SQL;
CREATE TABLE WANT AS
SELECT A.*,CASE WHEN B.BALANCE IS NOT NULL THEN B.BALANCE
WHEN C.BALANCE IS NOT NULL THEN C.BALANCE
WHEN D.BALANCE IS NOT NULL THEN D.BALANCE
END AS BALANCE_NEW FORMAT=8.
FROM DATA1 AS A
LEFT JOIN BALANCE1 AS B ON ( A.ID=B.ID AND A.DATE=B.DATE )
LEFT JOIN BALANCE2 AS C ON ( A.ID=C.ID AND A.DATE=C.DATE )
LEFT JOIN BALANCE3 AS D ON ( A.ID=D.ID AND A.DATE=D.DATE)
;
QUIT;
/* Need to sort the data before merging */
PROC SORT DATA=DATA1;
BY ID DATE;
PROC SORT DATA=BALANCE1;
BY ID DATE;
PROC SORT DATA=BALANCE2;
BY ID DATE;
PROC SORT DATA=BALANCE3;
BY ID DATE;
DATA WANT;
update DATA1 BALANCE1-BALANCE3;
BY ID DATE;
RUN;
Hi SuryaKiran,
Many thx for your response.
Will use your code and let you know how it goes.
Cheers,
Tcs77
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.