BookmarkSubscribeRSS Feed
tucson77
Calcite | Level 5

Dear SAS experts,

 

I need your help with my problem.

 

I've got a dataset which has an ID attribute (unique identifier) and a date attribute (different dates going from 01/Feb to 28/Feb). I also have 28 datasets (representing each day of Feb 2018) which has the ID (same as the first one), a date (for each day of Feb) and Balances for that day.

Now I would like to update my first dataset with the balances for each day. Example if the date is 01/Feb/2018 then look up the balance for that day in the 01/Feb/2018 dataset and so on.

 

I can do it manually but this will take ages so would like to macrofy this process if possible.

 

Files are attached. Date1 is the first dataset (which only contains 'ID' and 'Date') and Balance1, Balance2 etc are datasets generated for each day of Feb with the balances).

 

Thanks in advance for your help.

 

Tucson77

10 REPLIES 10
Cynthia_sas
SAS Super FREQ
Hi:
I'm not sure you need a macro, it sounds like a merge or join process to me.
Cynthia
tucson77
Calcite | Level 5

Hi Cynthia,

 

Thanks for your quick response ... I always enjoy your responses on proc tabulate, proc report etc (at my workplace I have adapted and used some of your codes 🙂

 

Not sure how I can look up the balances for each day so I thought of a macro or possibly a do loop.

 

Regards,

Tucson

Reeza
Super User

Make it easy for us to help you by providing your data as data steps and clearly illustrating your desired output. You have 4 files below, but haven't indicated what's in them either so I could make a guess but its easier if you tell us. Instructions on how to provide data is here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Depending on the size of the data (important in this case) a macro is likely not the best option here. 

 

I'd probably say you should create a view of the data for the ID stacked together and then use a JOIN to get all the information at once. How well that works depends on the size of your data. If the data is large, instead of a join, I'd probably say try a HASH table first. 

 

Are your dates, ddmmyy or mmddyy? Do you have a naming conventions for your data sets as well? I'm assuming it's not balance1/balance2/balance3? That may needed to factor in a solution. Are your dates SAS dates (numeric with a date format or character? 

 

 

tucson77
Calcite | Level 5

Hi Reeza,

 

Glad to have another great contributor looking at my query!

 

I work for a big organization and unfortunately datasets are massive. Basically what i would like is a way of updating my 'Date1' dataset with the balance from the balance datasets. Say for all customers on 01st Feb 2018 I'd like to add the balances they had on that day and this can be found in the balance datasets. Dates are ddmmyy and you are correct it is not balanes1,2 etc but rather balance_01feb2018 etc.

 

Apologies if this is confusing but I work with very confidential data.

 

Thanks

Reeza
Super User

@tucson77 wrote:

Hi Reeza,

  

Apologies if this is confusing but I work with very confidential data.

 

Thanks


Make fake data that reflects the real data. What's massive? Are we talking about 1 million or 1 billion? For even several million, I'd probably still join the tables. 

 

And do you have indexes on these tables? If so, on what fields, ID and Date or just ID?

Patrick
Opal | Level 21

@tucson77

Something along the line of below should give you what you're asking for.


/* sort date data set */
proc sort data=data1 out=data1(sortedby=id date);
  by id date;
run;

/* create list of required and available daily balance data sets */
%let dslist=;
proc sql noprint;
  select cats('balance_',put(date,date9.)) into :dslist separated by ' '
  from
    (
      select distinct(date)
      from data1 as d inner join dictionary.tables as t
      on 
        not missing(d.date)
        and t.libname='WORK' and t.memname like 'BALANCE^_%' escape '^'
        and d.date=input(scan(t.memname,-1,'_'),? date9.)
    )
  ;
quit;


/* combine all required balance data sets into a view as preparation for sorting */
data _all_balances /view=_all_balances;
  set &dslist:;
run;
/* sort combined balance data sets */
proc sort data=_all_balances out=all_balances;
  by id date;
run;

/* merge date ds with balance ds */
data want;
  merge data1(in=indata) all_balances;
  by id date;
  if indata;
run;

 

Depending on volumes the sorting of data will eventually require quite a bit of time. If performance is going to be a problem and you need performance optimized code then you will need to give us quite a bit more detail how your data really looks like (what you have and what you need).

tucson77
Calcite | Level 5

Hi Patrick,

 

Thanks for taking your time to look into this.

 

Will use your code and keep you updated how it goes.

 

I'll keep an eye on performance especially when sorting the big dataset.

 

Regards,

 

Tucson77

Astounding
PROC Star

A little more explanation is needed here.  Why do you need the first data set at all?  Why can't you combine the 28 balance data sets to get what you need?  

 

SuryaKiran
Meteorite | Level 14

Check these two approaches;

/* No Sorting is needed */
PROC SQL;
CREATE TABLE WANT AS
SELECT A.*,CASE WHEN B.BALANCE IS NOT NULL THEN B.BALANCE
				WHEN C.BALANCE IS NOT NULL THEN C.BALANCE
				WHEN D.BALANCE IS NOT NULL THEN D.BALANCE
				END AS BALANCE_NEW FORMAT=8.
	FROM DATA1 AS A
	LEFT JOIN BALANCE1 AS B ON ( A.ID=B.ID AND A.DATE=B.DATE )
	LEFT JOIN BALANCE2 AS C ON ( A.ID=C.ID AND A.DATE=C.DATE )
	LEFT JOIN BALANCE3 AS D ON ( A.ID=D.ID AND A.DATE=D.DATE)
	;
QUIT;
/* Need to sort the data before merging */
PROC SORT DATA=DATA1;
BY ID DATE;
PROC SORT DATA=BALANCE1;
BY ID DATE;
PROC SORT DATA=BALANCE2;
BY ID DATE;
PROC SORT DATA=BALANCE3;
BY ID DATE;
DATA WANT;
update DATA1 BALANCE1-BALANCE3;
BY ID DATE;
RUN;
Thanks,
Suryakiran
tucson77
Calcite | Level 5

Hi SuryaKiran,

 

Many thx for your response.

 

Will use your code and let you know how it goes.

 

Cheers,

 

Tcs77

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1166 views
  • 2 likes
  • 6 in conversation