Hi,
I have the following data
Start_Date | End_Date | Price |
01-Jan-18 | 05-Jan-18 | 15 |
03-Jan-18 | 09-Jan-18 | 14 |
07-Jan-18 | 10-Jan-18 | 12 |
02-Jan-18 | 07-Jan-18 | 18 |
& I am looking for following output
Start_Date | End_Date | Price |
01-Jan-18 | 02-Jan-18 | 15 |
03-Jan-18 | 06-Jan-18 | 14 |
07-Jan-18 | 10-Jan-18 | 12 |
How I find the price is based on what is the lowest price available in the first table.
1) 15 is lowest price is for 01-Jan & 2nd Jan.
2) In the same way, 14 is lowest price for 03rdJan to 06-Jan
3) 18 -is never the lowest price so it didn't come in the output data.
***Our output data start date is 01-Jan-2018 and end date is 10-Jan-2018.
This is a very huge data originally so I need the most optimized SAS program which completes this job
Here's one approach:
data all;
set have;
do date = start_date to end_date;
output;
end;
keep date price;
run;
proc summary data=all nway;
var price;
class date;
output out=minimums (keep=price date) min=;
run;
data want;
set minimums;
by price notsorted;
retain start_date;
if first.price then start_date = date;
if last.price;
end_date = date;
drop date;
run;
It's untested code, but looks right.
This is working great, however As I mentioned this is for huge database, the following program
data all;
format date date9.;
set sample;
do date=Start_Date to End_Date;
output;
end;
keep date price;
run;
creates one record for each day and each price.I need to calculate this 5 years for more than a billion records.
Just don't want to create one record for each date, and each price. Can we optimize this process?
Sorry, I didn't appreciate the size of the data so I tried to use a clearer approach. Here's another way to get summarized data. I'm going to assume you know the minimum and maximum dates and can plug them into this step, but if that needs to be automated it can be:
data minimums;
set have;
array prices ("01jan2010"d : "31dec2018"d);
do until (done);
set have end=done;
do k=start_date to end_date;
prices{k} = min(prices{k}, price);
end;
end;
do date="01jan2010"d to "31dec2018"d;
price = prices{k};
output;
end;
keep date price;
run;
This might still take a while to process 5B records. But direct access to an array element should even be faster than hashing.
Then run the final DATA step as is (by price notsorted).
Hope your computer have a big memory .
data have;
infile cards expandtabs truncover;
input Start_Date :date9. End_Date :date9. Price;
format Start_Date End_Date :date9.;
cards;
01-Jan-18 05-Jan-18 15
03-Jan-18 09-Jan-18 14
07-Jan-18 10-Jan-18 12
02-Jan-18 07-Jan-18 18
;
run;
data _null_;
if _n_=1 then do;
declare hash h(hashexp:20,ordered:'y');
h.definekey('date');
h.definedata('date','p');
h.definedone();
end;
set have end=last;
do date=Start_Date to End_Date ;
if h.find()=0 then do;p=min(p,price);h.replace(); end;
else do;p=price;h.add(); end;
end;
if last then h.output(dataset:'want');
format date date9.;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.