BookmarkSubscribeRSS Feed
Srigyan
Quartz | Level 8

Hi,

I have the following data

 

Start_DateEnd_DatePrice
01-Jan-1805-Jan-1815
03-Jan-1809-Jan-1814
07-Jan-1810-Jan-1812
02-Jan-1807-Jan-1818

 

& I am looking for following output

 

Start_DateEnd_DatePrice
01-Jan-1802-Jan-1815
03-Jan-1806-Jan-1814
07-Jan-1810-Jan-1812

 

How I find the price is based on what is the lowest price available in the first table.

1) 15 is lowest price is for 01-Jan & 2nd Jan.

2) In the same way, 14 is lowest price for 03rdJan to 06-Jan

3) 18 -is never the lowest price so it didn't come in the output data. 

***Our output data start date is 01-Jan-2018 and end date is 10-Jan-2018.

 

This is a very huge data originally so I need the most optimized SAS program which completes this job

4 REPLIES 4
Astounding
PROC Star

Here's one approach:

 

data all;

set have;

do date = start_date to end_date;

   output;

end;

keep date price;

run;

 

proc summary data=all nway;

var price;

class date;

output out=minimums (keep=price date) min=;

run;

  

data want;

set minimums;

by price notsorted;

retain start_date;

if first.price then start_date = date;

if last.price;

end_date = date;

drop date;

run;

 

It's untested code, but looks right.

Srigyan
Quartz | Level 8

This is working great, however As I mentioned this is for huge database, the following program

 

data all;
format date date9.;
set sample;
do date=Start_Date to End_Date;
output;
end;
keep date price;
run;

 

creates one record for each day and each price.I need to calculate this 5 years for more than a billion records. 

Just don't want to create one record for each date, and each price. Can we optimize this process?

To approve a single suggestion, mouse over it and click "✔"
Click the bubble to approve all of its suggestions.
Astounding
PROC Star

 Sorry, I didn't appreciate the size of the data so I tried to use a clearer approach.  Here's another way to get summarized data.  I'm going to assume you know the minimum and maximum dates and can plug them into this step, but if that needs to be automated it can be:

 

data minimums;

set have;

array prices ("01jan2010"d : "31dec2018"d);

do until (done);

   set have end=done;

   do k=start_date to end_date;

      prices{k} = min(prices{k}, price);

   end;

end;

do date="01jan2010"d to "31dec2018"d;

   price = prices{k};

   output;

end;

keep date price;

run;

 

This might still take a while to process 5B records.  But direct access to an array element should even be faster than hashing.

 

Then run the final DATA step as is (by price notsorted).

Ksharp
Super User

Hope your computer have a big memory .

 

 

data have;
infile cards expandtabs truncover;
input Start_Date :date9. End_Date :date9. Price;
format  Start_Date End_Date :date9.;
cards;
01-Jan-18	05-Jan-18	15
03-Jan-18	09-Jan-18	14
07-Jan-18	10-Jan-18	12
02-Jan-18	07-Jan-18	18
;
run;
data _null_;
 if _n_=1 then do;
   declare hash h(hashexp:20,ordered:'y');
   h.definekey('date');
   h.definedata('date','p');
   h.definedone();
 end;
set have end=last;
do date=Start_Date to End_Date ;
  if h.find()=0 then do;p=min(p,price);h.replace(); end;
   else do;p=price;h.add(); end;
end;
if last then h.output(dataset:'want');
format date date9.;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 889 views
  • 0 likes
  • 3 in conversation