Hi All,
I'm struck in coding below logic. Please help me out.
I have input data set of products and their usage across time. I need an output in such a way that every overlapping of product usage needs to be seen as seperate record. PFB example.
Input:
ID | beginning_date | end_date | product |
1 | 01.01.2016 | 31.01.2016 | A |
1 | 10.01.2016 | 29.01.2016 | B |
1 | 15.01.2016 | 31.01.2016 | C |
Output:
Output | |||
ID | beginning_date | end_date | Product |
1 | 01.01.2016 | 09.01.2016 | A |
1 | 10.01.2016 | 14.01.2016 | A+B |
1 | 15.01.2016 | 29.01.2016 | A+B+C |
1 | 30.01.2016 | 31.01.2016 | A+C |
Thanks alot.
Here is the complete code, from creating example data to final dataset:
data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;
data int1;
set have;
do prod_date = beginning_date to end_date;
output;
end;
format prod_date ddmmyyp10.;
keep id prod_date product;
run;
proc sort data=int1;
by id prod_date product;
run;
data int2;
set int1;
by id prod_date;
retain products;
length products $10;
if first.prod_date then products = " ";
products = trim(products) !! product;
if last.prod_date then output;
keep id prod_date products;
run;
data want;
set int2;
by id products notsorted;
retain beginning_date;
format beginning_date end_date ddmmyyp10.;
if first.products then beginning_date = prod_date;
if last.products
then do;
end_date = prod_date;
output;
end;
keep id beginning_date end_date products;
run;
proc print data=want noobs;
run;
The result:
beginning_ id products date end_date 1 A 01.01.2016 09.01.2016 1 AB 10.01.2016 14.01.2016 1 ABC 15.01.2016 29.01.2016 1 AC 30.01.2016 31.01.2016
Switch to a record for each day and then recombine into intervals.
Hi,
Thanks for your reply.
I'm handling big data.
If I have to switch to one record for each day, will it be efficent?
Define "big data". How many billion records (because millions are not big)?
Keep in mind that, during calculation of the time intervals, you need to keep only the necessary variables, resulting in small observation sizes.
"big data" i mean to say not techinical term "big data".
let me be clear, i'm handling huge data, I have around million records.
Thnaks fro your reply.
A million records is puny when we're talking about SAS. So unless your server is twenty years old or severely misconfigured, no problem.
If you still have doubts, you need to be more specific. eg post a proc contents output of your dataset.
Hi,
I have swicthed to one record for each day. but I'm facing problem in recombining them into intervals.
Could you please give me some sample code.
Thanks.
Here is the complete code, from creating example data to final dataset:
data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;
data int1;
set have;
do prod_date = beginning_date to end_date;
output;
end;
format prod_date ddmmyyp10.;
keep id prod_date product;
run;
proc sort data=int1;
by id prod_date product;
run;
data int2;
set int1;
by id prod_date;
retain products;
length products $10;
if first.prod_date then products = " ";
products = trim(products) !! product;
if last.prod_date then output;
keep id prod_date products;
run;
data want;
set int2;
by id products notsorted;
retain beginning_date;
format beginning_date end_date ddmmyyp10.;
if first.products then beginning_date = prod_date;
if last.products
then do;
end_date = prod_date;
output;
end;
keep id beginning_date end_date products;
run;
proc print data=want noobs;
run;
The result:
beginning_ id products date end_date 1 A 01.01.2016 09.01.2016 1 AB 10.01.2016 14.01.2016 1 ABC 15.01.2016 29.01.2016 1 AC 30.01.2016 31.01.2016
Hi Kurt,
Thanks alot.
This is very helpful.
If you have to handle really large datasets, some thoughts
- reduce the length of the date variables; a length of 4 is sufficient.
- always keep the minimum data necessary. That's why I did not include the + signs in the "products" string. Such things can be added at the end for display purposes. Make that collector string as short as necessary.
- if you need to keep longer character variables, use the compress=yes option to reduce dataset size; if such datasets need to be sorted, using the tagsort option might make a sort feasible that crashes otherwise.
SAS is usually I/O intensive, so tuning a SAS server/computer is mostly done on the storage side.
Of course,
I would not miss such kind of question.
data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;
data temp(index=(product xx=(id date)));
set have;
do date=beginning_date to end_date;
output;
end;
drop beginning_date end_date;
run;
proc sql;
create table x as
select distinct product from have;
quit;
data _null_;
set x end=last;
if _n_=1 then call execute('data all;merge ');
call execute('temp(where=(p_'||put(_n_,8.-l)||'="'||product||'")
rename=(product=p_'||put(_n_,8.-l)||'))');
if last then call execute(';by id date;length f $ 200;f=catx("+",of p_:); run;');
run;
data want;
set all;
by id f notsorted;
retain first;
if first.f then first=date;
if last.f then do;begin_date=first;end_date=date;output; end;
format date Begin_date end_date date9.;
drop date p_: first;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.