BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ajay2
Calcite | Level 5

Hi All,

 

I'm struck in coding below logic. Please help me out.

 

I have input data set of products and their usage across time. I need an output  in such a way that every overlapping of product   usage needs to be seen as seperate record. PFB example.

 

Input:

 

IDbeginning_dateend_dateproduct
101.01.201631.01.2016A
110.01.201629.01.2016B
115.01.201631.01.2016C

 

Output:

 

 Output  
IDbeginning_dateend_dateProduct
101.01.201609.01.2016A
110.01.201614.01.2016A+B
115.01.201629.01.2016A+B+C
130.01.201631.01.2016A+C

 

Thanks alot.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Here is the complete code, from creating example data to final dataset:

data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;

data int1;
set have;
do prod_date = beginning_date to end_date;
  output;
end;
format prod_date ddmmyyp10.;
keep id prod_date product;
run;

proc sort data=int1;
by id prod_date product;
run;

data int2;
set int1;
by id prod_date;
retain products;
length products $10;
if first.prod_date then products = " ";
products = trim(products) !! product;
if last.prod_date then output;
keep id prod_date products;
run;

data want;
set int2;
by id products notsorted;
retain beginning_date;
format beginning_date end_date ddmmyyp10.;
if first.products then beginning_date = prod_date;
if last.products
then do;
  end_date = prod_date;
  output;
end;
keep id beginning_date end_date products;
run;

proc print data=want noobs;
run;

The result:

                  beginning_
id    products       date         end_date

 1      A         01.01.2016    09.01.2016
 1      AB        10.01.2016    14.01.2016
 1      ABC       15.01.2016    29.01.2016
 1      AC        30.01.2016    31.01.2016

View solution in original post

10 REPLIES 10
Reeza
Super User

Switch to a record for each day and then recombine into intervals. 

Ajay2
Calcite | Level 5

Hi,

 

Thanks for your reply.

 

I'm handling big data.

 

If I have to switch to one record for each day, will it be efficent?

Kurt_Bremser
Super User

Define "big data". How many billion records (because millions are not big)?

Keep in mind that, during calculation of the time intervals, you need to keep only the necessary variables, resulting in small observation sizes.

Ajay2
Calcite | Level 5

"big data" i mean to say not techinical term "big data".

 

let me be clear, i'm handling huge data, I have around million records.

 

Thnaks fro your reply.

Kurt_Bremser
Super User

A million records is puny when we're talking about SAS. So unless your server is twenty years old or severely misconfigured, no problem.

 

If you still have doubts, you need to be more specific. eg post a proc contents output of your dataset.

 

Ajay2
Calcite | Level 5

Hi,

 

I have swicthed to one record for each day. but I'm facing problem  in recombining them into intervals.

 

Could you please give me some sample code.

 

Thanks.

Kurt_Bremser
Super User

Here is the complete code, from creating example data to final dataset:

data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;

data int1;
set have;
do prod_date = beginning_date to end_date;
  output;
end;
format prod_date ddmmyyp10.;
keep id prod_date product;
run;

proc sort data=int1;
by id prod_date product;
run;

data int2;
set int1;
by id prod_date;
retain products;
length products $10;
if first.prod_date then products = " ";
products = trim(products) !! product;
if last.prod_date then output;
keep id prod_date products;
run;

data want;
set int2;
by id products notsorted;
retain beginning_date;
format beginning_date end_date ddmmyyp10.;
if first.products then beginning_date = prod_date;
if last.products
then do;
  end_date = prod_date;
  output;
end;
keep id beginning_date end_date products;
run;

proc print data=want noobs;
run;

The result:

                  beginning_
id    products       date         end_date

 1      A         01.01.2016    09.01.2016
 1      AB        10.01.2016    14.01.2016
 1      ABC       15.01.2016    29.01.2016
 1      AC        30.01.2016    31.01.2016
Ajay2
Calcite | Level 5

Hi Kurt,

 

Thanks alot.

 

This is very helpful.

Kurt_Bremser
Super User

If you have to handle really large datasets, some thoughts

- reduce the length of the date variables; a length of 4 is sufficient.

- always keep the minimum data necessary. That's why I did not include the + signs in the "products" string. Such things can be added at the end for display purposes. Make that collector string as short as necessary.

- if you need to keep longer character variables, use the compress=yes option to reduce dataset size; if such datasets need to be sorted, using the tagsort option might make a sort feasible that crashes otherwise.

 

SAS is usually I/O intensive, so tuning a SAS server/computer is mostly done on the storage side.

Ksharp
Super User

Of course,

I would not miss such kind of question.

 

 

 

data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;
data temp(index=(product xx=(id date)));
 set have;
 do date=beginning_date to end_date;
  output;
 end;
 drop beginning_date end_date;
run;
proc sql;
create table x as
 select distinct product from have;
quit;
data _null_;
 set x end=last;
 if _n_=1 then call execute('data all;merge ');
 call execute('temp(where=(p_'||put(_n_,8.-l)||'="'||product||'") 
 rename=(product=p_'||put(_n_,8.-l)||'))');
 if last then call execute(';by id date;length f $ 200;f=catx("+",of p_:); run;');
run;

data want;
 set all;
 by id f notsorted;
 retain first;
 if first.f then first=date;
 if last.f then do;begin_date=first;end_date=date;output; end;
 format date Begin_date end_date date9.;
 drop date p_: first;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2058 views
  • 0 likes
  • 4 in conversation