Solved: Re: Base SAS tricky logic

Ajay2 · Posted 12-16-2016 06:14 AM

Hi All,

I'm struck in coding below logic. Please help me out.

I have input data set of products and their usage across time. I need an output in such a way that every overlapping of product usage needs to be seen as seperate record. PFB example.

Input:

ID	beginning_date	end_date	product
1	01.01.2016	31.01.2016	A
1	10.01.2016	29.01.2016	B
1	15.01.2016	31.01.2016	C

Output:

	Output
ID	beginning_date	end_date	Product
1	01.01.2016	09.01.2016	A
1	10.01.2016	14.01.2016	A+B
1	15.01.2016	29.01.2016	A+B+C
1	30.01.2016	31.01.2016	A+C

Thanks alot.

Kurt_Bremser · Posted 12-16-2016 08:14 AM

Here is the complete code, from creating example data to final dataset:

data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;

data int1;
set have;
do prod_date = beginning_date to end_date;
  output;
end;
format prod_date ddmmyyp10.;
keep id prod_date product;
run;

proc sort data=int1;
by id prod_date product;
run;

data int2;
set int1;
by id prod_date;
retain products;
length products $10;
if first.prod_date then products = " ";
products = trim(products) !! product;
if last.prod_date then output;
keep id prod_date products;
run;

data want;
set int2;
by id products notsorted;
retain beginning_date;
format beginning_date end_date ddmmyyp10.;
if first.products then beginning_date = prod_date;
if last.products
then do;
  end_date = prod_date;
  output;
end;
keep id beginning_date end_date products;
run;

proc print data=want noobs;
run;

The result:

                  beginning_
id    products       date         end_date

 1      A         01.01.2016    09.01.2016
 1      AB        10.01.2016    14.01.2016
 1      ABC       15.01.2016    29.01.2016
 1      AC        30.01.2016    31.01.2016

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Reeza · Posted 12-16-2016 06:16 AM

Switch to a record for each day and then recombine into intervals.

Ajay2 · Posted 12-16-2016 07:10 AM

Hi,

Thanks for your reply.

I'm handling big data.

If I have to switch to one record for each day, will it be efficent?

Kurt_Bremser · Posted 12-16-2016 07:13 AM

Define "big data". How many billion records (because millions are not big)?

Keep in mind that, during calculation of the time intervals, you need to keep only the necessary variables, resulting in small observation sizes.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ajay2 · Posted 12-16-2016 07:19 AM

"big data" i mean to say not techinical term "big data".

let me be clear, i'm handling huge data, I have around million records.

Thnaks fro your reply.

Kurt_Bremser · Posted 12-16-2016 07:23 AM

A million records is puny when we're talking about SAS. So unless your server is twenty years old or severely misconfigured, no problem.

If you still have doubts, you need to be more specific. eg post a proc contents output of your dataset.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ajay2 · Posted 12-16-2016 07:57 AM

Hi,

I have swicthed to one record for each day. but I'm facing problem in recombining them into intervals.

Could you please give me some sample code.

Thanks.

Kurt_Bremser · Posted 12-16-2016 08:14 AM

Here is the complete code, from creating example data to final dataset:

data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;

data int1;
set have;
do prod_date = beginning_date to end_date;
  output;
end;
format prod_date ddmmyyp10.;
keep id prod_date product;
run;

proc sort data=int1;
by id prod_date product;
run;

data int2;
set int1;
by id prod_date;
retain products;
length products $10;
if first.prod_date then products = " ";
products = trim(products) !! product;
if last.prod_date then output;
keep id prod_date products;
run;

data want;
set int2;
by id products notsorted;
retain beginning_date;
format beginning_date end_date ddmmyyp10.;
if first.products then beginning_date = prod_date;
if last.products
then do;
  end_date = prod_date;
  output;
end;
keep id beginning_date end_date products;
run;

proc print data=want noobs;
run;

The result:

                  beginning_
id    products       date         end_date

 1      A         01.01.2016    09.01.2016
 1      AB        10.01.2016    14.01.2016
 1      ABC       15.01.2016    29.01.2016
 1      AC        30.01.2016    31.01.2016

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ajay2 · Posted 12-16-2016 08:38 AM

Hi Kurt,

Thanks alot.

This is very helpful.

Kurt_Bremser · Posted 12-16-2016 08:51 AM

If you have to handle really large datasets, some thoughts

- reduce the length of the date variables; a length of 4 is sufficient.

- always keep the minimum data necessary. That's why I did not include the + signs in the "products" string. Such things can be added at the end for display purposes. Make that collector string as short as necessary.

- if you need to keep longer character variables, use the compress=yes option to reduce dataset size; if such datasets need to be sorted, using the tagsort option might make a sort feasible that crashes otherwise.

SAS is usually I/O intensive, so tuning a SAS server/computer is mostly done on the storage side.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 12-17-2016 02:43 AM

Of course,

I would not miss such kind of question.

data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;
data temp(index=(product xx=(id date)));
 set have;
 do date=beginning_date to end_date;
  output;
 end;
 drop beginning_date end_date;
run;
proc sql;
create table x as
 select distinct product from have;
quit;
data _null_;
 set x end=last;
 if _n_=1 then call execute('data all;merge ');
 call execute('temp(where=(p_'||put(_n_,8.-l)||'="'||product||'") 
 rename=(product=p_'||put(_n_,8.-l)||'))');
 if last then call execute(';by id date;length f $ 200;f=catx("+",of p_:); run;');
run;

data want;
 set all;
 by id f notsorted;
 retain first;
 if first.f then first=date;
 if last.f then do;begin_date=first;end_date=date;output; end;
 format date Begin_date end_date date9.;
 drop date p_: first;
run;

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away