DATA Step, Macro, Functions and more

Base SAS tricky logic

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Base SAS tricky logic

Hi All,

 

I'm struck in coding below logic. Please help me out.

 

I have input data set of products and their usage across time. I need an output  in such a way that every overlapping of product   usage needs to be seen as seperate record. PFB example.

 

Input:

 

IDbeginning_dateend_dateproduct
101.01.201631.01.2016A
110.01.201629.01.2016B
115.01.201631.01.2016C

 

Output:

 

 Output  
IDbeginning_dateend_dateProduct
101.01.201609.01.2016A
110.01.201614.01.2016A+B
115.01.201629.01.2016A+B+C
130.01.201631.01.2016A+C

 

Thanks alot.


Accepted Solutions
Solution
‎12-16-2016 08:37 AM
Super User
Posts: 6,982

Re: Base SAS tricky logic

Here is the complete code, from creating example data to final dataset:

data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;

data int1;
set have;
do prod_date = beginning_date to end_date;
  output;
end;
format prod_date ddmmyyp10.;
keep id prod_date product;
run;

proc sort data=int1;
by id prod_date product;
run;

data int2;
set int1;
by id prod_date;
retain products;
length products $10;
if first.prod_date then products = " ";
products = trim(products) !! product;
if last.prod_date then output;
keep id prod_date products;
run;

data want;
set int2;
by id products notsorted;
retain beginning_date;
format beginning_date end_date ddmmyyp10.;
if first.products then beginning_date = prod_date;
if last.products
then do;
  end_date = prod_date;
  output;
end;
keep id beginning_date end_date products;
run;

proc print data=want noobs;
run;

The result:

                  beginning_
id    products       date         end_date

 1      A         01.01.2016    09.01.2016
 1      AB        10.01.2016    14.01.2016
 1      ABC       15.01.2016    29.01.2016
 1      AC        30.01.2016    31.01.2016
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Super User
Posts: 17,963

Re: Base SAS tricky logic

Switch to a record for each day and then recombine into intervals. 

Occasional Contributor
Posts: 5

Re: Base SAS tricky logic

Hi,

 

Thanks for your reply.

 

I'm handling big data.

 

If I have to switch to one record for each day, will it be efficent?

Super User
Posts: 6,982

Re: Base SAS tricky logic

Define "big data". How many billion records (because millions are not big)?

Keep in mind that, during calculation of the time intervals, you need to keep only the necessary variables, resulting in small observation sizes.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 5

Re: Base SAS tricky logic

"big data" i mean to say not techinical term "big data".

 

let me be clear, i'm handling huge data, I have around million records.

 

Thnaks fro your reply.

Super User
Posts: 6,982

Re: Base SAS tricky logic

A million records is puny when we're talking about SAS. So unless your server is twenty years old or severely misconfigured, no problem.

 

If you still have doubts, you need to be more specific. eg post a proc contents output of your dataset.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 5

Re: Base SAS tricky logic

Hi,

 

I have swicthed to one record for each day. but I'm facing problem  in recombining them into intervals.

 

Could you please give me some sample code.

 

Thanks.

Solution
‎12-16-2016 08:37 AM
Super User
Posts: 6,982

Re: Base SAS tricky logic

Here is the complete code, from creating example data to final dataset:

data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;

data int1;
set have;
do prod_date = beginning_date to end_date;
  output;
end;
format prod_date ddmmyyp10.;
keep id prod_date product;
run;

proc sort data=int1;
by id prod_date product;
run;

data int2;
set int1;
by id prod_date;
retain products;
length products $10;
if first.prod_date then products = " ";
products = trim(products) !! product;
if last.prod_date then output;
keep id prod_date products;
run;

data want;
set int2;
by id products notsorted;
retain beginning_date;
format beginning_date end_date ddmmyyp10.;
if first.products then beginning_date = prod_date;
if last.products
then do;
  end_date = prod_date;
  output;
end;
keep id beginning_date end_date products;
run;

proc print data=want noobs;
run;

The result:

                  beginning_
id    products       date         end_date

 1      A         01.01.2016    09.01.2016
 1      AB        10.01.2016    14.01.2016
 1      ABC       15.01.2016    29.01.2016
 1      AC        30.01.2016    31.01.2016
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 5

Re: Base SAS tricky logic

Hi Kurt,

 

Thanks alot.

 

This is very helpful.

Super User
Posts: 6,982

Re: Base SAS tricky logic

If you have to handle really large datasets, some thoughts

- reduce the length of the date variables; a length of 4 is sufficient.

- always keep the minimum data necessary. That's why I did not include the + signs in the "products" string. Such things can be added at the end for display purposes. Make that collector string as short as necessary.

- if you need to keep longer character variables, use the compress=yes option to reduce dataset size; if such datasets need to be sorted, using the tagsort option might make a sort feasible that crashes otherwise.

 

SAS is usually I/O intensive, so tuning a SAS server/computer is mostly done on the storage side.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 9,691

Re: Base SAS tricky logic

[ Edited ]

Of course,

I would not miss such kind of question.

 

 

 

data have;
input id beginning_date :ddmmyy10. end_date :ddmmyy10. product $;
format beginning_date end_date ddmmyyp10.;
cards;
1 01.01.2016 31.01.2016 A
1 10.01.2016 29.01.2016 B
1 15.01.2016 31.01.2016 C
;
run;
data temp(index=(product xx=(id date)));
 set have;
 do date=beginning_date to end_date;
  output;
 end;
 drop beginning_date end_date;
run;
proc sql;
create table x as
 select distinct product from have;
quit;
data _null_;
 set x end=last;
 if _n_=1 then call execute('data all;merge ');
 call execute('temp(where=(p_'||put(_n_,8.-l)||'="'||product||'") 
 rename=(product=p_'||put(_n_,8.-l)||'))');
 if last then call execute(';by id date;length f $ 200;f=catx("+",of p_:); run;');
run;

data want;
 set all;
 by id f notsorted;
 retain first;
 if first.f then first=date;
 if last.f then do;begin_date=first;end_date=date;output; end;
 format date Begin_date end_date date9.;
 drop date p_: first;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 332 views
  • 0 likes
  • 4 in conversation