DATA Step, Macro, Functions and more

If not SQL, then what?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 124
Accepted Solution

If not SQL, then what?

Greetings all.  I am trying to learn SAS programming, and I must say it has been very frustrating at times, but I'm starting to like it.  I've got sound SQL skills, so I found myself relying on proc sql as much as possible.  However, I feel that goes against the spirit of SAS, so I started trying to do things without SQL, and progress has been slow.  I've got one example I cannot even begin to fathom how to do without SQL, if anyone would be interested in giving it a whack.  My data are a company's billing cycle, and the number of bills that go out the door every day.  There are 21 billing cycles, and every customer account is put on one of them.  Every cycle has a schedule on which it will bill, and it is basically once per month, since bills don't go out on weekends.  Let's say today is the day for bill cycle 5 to send bills out.  I need to know how many bills went out the door on time, in addition to the number of bills that went out the door 1 to 4 days late.  A bill is late if it does not go out the door when it is supposed to.  So, if today is bill cycle 5, then all bills that should have gone out today, and did, are on time.  If a customer is on cycle 4, which should have gone out yesterday, but it went out today, then that bill is 1 day late.  I've attached a file that contains sample data, all the code, and a 'Final' result table that I make using proc sql.  Basically I'm just wondering how to go about producing dataset 'Final' without using any sql.  Any ideas would be greatly appreciated.

Greg

Attachment

Accepted Solutions
Solution
‎09-20-2012 03:26 PM
Respected Advisor
Posts: 3,156

Re: If not SQL, then what?

Seems to me 2XDOW will do the whack for ya,

data want;

  do until (last.DateCol);

   set test;

   by datecol notsorted;

   _ontime_cycle=ifn(ontime>= billcount, _ontime_cycle,cycle);

   ontime=ontime max BillCount;

  end;

  _One_day_late_cycle=ifn(_ontime_cycle-1>0,_ontime_cycle-1, 21-abs(_ontime_cycle-1));

  _two_days_late_cycle=ifn( _ontime_cycle-2>0,_ontime_cycle-2, 21-abs(_ontime_cycle-2));

  _three_days_late_cycle=ifn( _ontime_cycle-3>0,_ontime_cycle-3, 21-abs(_ontime_cycle-3));

  _four_days_late_cycle=ifn( _ontime_cycle-4>0,_ontime_cycle-4, 21-abs(_ontime_cycle-4));

   do until (last.DateCol);

   set test;

   by datecol notsorted;

     if cycle=_ontime_cycle then ontime=billcount;

       else if cycle=_One_day_late_cycle then one_day_late=billcount;

       else if cycle=_two_days_late_cycle then two_days_late=billcount;

       else if cycle=_three_days_late_cycle then three_days_late=billcount;

       else if cycle=_four_days_late_cycle then four_days_late=billcount;

      end;

      drop _: billcount cycle;

run;

Of course, there are other data step approaches as well, such as stacking, hash() etc.

Haikuo

View solution in original post


All Replies
Solution
‎09-20-2012 03:26 PM
Respected Advisor
Posts: 3,156

Re: If not SQL, then what?

Seems to me 2XDOW will do the whack for ya,

data want;

  do until (last.DateCol);

   set test;

   by datecol notsorted;

   _ontime_cycle=ifn(ontime>= billcount, _ontime_cycle,cycle);

   ontime=ontime max BillCount;

  end;

  _One_day_late_cycle=ifn(_ontime_cycle-1>0,_ontime_cycle-1, 21-abs(_ontime_cycle-1));

  _two_days_late_cycle=ifn( _ontime_cycle-2>0,_ontime_cycle-2, 21-abs(_ontime_cycle-2));

  _three_days_late_cycle=ifn( _ontime_cycle-3>0,_ontime_cycle-3, 21-abs(_ontime_cycle-3));

  _four_days_late_cycle=ifn( _ontime_cycle-4>0,_ontime_cycle-4, 21-abs(_ontime_cycle-4));

   do until (last.DateCol);

   set test;

   by datecol notsorted;

     if cycle=_ontime_cycle then ontime=billcount;

       else if cycle=_One_day_late_cycle then one_day_late=billcount;

       else if cycle=_two_days_late_cycle then two_days_late=billcount;

       else if cycle=_three_days_late_cycle then three_days_late=billcount;

       else if cycle=_four_days_late_cycle then four_days_late=billcount;

      end;

      drop _: billcount cycle;

run;

Of course, there are other data step approaches as well, such as stacking, hash() etc.

Haikuo

PROC Star
Posts: 7,474

Re: If not SQL, then what?

Another approach might be:

data want (keep=DateCol Ontime--Four_days_late);

  set test;

  by datecol;

  array dates(21);

  array final(5);

  retain dates;

  dates(cycle)=billcount;

  days=input(substr(strip(datecol),7,2),8.)+2;

  if last.datecol then do;

    do i=1 to 5;

      days=ifn(days-1 eq 0,21,days-1);

      final(i)=dates(days);

    end;

    Ontime=final(1);

    One_day_late=final(2);

    Two_days_late=final(3);

    Three_days_late=final(4);

    Four_days_late=final(5);

    output;

  end;

run;

Frequent Contributor
Posts: 124

Re: If not SQL, then what?

WOW!  What can I say, other than I am floored at how easy you both made that look.  While messing with it I had probaly 200 lines of code that only got me about half way there.  My hat goes off to you, and you have inspired me to keep up the resistance to old habits.  Thank you both so, so much for your effort.

Greg

PROC Star
Posts: 7,474

Re: If not SQL, then what?

There are some things that can be done better/more easily with sql, however datastep processing is definitely more suited for a number of tasks.  One should, me thinks, become adept at both.

BTW: I have absolutely no problem with your having marked Haikuo's solution as being correct, but if you have a large file, I would suggest actually using the code I suggested.  Haikuo's code requires reading the data twice, while the code I proposed, only makes a single pass through the data.  In short, it will likely take just half the time to complete the process.

Frequent Contributor
Posts: 124

Re: If not SQL, then what?

Arthur, your solution seems so simple once I started looking at it, and it all makes sense except for one line.   What is this doing...

days=input(substr(strip(datecol),7,2),8.)+2;

Greg


PROC Star
Posts: 7,474

Re: If not SQL, then what?

Sorry, I was updating my post, when you sent your question.

I took the last two characters from your character string 'datecol' and created a variable that was equal to that number+2;

For me, is was just a shortcut method of identifying where to start grabbing the desired data.

i.e., days=  the value of the last two characters of 'datecol' (after removing any leading or trailing spaces), and adding the digit 2 to that number, using the input function to convert the result into a number.

Frequent Contributor
Posts: 124

Re: If not SQL, then what?

Hai.kuo, what is 2XDOW, and I can't understand how the below line work...

ontime=ontime max BillCount;

I can't get how the max in the middle is working.  Is it saying to keep the value of ontime unless it encounters a greater value of BillCount?

Greg


PROC Star
Posts: 7,474

Re: If not SQL, then what?

Regarding your question to Haikuo, take a look at:

http://support.sas.com/resources/papers/proceedings09/038-2009.pdf

It is simply a way to do things with a datastep above and beyond what one might pick up from the documentation.  Paul, the author of the paper I referenced, is one of three people who are credited with coming up with the idea.  It is commonly known as the Dorfman/Whitlock do loop or DOW for short.

Quite often, you can do something with a DOW loop, making 2 or 3 passes through the data, but without having to create additional files, or having to merge any files, to obtain a desired result.  It is a VERY useful technique to learn.

Respected Advisor
Posts: 3,156

Re: If not SQL, then what?

Thanks, Art, for answering for me.

OP, There is one thing I 'd like to address. There is a big difference between my approach and Art's. Mine was hinted by your code, first to check the largest billcount and use it as ontime. Art's approach, in my opinion, is more reasonable. He was using the date.

And Art's comment about performance is correct. However, if you are planning to do it my way, here is an improved version using array(), only one pass I/O:

data want;

array c(1:21) _temporary_;

   do _n_=1 by 1 until (last.DateCol);

   set test ;

   by datecol cycle;

   c(_n_)=billcount;

   _0=ifn(ontime>= billcount, _0,cycle);

   ontime=ontime max billcount;

   end;

  _1=ifn(_0-1>0,_0-1, 21-abs(_0-1));

  _2=ifn( _0-2>0,_0-2, 21-abs(_0-2));

  _3=ifn(_0-3>0,_0-3, 21-abs(_0-3));

  _4=ifn( _0-4>0,_0-4, 21-abs(_0-4));

ontime=c(_0);one_day_late=c(_1);two_days_late=c(_2);three_days_late=c(_3);four_days_late=c(_4);

drop _: cycle;

run;

Haikuo

PROC Star
Posts: 7,474

Re: If not SQL, then what?

I liked Haikuo's revised version but, since he simplified his code, I'll do the same:

data want (keep=DateCol Ontime--Four_days_late);

  set test;

  by datecol;

  array dates(21);

  array final(5) Ontime One_day_late Two_days_late Three_days_late Four_days_late;

  retain dates;

  dates(cycle)=billcount;

  days=input(substr(strip(datecol),7,2),8.)+2;

  if last.datecol then do;

    do i=1 to 5;

      days=ifn(days-1 eq 0,21,days-1);

      final(i)=dates(days);

    end;

    output;

  end;

run;

Frequent Contributor
Posts: 124

Re: If not SQL, then what?

Thank you both so, so much for giving my issue such great care.  It will take some time for each solution to sink in, but I am eager to dive into it tomorrow.  I have a couple of questions if you would be so kind.

Arthur: It seems that days is an arbitrary number based on the date in datecol, but I can't get what it is doing.  Why do you have to add 2?

Haikuo:  in your drop statement, you have '_:'  Does the colon after the underscore tell the processor to drop all variables starting with an underscore?

Thanks again, I am truly humbled.

Greg

PROC Star
Posts: 7,474

Re: If not SQL, then what?

Greg,

Here is a revised (one line changed) version of the code, that shows the value of days throughout the iterations.

I was simply looking for an algorithm that matched the days you had indicated were those needed to obtain the desired solution.

data want (keep=DateCol Ontime--Four_days_late);

  set test;

  by datecol;

  array dates(21);

  array final(5) Ontime One_day_late Two_days_late Three_days_late Four_days_late;

  retain dates;

  dates(cycle)=billcount;

  days=input(substr(strip(datecol),7,2),8.)+2;

  if last.datecol then do;

    do i=1 to 5;

      days=ifn(days-1 eq 0,21,days-1);

      final(i)=days;

/*      final(i)=dates(days);*/

    end;

    output;

  end;

run;

Respected Advisor
Posts: 3,156

Re: If not SQL, then what?

Greg,

Your guess is correct. For more information on colon (Smiley Happy, check this paper out:

http://www2.sas.com/proceedings/sugi26/p073-26.pdf

Haikuo

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 334 views
  • 6 likes
  • 3 in conversation