BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gsnidow
Obsidian | Level 7

Greetings all.  I am trying to learn SAS programming, and I must say it has been very frustrating at times, but I'm starting to like it.  I've got sound SQL skills, so I found myself relying on proc sql as much as possible.  However, I feel that goes against the spirit of SAS, so I started trying to do things without SQL, and progress has been slow.  I've got one example I cannot even begin to fathom how to do without SQL, if anyone would be interested in giving it a whack.  My data are a company's billing cycle, and the number of bills that go out the door every day.  There are 21 billing cycles, and every customer account is put on one of them.  Every cycle has a schedule on which it will bill, and it is basically once per month, since bills don't go out on weekends.  Let's say today is the day for bill cycle 5 to send bills out.  I need to know how many bills went out the door on time, in addition to the number of bills that went out the door 1 to 4 days late.  A bill is late if it does not go out the door when it is supposed to.  So, if today is bill cycle 5, then all bills that should have gone out today, and did, are on time.  If a customer is on cycle 4, which should have gone out yesterday, but it went out today, then that bill is 1 day late.  I've attached a file that contains sample data, all the code, and a 'Final' result table that I make using proc sql.  Basically I'm just wondering how to go about producing dataset 'Final' without using any sql.  Any ideas would be greatly appreciated.

Greg

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Seems to me 2XDOW will do the whack for ya,

data want;

  do until (last.DateCol);

   set test;

   by datecol notsorted;

   _ontime_cycle=ifn(ontime>= billcount, _ontime_cycle,cycle);

   ontime=ontime max BillCount;

  end;

  _One_day_late_cycle=ifn(_ontime_cycle-1>0,_ontime_cycle-1, 21-abs(_ontime_cycle-1));

  _two_days_late_cycle=ifn( _ontime_cycle-2>0,_ontime_cycle-2, 21-abs(_ontime_cycle-2));

  _three_days_late_cycle=ifn( _ontime_cycle-3>0,_ontime_cycle-3, 21-abs(_ontime_cycle-3));

  _four_days_late_cycle=ifn( _ontime_cycle-4>0,_ontime_cycle-4, 21-abs(_ontime_cycle-4));

   do until (last.DateCol);

   set test;

   by datecol notsorted;

     if cycle=_ontime_cycle then ontime=billcount;

       else if cycle=_One_day_late_cycle then one_day_late=billcount;

       else if cycle=_two_days_late_cycle then two_days_late=billcount;

       else if cycle=_three_days_late_cycle then three_days_late=billcount;

       else if cycle=_four_days_late_cycle then four_days_late=billcount;

      end;

      drop _: billcount cycle;

run;

Of course, there are other data step approaches as well, such as stacking, hash() etc.

Haikuo

View solution in original post

13 REPLIES 13
Haikuo
Onyx | Level 15

Seems to me 2XDOW will do the whack for ya,

data want;

  do until (last.DateCol);

   set test;

   by datecol notsorted;

   _ontime_cycle=ifn(ontime>= billcount, _ontime_cycle,cycle);

   ontime=ontime max BillCount;

  end;

  _One_day_late_cycle=ifn(_ontime_cycle-1>0,_ontime_cycle-1, 21-abs(_ontime_cycle-1));

  _two_days_late_cycle=ifn( _ontime_cycle-2>0,_ontime_cycle-2, 21-abs(_ontime_cycle-2));

  _three_days_late_cycle=ifn( _ontime_cycle-3>0,_ontime_cycle-3, 21-abs(_ontime_cycle-3));

  _four_days_late_cycle=ifn( _ontime_cycle-4>0,_ontime_cycle-4, 21-abs(_ontime_cycle-4));

   do until (last.DateCol);

   set test;

   by datecol notsorted;

     if cycle=_ontime_cycle then ontime=billcount;

       else if cycle=_One_day_late_cycle then one_day_late=billcount;

       else if cycle=_two_days_late_cycle then two_days_late=billcount;

       else if cycle=_three_days_late_cycle then three_days_late=billcount;

       else if cycle=_four_days_late_cycle then four_days_late=billcount;

      end;

      drop _: billcount cycle;

run;

Of course, there are other data step approaches as well, such as stacking, hash() etc.

Haikuo

art297
Opal | Level 21

Another approach might be:

data want (keep=DateCol Ontime--Four_days_late);

  set test;

  by datecol;

  array dates(21);

  array final(5);

  retain dates;

  dates(cycle)=billcount;

  days=input(substr(strip(datecol),7,2),8.)+2;

  if last.datecol then do;

    do i=1 to 5;

      days=ifn(days-1 eq 0,21,days-1);

      final(i)=dates(days);

    end;

    Ontime=final(1);

    One_day_late=final(2);

    Two_days_late=final(3);

    Three_days_late=final(4);

    Four_days_late=final(5);

    output;

  end;

run;

gsnidow
Obsidian | Level 7

WOW!  What can I say, other than I am floored at how easy you both made that look.  While messing with it I had probaly 200 lines of code that only got me about half way there.  My hat goes off to you, and you have inspired me to keep up the resistance to old habits.  Thank you both so, so much for your effort.

Greg

art297
Opal | Level 21

There are some things that can be done better/more easily with sql, however datastep processing is definitely more suited for a number of tasks.  One should, me thinks, become adept at both.

BTW: I have absolutely no problem with your having marked Haikuo's solution as being correct, but if you have a large file, I would suggest actually using the code I suggested.  Haikuo's code requires reading the data twice, while the code I proposed, only makes a single pass through the data.  In short, it will likely take just half the time to complete the process.

gsnidow
Obsidian | Level 7

Arthur, your solution seems so simple once I started looking at it, and it all makes sense except for one line.   What is this doing...

days=input(substr(strip(datecol),7,2),8.)+2;

Greg


art297
Opal | Level 21

Sorry, I was updating my post, when you sent your question.

I took the last two characters from your character string 'datecol' and created a variable that was equal to that number+2;

For me, is was just a shortcut method of identifying where to start grabbing the desired data.

i.e., days=  the value of the last two characters of 'datecol' (after removing any leading or trailing spaces), and adding the digit 2 to that number, using the input function to convert the result into a number.

gsnidow
Obsidian | Level 7

Hai.kuo, what is 2XDOW, and I can't understand how the below line work...

ontime=ontime max BillCount;

I can't get how the max in the middle is working.  Is it saying to keep the value of ontime unless it encounters a greater value of BillCount?

Greg


art297
Opal | Level 21

Regarding your question to Haikuo, take a look at:

http://support.sas.com/resources/papers/proceedings09/038-2009.pdf

It is simply a way to do things with a datastep above and beyond what one might pick up from the documentation.  Paul, the author of the paper I referenced, is one of three people who are credited with coming up with the idea.  It is commonly known as the Dorfman/Whitlock do loop or DOW for short.

Quite often, you can do something with a DOW loop, making 2 or 3 passes through the data, but without having to create additional files, or having to merge any files, to obtain a desired result.  It is a VERY useful technique to learn.

Haikuo
Onyx | Level 15

Thanks, Art, for answering for me.

OP, There is one thing I 'd like to address. There is a big difference between my approach and Art's. Mine was hinted by your code, first to check the largest billcount and use it as ontime. Art's approach, in my opinion, is more reasonable. He was using the date.

And Art's comment about performance is correct. However, if you are planning to do it my way, here is an improved version using array(), only one pass I/O:

data want;

array c(1:21) _temporary_;

   do _n_=1 by 1 until (last.DateCol);

   set test ;

   by datecol cycle;

   c(_n_)=billcount;

   _0=ifn(ontime>= billcount, _0,cycle);

   ontime=ontime max billcount;

   end;

  _1=ifn(_0-1>0,_0-1, 21-abs(_0-1));

  _2=ifn( _0-2>0,_0-2, 21-abs(_0-2));

  _3=ifn(_0-3>0,_0-3, 21-abs(_0-3));

  _4=ifn( _0-4>0,_0-4, 21-abs(_0-4));

ontime=c(_0);one_day_late=c(_1);two_days_late=c(_2);three_days_late=c(_3);four_days_late=c(_4);

drop _: cycle;

run;

Haikuo

art297
Opal | Level 21

I liked Haikuo's revised version but, since he simplified his code, I'll do the same:

data want (keep=DateCol Ontime--Four_days_late);

  set test;

  by datecol;

  array dates(21);

  array final(5) Ontime One_day_late Two_days_late Three_days_late Four_days_late;

  retain dates;

  dates(cycle)=billcount;

  days=input(substr(strip(datecol),7,2),8.)+2;

  if last.datecol then do;

    do i=1 to 5;

      days=ifn(days-1 eq 0,21,days-1);

      final(i)=dates(days);

    end;

    output;

  end;

run;

gsnidow
Obsidian | Level 7

Thank you both so, so much for giving my issue such great care.  It will take some time for each solution to sink in, but I am eager to dive into it tomorrow.  I have a couple of questions if you would be so kind.

Arthur: It seems that days is an arbitrary number based on the date in datecol, but I can't get what it is doing.  Why do you have to add 2?

Haikuo:  in your drop statement, you have '_:'  Does the colon after the underscore tell the processor to drop all variables starting with an underscore?

Thanks again, I am truly humbled.

Greg

art297
Opal | Level 21

Greg,

Here is a revised (one line changed) version of the code, that shows the value of days throughout the iterations.

I was simply looking for an algorithm that matched the days you had indicated were those needed to obtain the desired solution.

data want (keep=DateCol Ontime--Four_days_late);

  set test;

  by datecol;

  array dates(21);

  array final(5) Ontime One_day_late Two_days_late Three_days_late Four_days_late;

  retain dates;

  dates(cycle)=billcount;

  days=input(substr(strip(datecol),7,2),8.)+2;

  if last.datecol then do;

    do i=1 to 5;

      days=ifn(days-1 eq 0,21,days-1);

      final(i)=days;

/*      final(i)=dates(days);*/

    end;

    output;

  end;

run;

Haikuo
Onyx | Level 15

Greg,

Your guess is correct. For more information on colon (:), check this paper out:

http://www2.sas.com/proceedings/sugi26/p073-26.pdf

Haikuo

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1236 views
  • 6 likes
  • 3 in conversation