Apply count variable to all observations

Reply
N/A
Posts: 1

Apply count variable to all observations

Hi,

I am trying to calculate how many times different suppliers have provided a name and address.

What I need is for the value of the supplied variable for last.UID to be applied to all observations of that UID so that when I later class by each supplier I get an accurate count of the total time that record has been supplied.

Current code:

data new;

     set supplied1;

     by uid;

          if first.uid then do;

               supplied=0;

          end;

     supplied=supplied+1;

retain supplied;

if last.uid;

run;

I can do this in a number of steps but I have a feeling there is a much simpler way of doing it!

Thanks

PROC Star
Posts: 7,471

Re: Apply count variable to all observations

I think you will have to make two passes through the data, even if only one datastep is used.  One solution:

data new;

  do until (last.uid);

    set supplied1;

    by uid;

    if first.uid then supplied=1;

    else supplied=supplied+1;

  end;

  do until (last.uid);

    set supplied1;

    by uid;

    output;

  end;

run;

Valued Guide
Posts: 2,177

Re: Apply count variable to all observations

There are merits in this approach. Although there appears to be 2 passes through the data because there are two set statemets, the data flow switches from one stream to the other on change of by group. If the buffering and caching of the data holds more than one by group there will be only one I/O stream into the buffers for this data step.

It has been presented before in forums and sas-L - using two references to to same dataset in a single SET statement like :

Data final ;

SET supplied1( in= A ) supplied1 ;

By uid ;

If first.uid then supplied=0;

If a then  supplied +1 ;

else output;

Run;

PROC Star
Posts: 7,471

Re: Apply count variable to all observations

: Don't know how I missed that but, in the tests I just ran, both approaches took approximately the same amounts of CPU and real times, thus I'm not convinced that they don't both take two passes through the data. However, that said, I like the approach you suggested as it does the same thing with less code.  The only thing I'd suggest changing is the if then else combinations. i.e.:

data new;

  set supplied1( in= a ) supplied1 ;

  by uid ;

  if first.uid then supplied=1;

  else if a then supplied +1 ;

  else output;

run;

Super Contributor
Posts: 1,636

Re: Apply count variable to all observations

shorter:

data have;

input id@@;

cards;

1 2 2 3 3 3 9 9 9 9 18 18 18 18 18

;

data want;

set have(in=a)have;

by id ;

if a then t+1-first.id*t;

else output;

run;

proc print;run;

PROC Star
Posts: 7,471

Re: Apply count variable to all observations

: Actually, on average, your suggestion runs slightly slower, but the two approaches do run in very similar times.  I prefer the if-then-else combinations as it is easier (I think) for most to follow the logic.  The fact that it run slightly faster (and I mean really, really only slightly), I think, is just an added benefit.

Super Contributor
Posts: 1,636

Re: Apply count variable to all observations

Hi Art,

You are right. Is there any leftover turkey to share?

PROC Star
Posts: 7,471

Re: Apply count variable to all observations

@Linlin: Only if you get here quickly and now is being served in the form of Sheppard's Pie

Super User
Posts: 5,503

Re: Apply count variable to all observations

You won't measure the difference in a one-variable data set, but it would be good practice to modify the SET statement:

set supplied1 (in=a keep=uid) supplied1;


PROC Star
Posts: 7,471

Re: Apply count variable to all observations

Posted in reply to Astounding

: Interestingly, the efficiency doesn't appear to apply in this particular situation.  The test code I ran was:

data supplied1;

  set sashelp.class (rename=(age=uid));

  array junk(999) (999*1);

  do _n_=1 to 10000;

    output;

  end;

run;

proc sort data=supplied1;

  by uid;

run;

data new;

  set supplied1( in= a ) supplied1 ;

  by uid ;

  if first.uid then supplied=0;

  if a then  supplied +1 ;

  else output;

run;

data new;

  set supplied1( in= a keep=uid) supplied1 ;

  by uid ;

  if first.uid then supplied=0;

  if a then  supplied +1 ;

  else output;

run;

Super User
Posts: 5,503

Re: Apply count variable to all observations

Art,

Hmmmm ... there are mysteries that can be difficult to explain.  By any chance did you measure the difference with your original DOW approach?

PROC Star
Posts: 7,471

Re: Apply count variable to all observations

Posted in reply to Astounding

: of course I looked at that as well.  Even with the double DOW, the pdv is populated with the variables from both steps thus using a keep option doesn't improve the processing times.

If any of you are wondering why is surprised, take a look at: Increase Your Productivity by Doing Less - sasCommunity

Respected Advisor
Posts: 3,156

Re: Apply count variable to all observations

Art's paper reminds me that one of the co-authors, Ksharp, has dropped out of the grid for a long time. Let's wish him the best!

Haikuo

PROC Star
Posts: 7,471

Re: Apply count variable to all observations

: KSharp is still alive and well and, every now and then, sends me an idea for a SAS-related paper that he'd like me to investigate.

PROC Star
Posts: 1,760

Re: Apply count variable to all observations

Posted in reply to Astounding

For the sake of code clarity,

set SUPPLIED1(in=a keep=uid) SUPPLIED1;

is indeed better imho, regardless of performance.

This way we have more information about what each table is used for.

We could even write:

set SUPPLIED1(in=groupingstream keep=uid) SUPPLIED1;

or

set SUPPLIED1(in=groupingstream keep=uid) SUPPLIED1(in=outputstream);

if we were really petty about descriptive names, which we are not.

Ask a Question
Discussion stats
  • 18 replies
  • 564 views
  • 7 likes
  • 8 in conversation