Counter by group with MAx(counter)

Accepted Solution Solved
Reply
Super Contributor
Posts: 418
Accepted Solution

Counter by group with MAx(counter)

For an answer given by Astounding (and slightly modified by me to get the counter variable as an additional varaible) please see my third post and not the attached correct answer specification.

I will post if anyone else comes up with other unique ways to answer this question!

Hello everyone. I have the following problem that I am trying to solve in ONE datastep. I am not actually sure if this is possible (I am learning towards it not being possible, alas I am still a SAS 'novice').

Basically I have a dataset with by groups that I need to create a counter variable for (I can do this), and then I need ANOTHER variable which stores the Maximum counter variable across the by groups. Please see my output dataset that I would like below for reference.

WANT

Key     Counter     Max Counter

1              1                    4

1              2                    4

1              3                    4

1              4                    4

2              1                    3

2              2                    3

2              3                    3

3              1                    1

I am aware that this can be done in two datasteps and then a merge can be done to get both columns. What I am curious of is if this can be done in one datastep to incorrect efficiency.

This is an issue because I need this on a very large dataset, and if I can avoid having to read it/ write it/ sort it multiple times it will save hours of efficiency gain.

If this cannot be done in one dataset as I fear, could anyone list the most efficient possible method for doing this?

Thanks so much for your time!

Brandon


Accepted Solutions
Solution
‎10-09-2013 03:00 PM
Super User
Posts: 5,071

Re: Counter by group with MAx(counter)

If you're a novice and can create COUNTER, I have to assume that your data set is sorted by KEY.  This would be the fastest solution I can think of:

data want;

  max_counter = 0;

  do until (last.key);

     set have (keep=key);

     by key;

     max_counter + 1;

  end;

  do until (last.key);

     set have;

     by key;

     output;

  end;

run;

It still reads the data twice, but it avoids having to merge or join which would be the most time-consuming part of the process.

Good luck.

View solution in original post


All Replies
Respected Advisor
Posts: 3,777

Re: Counter by group with MAx(counter)

Is the data sorted or indexed now? If so what?

Solution
‎10-09-2013 03:00 PM
Super User
Posts: 5,071

Re: Counter by group with MAx(counter)

If you're a novice and can create COUNTER, I have to assume that your data set is sorted by KEY.  This would be the fastest solution I can think of:

data want;

  max_counter = 0;

  do until (last.key);

     set have (keep=key);

     by key;

     max_counter + 1;

  end;

  do until (last.key);

     set have;

     by key;

     output;

  end;

run;

It still reads the data twice, but it avoids having to merge or join which would be the most time-consuming part of the process.

Good luck.

Super Contributor
Posts: 418

Re: Counter by group with MAx(counter)

Sorry let me re-specify. Most people would not consider me a novice but more of an intermediate - advanced user (I can program just about anything, I'm just not "Great" at determining the absolute fastest way to get it done. I've been using sas for ~2 and 1/4 years).

Also _null_, the incoming dataset will not be sorted or indexed at all, so I know I have to sort it before doing by variable grouping (unless an index is quicker).

Also astounding, thank you this is exactly what I am looking for. A solution that is the "most efficient" that can ignore the two table sort and merge (which takes FOREVER). I will do some testing with the code and let you know my results! I will likely mark your answer as the correct one (unless by using an index per _null_'s comment would be faster!).

Basically this data is being created from a X * 100 million + csv file, so no index or sort at any level in the beginning unfortunately.

Super Contributor
Posts: 418

Re: Counter by group with MAx(counter)

Hello astounding. I was able to take your code with the following modification (in bold) and get the desired output that I wanted...

data want;

  max_counter = 0;

  do until (last.key);

     set have (keep=key);

     by key;

     max_counter + 1;

  end;

  do until (last.key);

     set have;

     by key;

   Counter+1;

     if first.key then counter=1;

     output;

  end;

run;

This is exactly what I am looking for, and it does avoid the sort and merge. Thank you for your help! This is what I mean when I say I am a novice, the do until step before the set step would have never occured to me.

Thanks again!

Brandon

Super User
Posts: 5,071

Re: Counter by group with MAx(counter)

Good job to create COUNTER in this final step.

It would be marginally faster to switch the logic up just a bit.  Add this statement in between the two DO UNTIL loops:

counter=0;

Then you could remove the IF THEN statement inside the second loop.

It would be difficult to measure the difference, but you do have hundreds of millions of observations.

Good luck.

Super Contributor
Posts: 418

Re: Counter by group with MAx(counter)

Oh holy cow that's a great idea. This way the variable is set to zero whenever it gets to a new key start, and then it always adds one instead of going through the if then loop.

I wish I could upvote you twice, thanks for all of your help!

Brandon

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 247 views
  • 0 likes
  • 3 in conversation