Solved: Re: how to assign sequence number group wise with every five records

thanikondharish · Posted 07-10-2023 08:52 AM

/**how to assign sequence number every 5 records with group wise(sex, descending age)**/

name sex age height weight seq
Janet F 15 62.5 112.5 1
Mary F 15 66.5 112 1
Carol F 14 62.8 102.5 1
Judy F 14 64.3 90 1
Alice F 13 56.5 84 1
Barbara F 13 65.3 98 2
Jane F 12 59.8 84.5 2
Louise F 12 56.3 77 2
Joyce F 11 51.3 50.5 2
Philip M 16 72 150 2
Ronald M 15 67 133 3
William M 15 66.5 112 3
Alfred M 14 69 112.5 3
Henry M 14 63.5 102.5 3
Jeffrey M 13 62.5 84 3
James M 12 57.3 83 4
John M 12 59 99.5 4
Robert M 12 64.8 128 4
Thomas M 11 57.5 85 4

I want to assign a sequence number every 5 records, within groups defined by sex and descending age. If the records of one sex*age group would end up with two different sequence numbers, the whole group should move forward to the next sequence number.
The output should look like this:

name sex age height weight seq
Janet F 15 62.5 112.5 1
Mary F 15 66.5 112 1
Carol F 14 62.8 102.5 1
Judy F 14 64.3 90 1
Alice F 13 56.5 84 2
Barbara F 13 65.3 98 2
Jane F 12 59.8 84.5 2
Louise F 12 56.3 77 2
Joyce F 11 51.3 50.5 2
Philip M 16 72 150 3
Ronald M 15 67 133 3
William M 15 66.5 112 3
Alfred M 14 69 112.5 3
Henry M 14 63.5 102.5 3
Jeffrey M 13 62.5 84 4
James M 12 57.3 83 4
John M 12 59 99.5 4
Robert M 12 64.8 128 4
Thomas M 11 57.5 85 4

How can I do this with a DO WHILE / DO UNTIL / DO loop?

Tom · Posted 07-10-2023 09:53 AM

Let me see if I can translate that so you can confirm your intention.

You have the records in SEX*AGE groups. Put those groups into combinations of no more than 5 observations. In this example the first 5 records span 3 groups of 2 members each, so you want only the first 4 to be in the first new group (seq).

This raises a big question: What do you do when one of the groups has more than 5 members already? Does that new grouping (seq) include more than 5 in that case? Or does the group get split across two (or more) of the new SEQ groupings?

Let's assume you want any large group to be its own seq.


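data want;
  /* First loop: count the observations in the current SEX*AGE group */
  do n=1 by 1 until(last.age);
    set have;
    by sex age notsorted;
  end;
  /* Start a new SEQ if adding this group would exceed 5, or on the first group */
  if (total+n) > 5 or _n_=1 then do;
     seq+1;
     total=n;
  end;
  else total+n;
  /* Second loop: re-read the same observations and write them out with SEQ */
  do n=1 to n ;
    set have;
    output;
  end;
run;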

Result

Obs    n    name       sex    age    height    weight    seq_want    total    seq

  1    1    Janet       F      15     62.5      112.5        1         2       1
  2    2    Mary        F      15     66.5      112.0        1         2       1
  3    1    Carol       F      14     62.8      102.5        1         4       1
  4    2    Judy        F      14     64.3       90.0        1         4       1
  5    1    Alice       F      13     56.5       84.0        2         2       2
  6    2    Barbara     F      13     65.3       98.0        2         2       2
  7    1    Jane        F      12     59.8       84.5        2         4       2
  8    2    Louise      F      12     56.3       77.0        2         4       2
  9    1    Joyce       F      11     51.3       50.5        2         5       2
 10    1    Philip      M      16     72.0      150.0        3         1       3
 11    1    Ronald      M      15     67.0      133.0        3         3       3
 12    2    William     M      15     66.5      112.0        3         3       3
 13    1    Alfred      M      14     69.0      112.5        3         5       3
 14    2    Henry       M      14     63.5      102.5        3         5       3
 15    1    Jeffrey     M      13     62.5       84.0        4         1       4
 16    1    James       M      12     57.3       83.0        4         4       4
 17    2    John        M      12     59.0       99.5        4         4       4
 18    3    Robert      M      12     64.8      128.0        4         4       4
 19    1    Thomas      M      11     57.5       85.0        4         5       4

Tom · Posted 07-10-2023 09:02 AM

You could just use arithmetic.

data want;
  set have;
  seq = 1 + int((_n_-1)/5);
run;

Or just count.

data want;
   seq+1;
   do subseq=1 to 5;
     set have;
     output;
   end;
   drop subseq;
run;

thanikondharish · Posted 07-10-2023 09:23 AM

Thank you for your quick response, but this code does not meet my requirement.
All records of an age group should have the same sequence number within the sex group.

Please see the screenshot. I sorted the data by sex and descending age. When I apply your code, it gives every five records one number, but if a group is split across two sequence numbers,
I need its first record (or first few records) to move forward to the next sequence number.

for example:

The 5th and 6th records are in the same sex and age group, but they get different sequence numbers. The 5th record's sequence number should be 2.

Tom · Posted 07-10-2023 09:34 AM

I cannot understand what your restriction is.

Do you want to COUNT the existing groups?

Or do you want to CREATE the groups?

Or is it some combination of the two?

Your original example just set the first 5 observations to SEQ=1, the next 5 to SEQ=2 etc. So it was creating groups by simply assigning the first 5 to the first group, etc.

If instead you want to NUMBER the groups that exist then just use BY group processing.

For example, this will generate a new SEQ number for each unique SEX*AGE combination.

data want;
  set have;
  by sex age;
  seq + first.age;
run;

If the data is already grouped but not necessarily sorted, then use the NOTSORTED keyword on the BY statement.
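As a sketch of that variant, assuming HAVE is ordered by sex and descending age as in the sample posted in this thread, the same step with NOTSORTED would look like the following. Writing the BY statement as BY SEX DESCENDING AGE is another way to match that order.

```sas
data want;
  set have;
  /* NOTSORTED: groups only need to be contiguous, not in ascending order */
  by sex age notsorted;
  /* FIRST.AGE is 1 on the first row of each contiguous SEX*AGE run, 0 otherwise */
  seq + first.age;
run;
```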

Or perhaps you want to split the groups into subgroups of 5 in a row by restarting the seq numbers from one when a new group starts?

data want;
   seq+1;
   do subseq=1 to 5 until(last.age);
        set have ;
        by sex age;
        output;
   end;
   if last.sex then seq=0;
run;

Please share data as text, not photographs. Preferably as data steps that can be used to recreate the data.
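For reference, the sample from this thread could be recreated with a data step along these lines. The values are copied from the question. The dataset name HAVE matches the code in the replies, and relying on the default character length of 8 for NAME and SEX is an assumption that happens to fit these values.

```sas
data have;
  /* List input: NAME and SEX are character, the rest are numeric */
  input name $ sex $ age height weight;
datalines;
Janet F 15 62.5 112.5
Mary F 15 66.5 112
Carol F 14 62.8 102.5
Judy F 14 64.3 90
Alice F 13 56.5 84
Barbara F 13 65.3 98
Jane F 12 59.8 84.5
Louise F 12 56.3 77
Joyce F 11 51.3 50.5
Philip M 16 72 150
Ronald M 15 67 133
William M 15 66.5 112
Alfred M 14 69 112.5
Henry M 14 63.5 102.5
Jeffrey M 13 62.5 84
James M 12 57.3 83
John M 12 59 99.5
Robert M 12 64.8 128
Thomas M 11 57.5 85
;
run;
```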

thanikondharish · Posted 07-10-2023 09:44 AM

Put simply, we need to generate a sequence number every 5 records.

The 5th record (currently seq=1) and the 6th record (seq=2) belong to the same sex*age group, so the 5th record's sequence number should also be 2.

The output should look like this:

name sex age height weight seq
Janet F 15 62.5 112.5 1
Mary F 15 66.5 112 1
Carol F 14 62.8 102.5 1
Judy F 14 64.3 90 1
Alice F 13 56.5 84 2
Barbara F 13 65.3 98 2
Jane F 12 59.8 84.5 2
Louise F 12 56.3 77 2
Joyce F 11 51.3 50.5 2
Philip M 16 72 150 3
Ronald M 15 67 133 3
William M 15 66.5 112 3
Alfred M 14 69 112.5 3
Henry M 14 63.5 102.5 3
Jeffrey M 13 62.5 84 4
James M 12 57.3 83 4
John M 12 59 99.5 4
Robert M 12 64.8 128 4
Thomas M 11 57.5 85 4

thanikondharish · Posted 07-10-2023 11:49 AM

Could you please explain how the program works, step by step?

Tom · Posted 07-10-2023 11:51 AM

Count how many observations are in the group.

Does adding that many observations make the size of the current SEQ group exceed the limit? If so start a new SEQ group.

Re-read the observations in the group and write them back out so that all of the observations in the group have the new SEQ variable.

It is just a twist on the basic double DOW loop. https://www.google.com/search?q=%40sas.com+dow+loop
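A stripped-down version of that pattern, as a sketch only: the first loop counts the rows in a group, and the second loop re-reads the same rows so each one can carry the count. It assumes a dataset HAVE grouped by SEX and AGE as in this thread. COUNTS and GROUP_SIZE are hypothetical names used for illustration.

```sas
data counts;
  /* First loop: read one SEX*AGE group to its end; N ends up as the row count */
  do n = 1 by 1 until (last.age);
    set have;
    by sex age notsorted;
  end;
  group_size = n;
  /* Second loop: the second SET has its own read pointer, so it re-reads */
  /* the same rows and writes each one out with GROUP_SIZE attached       */
  do n = 1 to group_size;
    set have;
    output;
  end;
  drop n;
run;
```

The accepted answer follows the same shape, with the SEQ decision placed between the two loops.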

Kurt_Bremser · Posted 07-10-2023 09:34 AM

In your example:

name sex age height weight seq
Janet F 15 62.5 112.5 1
Mary F 15 66.5 112 1
Carol F 14 62.8 102.5 1
Judy F 14 64.3 90 1
Alice F 13 56.5 84 2
Barbara F 13 65.3 98 2
Jane F 12 59.8 84.5 2
Louise F 12 56.3 77 2
Joyce F 11 51.3 50.5 2

you change to seq=2 after only 4 observations, but then keep seq=2 for 5 observations. So do you increment with the 5th, or after 5?

thanikondharish · Posted 07-10-2023 10:11 AM

Thank you for your response.
If a unique group has more than 5 records, for example 8 records, then I would assign a sequence number every 8 observations instead.

5 is just an example.

thanikondharish · Posted 07-10-2023 10:16 AM

As of now, the code below is working.

Thank you.

data want;
  do n=1 by 1 until(last.age);
    set have;
    by sex age notsorted;
  end;
  if (total+n) > 5 or _n_=1 then do; 
     seq+1;
     total=n;
  end;
  else total+n;
  do n=1 to n ;
    set have;
    output;
  end;
run;

Tom · Posted 07-10-2023 10:24 AM

@thanikondharish wrote:

Thank you for your response.
If a unique group has more than 5 records, for example 8 records, then I would assign a sequence number every 8 observations instead.

5 is just an example.

In that case you need to process the dataset twice. Once to find the maximum group size. The second to make the new groups.

But why did you use 5 for your example instead of the actual maximum of 3?
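One way the two passes might be sketched, taking the limit to be the largest SEX*AGE group size as described above. The macro variable name LIMIT is hypothetical, and PROC SQL groups by value rather than by contiguous run, so this assumes each SEX*AGE combination appears in a single run in HAVE.

```sas
/* Pass 1: find the largest SEX*AGE group and store its size in &limit */
proc sql noprint;
  select max(cnt) into :limit trimmed
  from (select count(*) as cnt
        from have
        group by sex, age) as g;
quit;

/* Pass 2: the same double DOW step, with the limit taken from pass 1 */
data want;
  /* Count the observations in the current SEX*AGE group */
  do n = 1 by 1 until (last.age);
    set have;
    by sex age notsorted;
  end;
  /* Start a new SEQ when this group would push the running total past the limit */
  if (total + n) > &limit or _n_ = 1 then do;
    seq + 1;
    total = n;
  end;
  else total + n;
  /* Re-read the group and write it out with its SEQ value */
  do n = 1 to n;
    set have;
    output;
  end;
run;
```

With the sample in this thread the largest group has 3 members, so the result would differ from the limit-of-5 output shown earlier.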
