Solved
Contributor
Posts: 51

# Count based on group variables

This is my Data :

DATA TEST ;
LENGTH A \$3. B E \$6. C \$3. ;
FORMAT D 12.4 ;
INFILE DATALINES DLM='~' ;
INPUT A B C D E ;
DATALINES ;
ABC~ITEMS3~USD~10~1234A1
DBC~ITEMS3~USD~20~1234A2
VBC~ITEMS3~USD~15~1234A3
BBC~ITEMS3~USD~10~1234A4
NBC~ITEMS3~ILS~04~1234A5
KBC~ITEMS3~ILS~09~1234A6
PBC~ITEMS6~USD~09~1234B1
VBC~ITEMS6~USD~09~1234B2
CBC~ITEMS6~EUR~09~1234B3
TBC~ITEMS6~EUR~09~1234B4

Now I want the output to be counted based on variable C and keeping in mind from the common B.

The output I want is :

B                 C                 E               seq_num

IITEMS3      USD      1234A1             1
ITEMS3       USD      1234A2             1
ITEMS3       USD      1234A3             1
ITEMS3       USD      1234A4             1
ITEMS3       ILS         1234A5             2
ITEMS3       ILS          1234A6            2
ITEMS6       USD        1234B1           1
ITEMS6       USD         1234B2          1
ITEMS6       EUR        1234B3            2
ITEMS6       EUR        1234B4           2

The seq_num as you can see is based on the count of variable C, I want it to be same if its repeating and change only ( by 1) only when a new value comes in the variable C. Note the father variable being B.

I am confused with how to use the FIRST. LAST. to get this output.

Please could you help me with this.

P.S. I need it urgently..

Thank you

Accepted Solutions
Solution
‎12-11-2017 02:24 PM
PROC Star
Posts: 1,294

## Re: Count based on group variables

[ Edited ]

Don't panic. Is this homework lol ? Reeza is helping you learn on your own by self learning and researching, which is only possible when you try. Anyways:

DATA have ;
LENGTH A \$3. B E \$6. C \$3. ;
FORMAT D 12.4 ;
INFILE DATALINES DLM='~' ;
INPUT A B C D E ;
DATALINES ;
ABC~ITEMS3~USD~10~1234A1
DBC~ITEMS3~USD~20~1234A2
VBC~ITEMS3~USD~15~1234A3
BBC~ITEMS3~USD~10~1234A4
NBC~ITEMS3~ILS~04~1234A5
KBC~ITEMS3~ILS~09~1234A6
PBC~ITEMS6~USD~09~1234B1
VBC~ITEMS6~USD~09~1234B2
CBC~ITEMS6~EUR~09~1234B3
TBC~ITEMS6~EUR~09~1234B4
;

proc sort data=have ;
by b c;
run;

data want;
set have;
by b c;
if first.b then call missing(seq_num);
if first.c then seq_num+1;
run;

All Replies
Contributor
Posts: 51

## Re: Count based on group variables

@data_null__   need your help on this one...

Super User
Posts: 22,820

## Re: Count based on group variables

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

This tutorial will walk you through how to create your groups. If you have difficulties, please post the code you've attempted.

Contributor
Posts: 51

## Re: Count based on group variables

I tried but couldnt get the result.

Below is my code :

DATA TEST ;
LENGTH A \$3. B E \$6. C \$3. ;
FORMAT D 12.4 ;
INFILE DATALINES DLM='~' ;
INPUT A B C D E ;
DATALINES ;
ABC~ITEMS3~USD~10~1234A1
DBC~ITEMS3~USD~20~1234A2
VBC~ITEMS3~USD~15~1234A3
BBC~ITEMS3~USD~10~1234A4
NBC~ITEMS3~ILS~04~1234A5
KBC~ITEMS3~ILS~09~1234A6
PBC~ITEMS6~USD~09~1234B1
VBC~ITEMS6~USD~09~1234B2
CBC~ITEMS6~EUR~09~1234B3
TBC~ITEMS6~EUR~09~1234B4
;

PROC SORT DATA = TEST ; BY B C E ;
DATA XXX ;
SEQ = 0 ;
SET TEST ;
BY B C ;

IF FIRST.B
THEN DO ;
IF FIRST.C THEN SEQ = 1 ;
END;
IF LAST.C THEN
SEQ = SEQ + 1 ;
RUN ;

PROC PRINT DATA = XXX ;
VAR C B SEQ ;

Output :

Obs C        B        SEQ

1 ILS        ITEMS3 1
2 ILS        ITEMS3 1
3 USD      ITEMS3 0
4 USD      ITEMS3 0
5 USD       ITEMS3 0
6 USD      ITEMS3 1
7 EUR      ITEMS6 1
8 EUR      ITEMS6 1
9 USD      ITEMS6 0
10 USD     ITEMS6 1

whereas I want it to be :

Obs C        B        SEQ

1 ILS        ITEMS3 1
2 ILS        ITEMS3 1
3 USD      ITEMS3 2
4 USD      ITEMS3 2
5 USD       ITEMS3 2
6 USD      ITEMS3 2
7 EUR      ITEMS6 1
8 EUR      ITEMS6 1
9 USD      ITEMS6 2
10 USD     ITEMS6 2

Solution
‎12-11-2017 02:24 PM
PROC Star
Posts: 1,294

## Re: Count based on group variables

[ Edited ]

Don't panic. Is this homework lol ? Reeza is helping you learn on your own by self learning and researching, which is only possible when you try. Anyways:

DATA have ;
LENGTH A \$3. B E \$6. C \$3. ;
FORMAT D 12.4 ;
INFILE DATALINES DLM='~' ;
INPUT A B C D E ;
DATALINES ;
ABC~ITEMS3~USD~10~1234A1
DBC~ITEMS3~USD~20~1234A2
VBC~ITEMS3~USD~15~1234A3
BBC~ITEMS3~USD~10~1234A4
NBC~ITEMS3~ILS~04~1234A5
KBC~ITEMS3~ILS~09~1234A6
PBC~ITEMS6~USD~09~1234B1
VBC~ITEMS6~USD~09~1234B2
CBC~ITEMS6~EUR~09~1234B3
TBC~ITEMS6~EUR~09~1234B4
;

proc sort data=have ;
by b c;
run;

data want;
set have;
by b c;
if first.b then call missing(seq_num);
if first.c then seq_num+1;
run;

Contributor
Posts: 51

## Re: Count based on group variables

Thanks a ton.. But could you help me understand what is the :

call missing function doing.

PROC Star
Posts: 1,294

## Re: Count based on group variables

Call missing here assigns a missing value in order to override the self retain of the sum statement not to increment when by groups change.

Contributor
Posts: 51

Thanks

Super User
Posts: 6,530

## Re: Count based on group variables

You're fiddling with the right tools.  Here's a way that should work:

DATA XXX ;
SET TEST ;
BY B C ;

IF FIRST.B THEN SEQ = 1 ;
ELSE IF FIRST.C THEN SEQ + 1 ;
RUN ;

☑ This topic is solved.