DATA Step, Macro, Functions and more

Count based on group variables

Accepted Solution Solved
Reply
Contributor
Posts: 54
Accepted Solution

Count based on group variables

This is my Data :

 

DATA TEST ; 
LENGTH A $3. B E $6. C $3. ; 
FORMAT D 12.4 ; 
INFILE DATALINES DLM='~' ; 
INPUT A B C D E ; 
DATALINES ; 
ABC~ITEMS3~USD~10~1234A1 
DBC~ITEMS3~USD~20~1234A2 
VBC~ITEMS3~USD~15~1234A3 
BBC~ITEMS3~USD~10~1234A4 
NBC~ITEMS3~ILS~04~1234A5 
KBC~ITEMS3~ILS~09~1234A6 
PBC~ITEMS6~USD~09~1234B1 
VBC~ITEMS6~USD~09~1234B2 
CBC~ITEMS6~EUR~09~1234B3 
TBC~ITEMS6~EUR~09~1234B4

 

Now I want the output to be counted based on variable C and keeping in mind from the common B.

 

The output I want is :

 

B                 C                 E               seq_num

IITEMS3      USD      1234A1             1
ITEMS3       USD      1234A2             1
ITEMS3       USD      1234A3             1
ITEMS3       USD      1234A4             1
ITEMS3       ILS         1234A5             2
ITEMS3       ILS          1234A6            2
ITEMS6       USD        1234B1           1
ITEMS6       USD         1234B2          1
ITEMS6       EUR        1234B3            2
ITEMS6       EUR        1234B4           2

 

The seq_num as you can see is based on the count of variable C, I want it to be same if its repeating and change only ( by 1) only when a new value comes in the variable C. Note the father variable being B. 

 

I am confused with how to use the FIRST. LAST. to get this output.

 

Please could you help me with this.

 

P.S. I need it urgently..

 

Thank you


Accepted Solutions
Solution
‎12-11-2017 02:24 PM
Super User
Posts: 2,078

Re: Count based on group variables

[ Edited ]
Posted in reply to sayanapex06

Don't panic. Is this homework lol ? Reeza is helping you learn on your own by self learning and researching, which is only possible when you try. Anyways:

 

DATA have ;
LENGTH A $3. B E $6. C $3. ;
FORMAT D 12.4 ;
INFILE DATALINES DLM='~' ;
INPUT A B C D E ;
DATALINES ;
ABC~ITEMS3~USD~10~1234A1
DBC~ITEMS3~USD~20~1234A2
VBC~ITEMS3~USD~15~1234A3
BBC~ITEMS3~USD~10~1234A4
NBC~ITEMS3~ILS~04~1234A5
KBC~ITEMS3~ILS~09~1234A6
PBC~ITEMS6~USD~09~1234B1
VBC~ITEMS6~USD~09~1234B2
CBC~ITEMS6~EUR~09~1234B3
TBC~ITEMS6~EUR~09~1234B4
;

proc sort data=have ;
by b c;
run;

 

data want;
set have;
by b c;
if first.b then call missing(seq_num);
if first.c then seq_num+1;
run;

 

View solution in original post


All Replies
Contributor
Posts: 54

Re: Count based on group variables

Posted in reply to sayanapex06

@data_null__   need your help on this one...

Super User
Posts: 24,027

Re: Count based on group variables

Posted in reply to sayanapex06

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 

This tutorial will walk you through how to create your groups. If you have difficulties, please post the code you've attempted.

Contributor
Posts: 54

Re: Count based on group variables

I tried but couldnt get the result.

 

Below is my code :

 

DATA TEST ;
LENGTH A $3. B E $6. C $3. ;
FORMAT D 12.4 ;
INFILE DATALINES DLM='~' ;
INPUT A B C D E ;
DATALINES ;
ABC~ITEMS3~USD~10~1234A1
DBC~ITEMS3~USD~20~1234A2
VBC~ITEMS3~USD~15~1234A3
BBC~ITEMS3~USD~10~1234A4
NBC~ITEMS3~ILS~04~1234A5
KBC~ITEMS3~ILS~09~1234A6
PBC~ITEMS6~USD~09~1234B1
VBC~ITEMS6~USD~09~1234B2
CBC~ITEMS6~EUR~09~1234B3
TBC~ITEMS6~EUR~09~1234B4
;

 

PROC SORT DATA = TEST ; BY B C E ;
DATA XXX ;
SEQ = 0 ;
SET TEST ;
BY B C ;

IF FIRST.B
THEN DO ;
IF FIRST.C THEN SEQ = 1 ;
END;
IF LAST.C THEN
SEQ = SEQ + 1 ;
RUN ;

PROC PRINT DATA = XXX ;
VAR C B SEQ ;

 

Output :

 


Obs C        B        SEQ

1 ILS        ITEMS3 1
2 ILS        ITEMS3 1
3 USD      ITEMS3 0
4 USD      ITEMS3 0
5 USD       ITEMS3 0
6 USD      ITEMS3 1
7 EUR      ITEMS6 1
8 EUR      ITEMS6 1
9 USD      ITEMS6 0
10 USD     ITEMS6 1

 

whereas I want it to be :

 

Obs C        B        SEQ 

1 ILS        ITEMS3 1 
2 ILS        ITEMS3 1 
3 USD      ITEMS3 2 
4 USD      ITEMS3 2 
5 USD       ITEMS3 2 
6 USD      ITEMS3 2 
7 EUR      ITEMS6 1 
8 EUR      ITEMS6 1 
9 USD      ITEMS6 2 
10 USD     ITEMS6 2

 

Please help me, its urgent

 

 

Solution
‎12-11-2017 02:24 PM
Super User
Posts: 2,078

Re: Count based on group variables

[ Edited ]
Posted in reply to sayanapex06

Don't panic. Is this homework lol ? Reeza is helping you learn on your own by self learning and researching, which is only possible when you try. Anyways:

 

DATA have ;
LENGTH A $3. B E $6. C $3. ;
FORMAT D 12.4 ;
INFILE DATALINES DLM='~' ;
INPUT A B C D E ;
DATALINES ;
ABC~ITEMS3~USD~10~1234A1
DBC~ITEMS3~USD~20~1234A2
VBC~ITEMS3~USD~15~1234A3
BBC~ITEMS3~USD~10~1234A4
NBC~ITEMS3~ILS~04~1234A5
KBC~ITEMS3~ILS~09~1234A6
PBC~ITEMS6~USD~09~1234B1
VBC~ITEMS6~USD~09~1234B2
CBC~ITEMS6~EUR~09~1234B3
TBC~ITEMS6~EUR~09~1234B4
;

proc sort data=have ;
by b c;
run;

 

data want;
set have;
by b c;
if first.b then call missing(seq_num);
if first.c then seq_num+1;
run;

 

Contributor
Posts: 54

Re: Count based on group variables

Posted in reply to novinosrin

Thanks a ton.. But could you help me understand what is the :

call missing function doing.

 

Super User
Posts: 2,078

Re: Count based on group variables

Posted in reply to sayanapex06

Call missing here assigns a missing value in order to override the self retain of the sum statement not to increment when by groups change. 

Contributor
Posts: 54

Re: Count based on group variables

Posted in reply to novinosrin
Thanks

Super User
Posts: 6,939

Re: Count based on group variables

Posted in reply to sayanapex06

You're fiddling with the right tools.  Here's a way that should work:

 

DATA XXX ;
SET TEST ;
BY B C ;

IF FIRST.B THEN SEQ = 1 ;
ELSE IF FIRST.C THEN SEQ + 1 ;
RUN ;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 181 views
  • 0 likes
  • 4 in conversation