BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
thummala
Obsidian | Level 7

Hi,

The following is my data set and I am trying to get the output as shown below using ONLY data step.

data email;

input custid $ date MMDDYY8. response $;

format date MMDDYY10.;

datalines;

A 12-22-10 Y

B 09-18-07 N

C 08-11-11 Y

A 03-19-03 Y

B 09-01-01 N

C 08-03-12 N

D 03-01-04 Y

A 11-08-08 Y

B 03-09-09 Y

C 09-06-09 Y

E 09-08-09 Y

A 09-09-09 Y

C 09-06-09 Y

B 09-01-06 N

E 09-03-12 Y

D 09-23-11 Y

A 01-01-00 Y

C 09-08-02 Y

D 09-07-00 Y

F 09-09-09 Y

B 09-01-00 N

C 01-01-00 N

;

run;

The out put should be like:

Custid    count

A            5

B            5

C            6

D            3

E            2

F            1

1 ACCEPTED SOLUTION

Accepted Solutions
esjackso
Quartz | Level 8

Try this:

data cnt2;

  keep custid count;

  set email;

  select (custid);

  when ("A") A + 1;

  when ("B") B + 1;

  when ("C") C + 1;

  when ("D") D + 1;

  when ("E") E + 1;

  when ("F") F + 1;

  otherwise;

  end;

  if _n_ = 22 then do;

  custid = "A";

  count = A;

  output;

  custid = "B";

  count = B;

  output;

  custid = "C";

  count = C;

  output;

  custid = "D";

  count = D;

  output;

  custid = "E";

  count = E;

  output;

  custid = "F";

  count = F;

  output;

  end;

run;

View solution in original post

9 REPLIES 9
VD
Calcite | Level 5 VD
Calcite | Level 5

Easiest is to use PROC FREQ

proc freq data=email ;

table custid/NOPERCENT NOCUM out=data1;

run;

allurai0412
Fluorite | Level 6

hi ,

just provdiding and idea .,..not code

you can use count (response ) ...and followed by group ,,....in  Proc SQL;

regards

ALLU

esjackso
Quartz | Level 8

Not sure why you need just a data step, but below requires the data to be sorted before the data step:

proc sort data=email; by custid; run;

data cnt;

  drop date response;

  set email;

  by custid;

  if first.custid then count = 1;

  else count + 1;

  if last.custid then output;

run;

thummala
Obsidian | Level 7

Hi ESJACKSO

I am a new user to SAS and my instructor asked me to solve it using data steps only. Looking for some hints to solve it.

Thank You.

esjackso
Quartz | Level 8

Try this:

data cnt2;

  keep custid count;

  set email;

  select (custid);

  when ("A") A + 1;

  when ("B") B + 1;

  when ("C") C + 1;

  when ("D") D + 1;

  when ("E") E + 1;

  when ("F") F + 1;

  otherwise;

  end;

  if _n_ = 22 then do;

  custid = "A";

  count = A;

  output;

  custid = "B";

  count = B;

  output;

  custid = "C";

  count = C;

  output;

  custid = "D";

  count = D;

  output;

  custid = "E";

  count = E;

  output;

  custid = "F";

  count = F;

  output;

  end;

run;

esjackso
Quartz | Level 8

Now the above code isnt very flexible but does provide the output using only a datastep. In fact the outputs could be coded using macro code to shorten the code a little. But for this to work you have to know all the possible custid values going in .... the other solutions posted (freq, sql, and using a sort with a by statement) all offer better solutions for a counting situation.

hope this helps!

EJ

AncaTilea
Pyrite | Level 9

Very nice!

Thank you.

I never knew of select...when...otherwise...

!!!!

Excited!

Thanks!

Anca.

AncaTilea
Pyrite | Level 9

Here is a piece of code that (almost) accomplishes what you want (I can't understand why you are not allowed to use proc sort?)

The code is based on the data you provided, and it is very 'manual' (or lots of hard-coded values) (so, again, there are quicker and better ways to accomplish this):

data want;

    set email;

    

*create a flag for each ID occurrence;

    tempA = (custId = "A");

    tempB = (custId = "B");

    tempC = (custId = "C");

    tempD = (custId = "D");

    tempE = (custId = "E");

    tempF = (custId = "F");

*the retain will help 'retain' a value until a new 'instance/value' is met;

    retain total_A 0 total_B 0 total_C 0 total_D 0 total_E 0 total_F 0;

  *sum down the rows;

    total_A=sum(total_A,tempA);

    total_B=sum(total_B,tempB);

    total_C=sum(total_C,tempC);

    total_D=sum(total_D,tempD);

    total_E=sum(total_E,tempE);

    total_F=sum(total_F,tempF);

    drop temp:;  

by_var= 1;*to use in the next data step;

run;

data may_want(Keep = total:);

    set want;

    by by_var;

    if last.temp;

run;

So the final data is not really in the same 'shape' as what you really want.

But these are my thoughts.

Good luck!

Anca.

data_null__
Jade | Level 19

The HASH object allows you to create a dynamic array indexed on CUSTID and accumulate the counts.

data email;
   infile datalines eof=eof;
   input custid $ date MMDDYY8. response $;
   length count 8;
  
if _n_ eq 1 then do;
     
declare hash h(ordered:'Y');
      h.definekey('custid');
      h.definedata('custid','count');
      h.definedone();
     
end;
  
call missing(count);
   rc=h.find();
   count +
1;
   rc=h.replace();
  
return;
eof:
   h.output(dataset:
'count');
   stop;
  
format date MMDDYY10.;
  
drop count rc;
   datalines;
A 12-22-10 Y
B 09-18-07 N
C 08-11-11 Y
A 03-19-03 Y
B 09-01-01 N
C 08-03-12 N
D 03-01-04 Y
A 11-08-08 Y
B 03-09-09 Y
C 09-06-09 Y
E 09-08-09 Y
A 09-09-09 Y
C 09-06-09 Y
B 09-01-06 N
E 09-03-12 Y
D 09-23-11 Y
A 01-01-00 Y
C 09-08-02 Y
D 09-07-00 Y
F 09-09-09 Y
B 09-01-00 N
C 01-01-00 N
;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1948 views
  • 7 likes
  • 6 in conversation