DATA Step, Macro, Functions and more

New user - need help

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

New user - need help

Hi,

The following is my data set and I am trying to get the output as shown below using ONLY data step.

data email;

input custid $ date MMDDYY8. response $;

format date MMDDYY10.;

datalines;

A 12-22-10 Y

B 09-18-07 N

C 08-11-11 Y

A 03-19-03 Y

B 09-01-01 N

C 08-03-12 N

D 03-01-04 Y

A 11-08-08 Y

B 03-09-09 Y

C 09-06-09 Y

E 09-08-09 Y

A 09-09-09 Y

C 09-06-09 Y

B 09-01-06 N

E 09-03-12 Y

D 09-23-11 Y

A 01-01-00 Y

C 09-08-02 Y

D 09-07-00 Y

F 09-09-09 Y

B 09-01-00 N

C 01-01-00 N

;

run;

The out put should be like:

Custid    count

A            5

B            5

C            6

D            3

E            2

F            1


Accepted Solutions
Solution
‎02-12-2013 07:41 AM
Super Contributor
Posts: 333

Re: New user - need help

Try this:

data cnt2;

  keep custid count;

  set email;

  select (custid);

  when ("A") A + 1;

  when ("B") B + 1;

  when ("C") C + 1;

  when ("D") D + 1;

  when ("E") E + 1;

  when ("F") F + 1;

  otherwise;

  end;

  if _n_ = 22 then do;

  custid = "A";

  count = A;

  output;

  custid = "B";

  count = B;

  output;

  custid = "C";

  count = C;

  output;

  custid = "D";

  count = D;

  output;

  custid = "E";

  count = E;

  output;

  custid = "F";

  count = F;

  output;

  end;

run;

View solution in original post


All Replies
Frequent Contributor
Frequent Contributor
Posts: 94

Re: New user - need help

Easiest is to use PROC FREQ

proc freq data=email ;

table custid/NOPERCENT NOCUM out=data1;

run;

Frequent Contributor
Posts: 97

Re: New user - need help

hi ,

just provdiding and idea .,..not code

you can use count (response ) ...and followed by group ,,....in  Proc SQL;

regards

ALLU

Super Contributor
Posts: 333

Re: New user - need help

Not sure why you need just a data step, but below requires the data to be sorted before the data step:

proc sort data=email; by custid; run;

data cnt;

  drop date response;

  set email;

  by custid;

  if first.custid then count = 1;

  else count + 1;

  if last.custid then output;

run;

Occasional Contributor
Posts: 16

Re: New user - need help

Hi ESJACKSO

I am a new user to SAS and my instructor asked me to solve it using data steps only. Looking for some hints to solve it.

Thank You.

Solution
‎02-12-2013 07:41 AM
Super Contributor
Posts: 333

Re: New user - need help

Try this:

data cnt2;

  keep custid count;

  set email;

  select (custid);

  when ("A") A + 1;

  when ("B") B + 1;

  when ("C") C + 1;

  when ("D") D + 1;

  when ("E") E + 1;

  when ("F") F + 1;

  otherwise;

  end;

  if _n_ = 22 then do;

  custid = "A";

  count = A;

  output;

  custid = "B";

  count = B;

  output;

  custid = "C";

  count = C;

  output;

  custid = "D";

  count = D;

  output;

  custid = "E";

  count = E;

  output;

  custid = "F";

  count = F;

  output;

  end;

run;

Super Contributor
Posts: 333

Re: New user - need help

Now the above code isnt very flexible but does provide the output using only a datastep. In fact the outputs could be coded using macro code to shorten the code a little. But for this to work you have to know all the possible custid values going in .... the other solutions posted (freq, sql, and using a sort with a by statement) all offer better solutions for a counting situation.

hope this helps!

EJ

Super Contributor
Posts: 543

Re: New user - need help

Very nice!

Thank you.

I never knew of select...when...otherwise...

!!!!

Excited!

Thanks!

Anca.

Super Contributor
Posts: 543

Re: New user - need help

Here is a piece of code that (almost) accomplishes what you want (I can't understand why you are not allowed to use proc sort?)

The code is based on the data you provided, and it is very 'manual' (or lots of hard-coded values) (so, again, there are quicker and better ways to accomplish this):

data want;

    set email;

    

*create a flag for each ID occurrence;

    tempA = (custId = "A");

    tempB = (custId = "B");

    tempC = (custId = "C");

    tempD = (custId = "D");

    tempE = (custId = "E");

    tempF = (custId = "F");

*the retain will help 'retain' a value until a new 'instance/value' is met;

    retain total_A 0 total_B 0 total_C 0 total_D 0 total_E 0 total_F 0;

  *sum down the rows;

    total_A=sum(total_A,tempA);

    total_B=sum(total_B,tempB);

    total_C=sum(total_C,tempC);

    total_D=sum(total_D,tempD);

    total_E=sum(total_E,tempE);

    total_F=sum(total_F,tempF);

    drop temp:;  

by_var= 1;*to use in the next data step;

run;

data may_want(Keep = totalSmiley Happy;

    set want;

    by by_var;

    if last.temp;

run;

So the final data is not really in the same 'shape' as what you really want.

But these are my thoughts.

Good luck!

Anca.

Respected Advisor
Posts: 3,777

Re: New user - need help

The HASH object allows you to create a dynamic array indexed on CUSTID and accumulate the counts.

data email;
   infile datalines eof=eof;
   input custid $ date MMDDYY8. response $;
   length count 8;
  
if _n_ eq 1 then do;
     
declare hash h(ordered:'Y');
      h.definekey('custid');
      h.definedata('custid','count');
      h.definedone();
     
end;
  
call missing(count);
   rc=h.find();
   count +
1;
   rc=h.replace();
  
return;
eof:
   h.output(dataset:
'count');
   stop;
  
format date MMDDYY10.;
  
drop count rc;
   datalines;
A 12-22-10 Y
B 09-18-07 N
C 08-11-11 Y
A 03-19-03 Y
B 09-01-01 N
C 08-03-12 N
D 03-01-04 Y
A 11-08-08 Y
B 03-09-09 Y
C 09-06-09 Y
E 09-08-09 Y
A 09-09-09 Y
C 09-06-09 Y
B 09-01-06 N
E 09-03-12 Y
D 09-23-11 Y
A 01-01-00 Y
C 09-08-02 Y
D 09-07-00 Y
F 09-09-09 Y
B 09-01-00 N
C 01-01-00 N
;
run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 344 views
  • 7 likes
  • 6 in conversation