Solved: Re: Counting the number of occurrences for a grouped variable

esvenson · Posted 06-15-2017 12:49 PM

Hello,

I have data arranged like this and I'm trying to count the number of non-zero occurrences for each ID value. I'm trying to do this without using a PROC step and to only use a DATA step. Please forgive me if this question has been solved elsewhere - I couldn't find this in the community.

ID A B

1000 0 1

1000 1 0

1111 1 0

1111 0 1

2222 0 1

2222 1 0

2222 0 1

The final result set should look this:

ID A B

1000 1 2

1111 3 2

2222 2 3

Your help is greatly appreciated - thank you in advance!

~E

slchen · Posted 06-15-2017 01:46 PM

Method one:
proc sql;

create table want as
select distinct(ID),sum(A) as A, sum(B) as B from have group by ID;
quit;

Method two:

proc summary data=have;
class id;
var A B;
ways 1;
output out=want sum(A B)=A B;
run;

Method three:

data want;
set have;
by id;
retain sumA sumB;
if first.id then call missing(sumA,sumB);
sumA+A; sumB+B;
if last.id then output;
run;

View solution in original post

LinusH · Posted 06-15-2017 01:19 PM

Why not a PROC if that's easier/better?

Data never sleeps

esvenson · Posted 06-15-2017 01:23 PM

It needs to be a data step because the results will later be combined into another data step. There are other variables that will need to be in the dataset, I was just trying to keep it simple showing only those three generic ones. I don't really want to use a PROC and unsure if it's possible to use the results of a PROC step into another DATA step.

Thanks,

E

PaigeMiller · Posted 06-15-2017 01:27 PM

@esvenson wrote:

It needs to be a data step because the results will later be combined into another data step.

Well, that's not the way SAS works. You can run a PROC, have the results in a SAS data set, and then that data set combined with other data in a data step.

<blockquote>There are other variables that will need to be in the dataset</blockquote>

Yep. PROCs work here too. See my paragraph above. There's nothing you have said that forces you to use DATA step only.

--
Paige Miller

esvenson · Posted 06-15-2017 01:29 PM

Ok, good to know. Do you have an idea on how to do it?

Thanks for your help,

E

PaigeMiller · Posted 06-15-2017 01:25 PM

You could certainly do it in a data step if it was sorted first ... but that requires PROC SORT ... I guess you can't do this without a PROC.

--
Paige Miller

esvenson · Posted 06-15-2017 01:26 PM

I have no problem using a PROC SORT first - I only meant that the final result would be unusable for what I later need as a PROC MEANS or PROC FREQ. Hope this clarifies.

E

slchen · Posted 06-15-2017 01:46 PM

Method one:
proc sql;

create table want as
select distinct(ID),sum(A) as A, sum(B) as B from have group by ID;
quit;

Method two:

proc summary data=have;
class id;
var A B;
ways 1;
output out=want sum(A B)=A B;
run;

Method three:

data want;
set have;
by id;
retain sumA sumB;
if first.id then call missing(sumA,sumB);
sumA+A; sumB+B;
if last.id then output;
run;

esvenson · Posted 06-15-2017 02:45 PM

Dear slchen,
Thank you so much for taking the time to provide different ways of doing this. I tried all three and they all worked great. Thanks again.

Have a great day,
~E

SuryaKiran · Posted 06-15-2017 01:36 PM

You can use RETAIN statement and FIRST. and LAST. , Something like this....

DATA WANT;
INFILE DATALINES;
INPUT ID A B;
DATALINES;
1000 0 1
1000 0 1
1000 1 0
1111 1 0
1111 1 0
1111 1 0
1111 0 1
1111 0 1
2222 0 1
2222 0 1
2222 1 0
2222 1 0
2222 0 1
;
RUN;
PROC SORT DATA=WANT;
BY ID;
RUN;

DATA WANT1(DROP=A B);
SET WANT;
RETAIN Count_A Count_B;
BY ID ;
IF FIRST.ID THEN DO;count_A=A;Count_B=B;END;
ELSE DO;Count_A=Count_A+A;Count_B=Count_B+B;END;
IF LAST.ID THEN OUTPUT;
RUN;

Thanks,
Suryakiran

Astounding · Posted 06-15-2017 01:43 PM

Just in case this opens new avenues of thinking for you, here is one way to use a PROC to create a SAS data set:

proc summary data=have nway;

class ID;

var A B;

output out=want (drop=_type_ _freq_) sum=;

run;

The WANT data set should match what you are seeking here, with one issue to consider. You asked for a count of the non-zero values. This is actually something a little bit different ... the sum of all the values.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away