Hello,
I have data arranged like this and I'm trying to count the number of non-zero occurrences for each ID value. I'm trying to do this without using a PROC step and to only use a DATA step. Please forgive me if this question has been solved elsewhere - I couldn't find this in the community.
ID A B
1000 0 1
1000 0 1
1000 1 0
1111 1 0
1111 1 0
1111 1 0
1111 0 1
1111 0 1
2222 0 1
2222 0 1
2222 1 0
2222 1 0
2222 0 1
The final result set should look this:
ID A B
1000 1 2
1111 3 2
2222 2 3
Your help is greatly appreciated - thank you in advance!
~E
Method one:
proc sql;
create table want as
select distinct(ID),sum(A) as A, sum(B) as B from have group by ID;
quit;
Method two:
proc summary data=have;
class id;
var A B;
ways 1;
output out=want sum(A B)=A B;
run;
Method three:
data want;
set have;
by id;
retain sumA sumB;
if first.id then call missing(sumA,sumB);
sumA+A; sumB+B;
if last.id then output;
run;
It needs to be a data step because the results will later be combined into another data step. There are other variables that will need to be in the dataset, I was just trying to keep it simple showing only those three generic ones. I don't really want to use a PROC and unsure if it's possible to use the results of a PROC step into another DATA step.
Thanks,
E
@esvenson wrote:
It needs to be a data step because the results will later be combined into another data step.
Well, that's not the way SAS works. You can run a PROC, have the results in a SAS data set, and then that data set combined with other data in a data step.
<blockquote>There are other variables that will need to be in the dataset</blockquote>
Yep. PROCs work here too. See my paragraph above. There's nothing you have said that forces you to use DATA step only.
Ok, good to know. Do you have an idea on how to do it?
Thanks for your help,
E
You could certainly do it in a data step if it was sorted first ... but that requires PROC SORT ... I guess you can't do this without a PROC.
I have no problem using a PROC SORT first - I only meant that the final result would be unusable for what I later need as a PROC MEANS or PROC FREQ. Hope this clarifies.
E
Method one:
proc sql;
create table want as
select distinct(ID),sum(A) as A, sum(B) as B from have group by ID;
quit;
Method two:
proc summary data=have;
class id;
var A B;
ways 1;
output out=want sum(A B)=A B;
run;
Method three:
data want;
set have;
by id;
retain sumA sumB;
if first.id then call missing(sumA,sumB);
sumA+A; sumB+B;
if last.id then output;
run;
You can use RETAIN statement and FIRST. and LAST. , Something like this....
DATA WANT;
INFILE DATALINES;
INPUT ID A B;
DATALINES;
1000 0 1
1000 0 1
1000 1 0
1111 1 0
1111 1 0
1111 1 0
1111 0 1
1111 0 1
2222 0 1
2222 0 1
2222 1 0
2222 1 0
2222 0 1
;
RUN;
PROC SORT DATA=WANT;
BY ID;
RUN;
DATA WANT1(DROP=A B);
SET WANT;
RETAIN Count_A Count_B;
BY ID ;
IF FIRST.ID THEN DO;count_A=A;Count_B=B;END;
ELSE DO;Count_A=Count_A+A;Count_B=Count_B+B;END;
IF LAST.ID THEN OUTPUT;
RUN;
Just in case this opens new avenues of thinking for you, here is one way to use a PROC to create a SAS data set:
proc summary data=have nway;
class ID;
var A B;
output out=want (drop=_type_ _freq_) sum=;
run;
The WANT data set should match what you are seeking here, with one issue to consider. You asked for a count of the non-zero values. This is actually something a little bit different ... the sum of all the values.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.