ID var1
1 x
1 y
1 z
1 x
2 z
2 x
3 h
3 h
3 a
Ideally, I want it to be like either
ID var1 var1_count count_of_x
1 x 4 2
1 y 4 2
1 z 4 2
1 x 4 2
2 z 2 1
2 x 2 1
3 h 3 1
3 x 3 1
3 a 3 1
or this form will also work
ID var1_count count_of_x
1 4 2
2 2 1
3 3 1
Try this:
/* Create test data */
data have;
input ID var1 $;
cards;
1 x
1 y
1 z
1 x
2 z
2 x
3 h
3 x
3 a
;
/* Long form */
data want;
do until(last.id);
set have;
by id;
var1_count=sum(var1_count, 1);
count_of_x=sum(count_of_x, var1='x');
end;
do until(last.id);
set have;
by id;
output;
end;
run;
proc print data=want noobs;
run;
/* Aggregated form */
data want2;
do until(last.id);
set have;
by id;
var1_count=sum(var1_count, 1);
count_of_x=sum(count_of_x, var1='x');
end;
drop var1;
run;
proc print data=want2 noobs;
run;
your count_of_x doesn't work. it's showing me
id var1 count of x
1 x 1
1 y 2
2 z 3
2 x 4
Hi @kksss,
If you're talking about my suggested solution, I cannot replicate your finding. My data step for the "long form" results in:
var1_ count_ ID var1 count of_x 1 x 2 1 1 y 2 1 2 z 2 1 2 x 2 1
as it should.
I don't know what i did wrong but it's simply just not working like you pasted
the var count is working fine, but the count of x is not.
it's not scaning through x, but indexing the rows by id.
I have attached a screenshot of my out put.
here is my code regarding to those output
data test3;
do until(last.student_id);
set test1;
by student_id;
major_count=sum(major_count, 1);
CSBS_count=sum(CSBS_count, Major_College="Social & Beh Science");
end;
do until(last.student_id);
set test1;
by student_id;
output;
end;
run;
the proc sql works fine however, so i guess I can live with that.
I've tried to replicate your issue using the below test data:
data test1;
input student_id major_college $50.;
cards;
1 Social & Beh Science
1 Nondegree Seeking
1 Health
2 Social & Beh Science
2 Social & Beh Science
;
The result of your data step is:
student_ major_ CSBS_ id major_college count count 1 Social & Beh Science 3 1 1 Nondegree Seeking 3 1 1 Health 3 1 2 Social & Beh Science 2 2 2 Social & Beh Science 2 2
which seems to be correct. Do you also obtain the above result with my test data?
If so, we should take a closer look at your dataset TEST1.
well... then throw me how it can be done with proc sql, and I'll see if I like it...
My reason is because, I have to pass down my code to colleagues who doesn't use sql much... so it's easier for them to understand what is going on.
@LinusH: I remember the "dark ages" when I would have preferred a non-SQL solution, too, as I would have felt less confident to adapt and maintain such "advanced" code.
For the aggregated output dataset, of course, that's fine (I would have shortened the definition of count_X to sum(var1='x'), though). However, to obtain the result in "long form" with PROC SQL alone, would require additional measures to ensure that the order of the observations within one ID is preserved (if this was important).
@LinusH: Fully agreed.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.