Hello community,
I want to summarize the data based on id's present in multiple groups and see whats there count in combination of group. For example; Below are the counts in each group and based on the ids i want to check the counts in other groups as well. So if one id is present in one group, it can be in other group as well. I am trying achieving with case statements but is there any procedure and or data step i can use to get the results. I hope i made myself clear.
Group | id Counts |
abc | 10603 |
def | 84780 |
ghi | 249401 |
jkl | 184111 |
mno | 77921 |
pqr | 843570 |
Play around with where conditions. You may need to add an intermediate step that gets you only those ID's present in the wanted groups, and then do an inner join with those.
@Sunboyss30 wrote:
Hello community,
I want to summarize the data based on id's present in multiple groups and see whats there count in combination of group. For example; Below are the counts in each group and based on the ids i want to check the counts in other groups as well. So if one id is present in one group, it can be in other group as well. I am trying achieving with case statements but is there any procedure and or data step i can use to get the results. I hope i made myself clear.
Group id Counts abc 10603 def 84780 ghi 249401 jkl 184111 mno 77921 pqr 843570
Your data does not seem to match your description. In particular, there seems no matches among the groups. "So if one id is present in one group, it can be in other group as well."
Also, what do you mean by "summarize"?
Could you please provide a data set is consistent with the description of the analysis you want to do, and then show us the output from that analysis?
Do you mean this:
data have;
input Group $ Account :$15. sum_var id $;
datalines;
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678
;
proc sql;
create table sums as
select
id,
sum(sum_var) as total,
count(distinct group) as count
from have
group by id;
create table want as
select
have.*,
sums.total,
sums.count
from have left join sums
on have.id = sums.id
order by id, group;
quit;
?
Since you don't need the account, that makes the SQL easier, and you only need a data step to split the result:
data have;
input Group $ Account :$15. sum_var id $;
datalines;
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678
;
proc sql;
create table want as
select
group,
id,
count(distinct group) as count_of_group,
count(*) as count_of_account,
sum(sum_var) as total
from have
group by id
order by id, group;
quit;
data
want_2
want_more_than_2
;
set want;
if count_of_group le 2
then output want_2;
else output want_more_than_2;
run;
Mind that you won't need the split anyway, as you can always use a where condition in further processing.
Play around with where conditions. You may need to add an intermediate step that gets you only those ID's present in the wanted groups, and then do an inner join with those.
So explicitly test for ABC.
create table want as
select id
, group
, max( group='ABC' ) as has_ABC
, count(distinct group) as count_of_group
, count(*) as count_of_account
, sum(sum_var) as total
from have
group by id
order by id, group
;
quit;
Now it looks like you want these two groups:
where has_abc and count_of_group=2 ;
where has_abc and count_of_group>2;
One question to answer is to you need the counts to be shown in the multiple groups in a single procedure step or done one grouping rule at a time.
One method might be the use multilabel formats. Only a few procedures can use them for the different levels such as Proc Means/ Summary, Tabulate and Report (not proc sql). And the order of the creation in the format and options in the procedure. It is a moderately complex issue but I have attached some code with example data you could play with to see if the results for the demonstrated date represent some of what you want/ need.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.