Obsidian | Level 7

## Summarize data based on different group combination

Hello community,

I want to summarize the data based on id's present in multiple groups and see whats there count in combination of group. For example; Below are the counts in each group and based on the ids i want to check the counts in other groups as well. So if one id is present in one group, it can be in other group as well. I am trying achieving with case statements but is there any procedure and or data step i can use to get the results. I hope i made myself clear.

 Group id Counts abc 10603 def 84780 ghi 249401 jkl 184111 mno 77921 pqr 843570
1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Summarize data based on different group combination

Play around with where conditions. You may need to add an intermediate step that gets you only those ID's present in the wanted groups, and then do an inner join with those.

13 REPLIES 13
Diamond | Level 26

## Re: Summarize data based on different group combination

@Sunboyss30 wrote:

Hello community,

I want to summarize the data based on id's present in multiple groups and see whats there count in combination of group. For example; Below are the counts in each group and based on the ids i want to check the counts in other groups as well. So if one id is present in one group, it can be in other group as well. I am trying achieving with case statements but is there any procedure and or data step i can use to get the results. I hope i made myself clear.

 Group id Counts abc 10603 def 84780 ghi 249401 jkl 184111 mno 77921 pqr 843570

Your data does not seem to match your description. In particular, there seems no matches among the groups. "So if one id is present in one group, it can be in other group as well."

Also, what do you mean by "summarize"?

Could you please provide a data set is consistent with the description of the analysis you want to do, and then show us the output from that analysis?

--
Paige Miller
Obsidian | Level 7

## Re: Summarize data based on different group combination

Hi Paige,

Thanks for your reply. The data set is big. Please find below an example of the same.
There are 9 such groups in the actual data and in the example below i have taken only 5.
So, if you see id 1234 is present in three different groups.
I want to count in how many other group that id is other then ABC with that group name and also sum the
field sum_Var?

Group Account sum_var id
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678
Super User

## Re: Summarize data based on different group combination

It is not clear what you want. Can you show what the result would be for those 6 observations?
Obsidian | Level 7

## Re: Summarize data based on different group combination

Hi Tom,
I need to count the number of ids and sum the values based on a combination of group. So scenario 1, let say group value ABC with only 1 other group value (any other than ABC) and count the ids and sum the variable. such cases.
scenario 2 - Then group value ABC with more than 1 other group value (any other than ABC) and count the ids and sum the variable.
So the output should be like ;
scenario 1
Group id count of group count of account sum_var
ABC 5678 2 2 107985447.5
JKL 5678 2 2 107985447.5

scenario 2
Group id count of group count of account sum_var
ABC 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
GHI 1234 3 4 171119250.5

Super User

## Re: Summarize data based on different group combination

Do you mean this:

``````data have;
input Group \$ Account :\$15. sum_var id \$;
datalines;
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678
;

proc sql;
create table sums as
select
id,
sum(sum_var) as total,
count(distinct group) as count
from have
group by id;
create table want as
select
have.*,
sums.total,
sums.count
from have left join sums
on have.id = sums.id
order by id, group;
quit;``````

?

Obsidian | Level 7

## Re: Summarize data based on different group combination

Hi Kurt,

Actually you are correct. May be I was not clear in earlier.
I need to count the number of ids and sum the values based on a combination of group. So scenario 1, let say group value ABC with only 1 other group value (any other than ABC) and count the ids and sum the variable. such cases.
scenario 2 - Then group value ABC with more than 1 other group value (any other than ABC) and count the ids and sum the variable.
So the output should be like ;
scenario 1
Group id count of group count of account sum_var
ABC 5678 2 2 107985447.5
JKL 5678 2 2 107985447.5

scenario 2
Group id count of group count of account sum_var
ABC 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
GHI 1234 3 4 171119250.5

Obsidian | Level 7

## Re: Summarize data based on different group combination

Also to add, ids that have been summarized in scenario 1 should not be counted in scenario 2.
Super User

## Re: Summarize data based on different group combination

Since you don't need the account, that makes the SQL easier, and you only need a data step to split the result:

``````data have;
input Group \$ Account :\$15. sum_var id \$;
datalines;
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678
;

proc sql;
create table want as
select
group,
id,
count(distinct group) as count_of_group,
count(*) as count_of_account,
sum(sum_var) as total
from have
group by id
order by id, group;
quit;

data
want_2
want_more_than_2
;
set want;
if count_of_group le 2
then output want_2;
else output want_more_than_2;
run;``````

Mind that you won't need the split anyway, as you can always use a where condition in further processing.

Obsidian | Level 7

## Re: Summarize data based on different group combination

Thanks a lot Kurt. its perfect but i also want to filter only group le 2 (having ABC + any other group value) AND group gt 2 (having ABC + any other group value) per ID. Right now group le 2 also has combination of DEF & GHI.

Really appreciate.
Super User

## Re: Summarize data based on different group combination

Play around with where conditions. You may need to add an intermediate step that gets you only those ID's present in the wanted groups, and then do an inner join with those.

Super User

## Re: Summarize data based on different group combination

So explicitly test for ABC.

``````create table want as
select id
, group
, max( group='ABC' ) as has_ABC
, count(distinct group) as count_of_group
, count(*) as count_of_account
, sum(sum_var) as total
from have
group by id
order by id, group
;
quit;``````

Now it looks like you want these two groups:

``````where has_abc and count_of_group=2 ;
where has_abc and count_of_group>2;``````
Obsidian | Level 7

## Re: Summarize data based on different group combination

Hi Paige,
I need to count the number of ids and sum the values based on a combination of group. So scenario 1, let say group value ABC with only 1 other group value (any other than ABC) and count the ids and sum the variable. such cases.
scenario 2 - Then group value ABC with more than 1 other group value (any other than ABC) and count the ids and sum the variable.
So the output should be like ;
scenario 1
Group id count of group count of account sum_var
ABC 5678 2 2 107985447.5
JKL 5678 2 2 107985447.5

scenario 2
Group id count of group count of account sum_var
ABC 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
GHI 1234 3 4 171119250.5
Also, ids that have been summarized in scenario 1 should not be counted in scenario 2.
Super User

## Re: Summarize data based on different group combination

One question to answer is to you need the counts to be shown in the multiple groups in a single procedure step or done one grouping rule at a time.

One method might be the use multilabel formats. Only a few procedures can use them for the different levels such as Proc Means/ Summary, Tabulate and Report (not proc sql). And the order of the creation in the format and options in the procedure. It is a moderately complex issue but I have attached some code with example data you could play with to see if the results for the demonstrated date represent some of what you want/ need.

Discussion stats
• 13 replies
• 1373 views
• 4 likes
• 5 in conversation