Solved: Re: Summarize data based on different group combination

Sunboyss30 · Posted 03-26-2020 07:53 AM

Hello community,

I want to summarize the data based on id's present in multiple groups and see whats there count in combination of group. For example; Below are the counts in each group and based on the ids i want to check the counts in other groups as well. So if one id is present in one group, it can be in other group as well. I am trying achieving with case statements but is there any procedure and or data step i can use to get the results. I hope i made myself clear.

Group	id Counts
abc	10603
def	84780
ghi	249401
jkl	184111
mno	77921
pqr	843570

Kurt_Bremser · Posted 03-27-2020 11:58 AM

Play around with where conditions. You may need to add an intermediate step that gets you only those ID's present in the wanted groups, and then do an inner join with those.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

PaigeMiller · Posted 03-26-2020 08:12 AM

@Sunboyss30 wrote:

Hello community,

I want to summarize the data based on id's present in multiple groups and see whats there count in combination of group. For example; Below are the counts in each group and based on the ids i want to check the counts in other groups as well. So if one id is present in one group, it can be in other group as well. I am trying achieving with case statements but is there any procedure and or data step i can use to get the results. I hope i made myself clear.

Group id Counts

abc 10603

def 84780

ghi 249401

jkl 184111

mno 77921

pqr 843570

Your data does not seem to match your description. In particular, there seems no matches among the groups. "So if one id is present in one group, it can be in other group as well."

Also, what do you mean by "summarize"?

Could you please provide a data set is consistent with the description of the analysis you want to do, and then show us the output from that analysis?

--
Paige Miller

Sunboyss30 · Posted 03-26-2020 08:58 AM

Hi Paige,

Thanks for your reply. The data set is big. Please find below an example of the same.
There are 9 such groups in the actual data and in the example below i have taken only 5.
So, if you see id 1234 is present in three different groups.
I want to count in how many other group that id is other then ABC with that group name and also sum the
field sum_Var?

Group Account sum_var id
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678

Tom · Posted 03-26-2020 09:09 AM

It is not clear what you want. Can you show what the result would be for those 6 observations?

Sunboyss30 · Posted 03-27-2020 04:26 AM

Hi Tom,
Thanks for your reply. Really appreciate.
I need to count the number of ids and sum the values based on a combination of group. So scenario 1, let say group value ABC with only 1 other group value (any other than ABC) and count the ids and sum the variable. such cases.
scenario 2 - Then group value ABC with more than 1 other group value (any other than ABC) and count the ids and sum the variable.
So the output should be like ;
scenario 1
Group id count of group count of account sum_var
ABC 5678 2 2 107985447.5
JKL 5678 2 2 107985447.5

scenario 2
Group id count of group count of account sum_var
ABC 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
GHI 1234 3 4 171119250.5

Thanks in advance

Kurt_Bremser · Posted 03-26-2020 10:24 AM

Do you mean this:

data have;
input Group $ Account :$15. sum_var id $;
datalines;
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678
;

proc sql;
create table sums as
select
  id,
  sum(sum_var) as total,
  count(distinct group) as count
from have
group by id;
create table want as
select
  have.*,
  sums.total,
  sums.count
from have left join sums
on have.id = sums.id
order by id, group;
quit;

?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Sunboyss30 · Posted 03-26-2020 03:29 PM

Hi Kurt,
Thanks for your reply. Really appreciate.

Actually you are correct. May be I was not clear in earlier.
I need to count the number of ids and sum the values based on a combination of group. So scenario 1, let say group value ABC with only 1 other group value (any other than ABC) and count the ids and sum the variable. such cases.
scenario 2 - Then group value ABC with more than 1 other group value (any other than ABC) and count the ids and sum the variable.
So the output should be like ;
scenario 1
Group id count of group count of account sum_var
ABC 5678 2 2 107985447.5
JKL 5678 2 2 107985447.5

scenario 2
Group id count of group count of account sum_var
ABC 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
GHI 1234 3 4 171119250.5

Thanks in advance

Sunboyss30 · Posted 03-26-2020 03:41 PM

Also to add, ids that have been summarized in scenario 1 should not be counted in scenario 2.

Kurt_Bremser · Posted 03-27-2020 06:11 AM

Since you don't need the account, that makes the SQL easier, and you only need a data step to split the result:

data have;
input Group $ Account :$15. sum_var id $;
datalines;
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678
;

proc sql;
create table want as
select
  group,
  id,
  count(distinct group) as count_of_group,
  count(*) as count_of_account,
  sum(sum_var) as total
from have
group by id
order by id, group;
quit;

data
  want_2
  want_more_than_2
;
set want;
if count_of_group le 2
then output want_2;
else output want_more_than_2;
run;

Mind that you won't need the split anyway, as you can always use a where condition in further processing.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Sunboyss30 · Posted 03-27-2020 08:22 AM

Thanks a lot Kurt. its perfect but i also want to filter only group le 2 (having ABC + any other group value) AND group gt 2 (having ABC + any other group value) per ID. Right now group le 2 also has combination of DEF & GHI.

Really appreciate.

Kurt_Bremser · Posted 03-27-2020 11:58 AM

Play around with where conditions. You may need to add an intermediate step that gets you only those ID's present in the wanted groups, and then do an inner join with those.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Tom · Posted 03-27-2020 12:06 PM

So explicitly test for ABC.

create table want as
select id
     , group
     , max( group='ABC' ) as has_ABC
     , count(distinct group) as count_of_group
     , count(*) as count_of_account
     , sum(sum_var) as total
from have
group by id
order by id, group
;
quit;

Now it looks like you want these two groups:

where has_abc and count_of_group=2 ;
where has_abc and count_of_group>2;

Sunboyss30 · Posted 03-27-2020 04:55 AM

Hi Paige,
Thanks for your reply. Really appreciate.
I need to count the number of ids and sum the values based on a combination of group. So scenario 1, let say group value ABC with only 1 other group value (any other than ABC) and count the ids and sum the variable. such cases.
scenario 2 - Then group value ABC with more than 1 other group value (any other than ABC) and count the ids and sum the variable.
So the output should be like ;
scenario 1
Group id count of group count of account sum_var
ABC 5678 2 2 107985447.5
JKL 5678 2 2 107985447.5

scenario 2
Group id count of group count of account sum_var
ABC 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
GHI 1234 3 4 171119250.5
Also, ids that have been summarized in scenario 1 should not be counted in scenario 2.
Thanks in advance

ballardw · Posted 03-26-2020 12:44 PM

One question to answer is to you need the counts to be shown in the multiple groups in a single procedure step or done one grouping rule at a time.

One method might be the use multilabel formats. Only a few procedures can use them for the different levels such as Proc Means/ Summary, Tabulate and Report (not proc sql). And the order of the creation in the format and options in the procedure. It is a moderately complex issue but I have attached some code with example data you could play with to see if the results for the demonstrated date represent some of what you want/ need.

Catch up on SAS Innovate 2026