BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sunboyss30
Obsidian | Level 7

Hello community,

 

I want to summarize the data based on id's present in multiple groups and see whats there count in combination of group. For example; Below are the counts in each group and based on the ids i want to check the counts in other groups as well. So if one id is present in one group, it can be in other group as well. I am trying achieving with case statements but is there any procedure and or data step i can use to get the results. I hope i made myself clear.

 

Groupid Counts
abc10603
def84780
ghi249401
jkl184111
mno77921
pqr843570
1 ACCEPTED SOLUTION

Accepted Solutions
13 REPLIES 13
PaigeMiller
Diamond | Level 26

@Sunboyss30 wrote:

Hello community,

 

I want to summarize the data based on id's present in multiple groups and see whats there count in combination of group. For example; Below are the counts in each group and based on the ids i want to check the counts in other groups as well. So if one id is present in one group, it can be in other group as well. I am trying achieving with case statements but is there any procedure and or data step i can use to get the results. I hope i made myself clear.

 

Group id Counts
abc 10603
def 84780
ghi 249401
jkl 184111
mno 77921
pqr 843570

Your data does not seem to match your description. In particular, there seems no matches among the groups. "So if one id is present in one group, it can be in other group as well." 

 

Also, what do you mean by "summarize"?

 

Could you please provide a data set is consistent with the description of the analysis you want to do, and then show us the output from that analysis?

--
Paige Miller
Sunboyss30
Obsidian | Level 7
Hi Paige,

Thanks for your reply. The data set is big. Please find below an example of the same.
There are 9 such groups in the actual data and in the example below i have taken only 5.
So, if you see id 1234 is present in three different groups.
I want to count in how many other group that id is other then ABC with that group name and also sum the
field sum_Var?

Group Account sum_var id
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678
Tom
Super User Tom
Super User
It is not clear what you want. Can you show what the result would be for those 6 observations?
Sunboyss30
Obsidian | Level 7
Hi Tom,
Thanks for your reply. Really appreciate.
I need to count the number of ids and sum the values based on a combination of group. So scenario 1, let say group value ABC with only 1 other group value (any other than ABC) and count the ids and sum the variable. such cases.
scenario 2 - Then group value ABC with more than 1 other group value (any other than ABC) and count the ids and sum the variable.
So the output should be like ;
scenario 1
Group id count of group count of account sum_var
ABC 5678 2 2 107985447.5
JKL 5678 2 2 107985447.5

scenario 2
Group id count of group count of account sum_var
ABC 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
GHI 1234 3 4 171119250.5

Thanks in advance
Kurt_Bremser
Super User

Do you mean this:

data have;
input Group $ Account :$15. sum_var id $;
datalines;
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678
;

proc sql;
create table sums as
select
  id,
  sum(sum_var) as total,
  count(distinct group) as count
from have
group by id;
create table want as
select
  have.*,
  sums.total,
  sums.count
from have left join sums
on have.id = sums.id
order by id, group;
quit;

?

Sunboyss30
Obsidian | Level 7
Hi Kurt,
Thanks for your reply. Really appreciate.

Actually you are correct. May be I was not clear in earlier.
I need to count the number of ids and sum the values based on a combination of group. So scenario 1, let say group value ABC with only 1 other group value (any other than ABC) and count the ids and sum the variable. such cases.
scenario 2 - Then group value ABC with more than 1 other group value (any other than ABC) and count the ids and sum the variable.
So the output should be like ;
scenario 1
Group id count of group count of account sum_var
ABC 5678 2 2 107985447.5
JKL 5678 2 2 107985447.5

scenario 2
Group id count of group count of account sum_var
ABC 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
GHI 1234 3 4 171119250.5

Thanks in advance
Sunboyss30
Obsidian | Level 7
Also to add, ids that have been summarized in scenario 1 should not be counted in scenario 2.
Kurt_Bremser
Super User

Since you don't need the account, that makes the SQL easier, and you only need a data step to split the result:

data have;
input Group $ Account :$15. sum_var id $;
datalines;
ABC 10792402 53475319.06 1234
DEF 3928847912 4510128.39 1234
DEF 4682433039 89231128.39 1234
GHI 1737576722 23902674.65 1234
ABC 43169608 53475319.06 5678
JKL 15715391648 54510128.39 5678
;

proc sql;
create table want as
select
  group,
  id,
  count(distinct group) as count_of_group,
  count(*) as count_of_account,
  sum(sum_var) as total
from have
group by id
order by id, group;
quit;

data
  want_2
  want_more_than_2
;
set want;
if count_of_group le 2
then output want_2;
else output want_more_than_2;
run;

Mind that you won't need the split anyway, as you can always use a where condition in further processing.

Sunboyss30
Obsidian | Level 7
Thanks a lot Kurt. its perfect but i also want to filter only group le 2 (having ABC + any other group value) AND group gt 2 (having ABC + any other group value) per ID. Right now group le 2 also has combination of DEF & GHI.

Really appreciate.
Kurt_Bremser
Super User

Play around with where conditions. You may need to add an intermediate step that gets you only those ID's present in the wanted groups, and then do an inner join with those.

Tom
Super User Tom
Super User

So explicitly test for ABC.

create table want as
select id
     , group
     , max( group='ABC' ) as has_ABC
     , count(distinct group) as count_of_group
     , count(*) as count_of_account
     , sum(sum_var) as total
from have
group by id
order by id, group
;
quit;

Now it looks like you want these two groups: 

where has_abc and count_of_group=2 ;
where has_abc and count_of_group>2;
Sunboyss30
Obsidian | Level 7
Hi Paige,
Thanks for your reply. Really appreciate.
I need to count the number of ids and sum the values based on a combination of group. So scenario 1, let say group value ABC with only 1 other group value (any other than ABC) and count the ids and sum the variable. such cases.
scenario 2 - Then group value ABC with more than 1 other group value (any other than ABC) and count the ids and sum the variable.
So the output should be like ;
scenario 1
Group id count of group count of account sum_var
ABC 5678 2 2 107985447.5
JKL 5678 2 2 107985447.5

scenario 2
Group id count of group count of account sum_var
ABC 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
DEF 1234 3 4 171119250.5
GHI 1234 3 4 171119250.5
Also, ids that have been summarized in scenario 1 should not be counted in scenario 2.
Thanks in advance
ballardw
Super User

One question to answer is to you need the counts to be shown in the multiple groups in a single procedure step or done one grouping rule at a time.

 

One method might be the use multilabel formats. Only a few procedures can use them for the different levels such as Proc Means/ Summary, Tabulate and Report (not proc sql). And the order of the creation in the format and options in the procedure. It is a moderately complex issue but I have attached some code with example data you could play with to see if the results for the demonstrated date represent some of what you want/ need.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 1636 views
  • 4 likes
  • 5 in conversation