ejarquejm
Fluorite | Level 6
Counting distinct observations in one column and summing only the associated elements in another column

Here is my data.

The first three columns are the relevant part. They hold the individual ID, the firm ID, and the age of each individual. Because of the other variables, the values in these first three columns are repeated across rows.

 

Data:

 

Indid    firmid   age    (other variables)
1        1        29
1        1        29
2        1        34
2        1        34
3        2        51
3        2        51
3        2        51
4        2        25
5        3        62

 

I am using SQL and I would like to get:

 

Firmid   nworkers   sumofages
1        2          63
2        2          76
3        1          62

 

Where of course 29+34 = 63 and 51 + 25 = 76.

 

Right now I can use count(distinct indid) to count the number of workers, but I have not found the syntax to avoid summing the duplicated ages.

 

Thank you.

 

JE

Accepted Solutions
novinosrin
Tourmaline | Level 20

The SQL solution is the straightforward choice, but there is some fun in a DATA step approach to this problem:

 


data have;
input Indid    firmid   age;
cards;
1            1          29
1            1          29
2            1          34
2            1          34
3            2          51
3            2          51
3            2          51
4            2          25
5            3          62
;


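data want;
 if 0 then set have;                         /* never executes; at compile time it places the variables of have first */
 do nworkers=1 by 1 until(last.firmid);      /* one pass per individual, so nworkers counts individuals in the firm */
  do until(last.indid);                      /* read every row of the current individual */
   set have;
   by firmid indid;                          /* requires have to be sorted by firmid and indid */
   sumofages=sum(first.indid*age,sumofages); /* first.indid is 1 only on an individual's first row, so each age is added once */
  end;
 end;
 keep  Firmid   nworkers    sumofages;
run;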
 


10 REPLIES 10
ed_sas_member
Meteorite | Level 14

Hi @ejarquejm 

You can try this:

proc sql;
	select firmid,
		   count(distinct Indid) as nworkers,
		   sum(distinct age) as sumofages
	from have
	group by firmid;
quit;

Best,

 

ejarquejm
Fluorite | Level 6

Hi @ed_sas_member 

 

That may not work because it is not age that has to be distinct.

 

It is the ages of the distinct individuals I need.

 

I should have written an example where 2 individuals have the same age. Apologies for that.

 

Consider then

 

Data:

 

Indid    firmid   age    (other variables)
1        1        29
1        1        29
2        1        29
2        1        29
3        2        51
3        2        51
3        2        51
4        2        25
5        3        62

 

I am using SQL and I would like to get:

 

Firmid   nworkers   sumofages
1        2          58
2        2          76
3        1          62

 

J

 

Thank you.
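
To make the objection concrete, here is a minimal sketch that runs the suggested query on the revised data. The dataset name have_rev and the column names sum_distinct_age and sum_all_rows are hypothetical, chosen only for this illustration.

```sas
/* have_rev is a hypothetical name for the revised data shown above */
data have_rev;
 input Indid firmid age;
 cards;
1 1 29
1 1 29
2 1 29
2 1 29
3 2 51
3 2 51
3 2 51
4 2 25
5 3 62
;

proc sql;
 select firmid,
        count(distinct Indid) as nworkers,
        sum(distinct age)     as sum_distinct_age, /* collapses equal ages across individuals */
        sum(age)              as sum_all_rows      /* counts every repeated row */
 from have_rev
 group by firmid;
quit;
```

For firm 1, sum(distinct age) treats the two individuals aged 29 as a single value and returns 29, while the plain sum(age) adds all four rows and returns 116. Neither gives the desired 58, which is why the ages have to be deduplicated per individual rather than per value.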

srujana_hm
Fluorite | Level 6

data new ;
input inst firmid age;
cards;
1 1 29
1 1 29
2 1 29
2 1 29
3 2 16
3 2 32
4 2 21
4 2 21
;
run;

proc sql;
 /* Step 1: collapse repeated rows to one row per firmid/inst/age combination */
 create table new1 as
 select distinct firmid, inst, age
 from new;

 /* Step 2: aggregate the deduplicated rows by firm */
 create table new2 as
 select firmid, count(inst) as cnt, sum(age) as tot
 from new1
 group by firmid;
quit;

 

Output is:

firmid   cnt   tot
1        2     58
2        3     69

Hope this helps.

ejarquejm
Fluorite | Level 6

Hi @novinosrin 

 

This is interesting. Let me try it and get back to you.

 

thank you.

J

ejarquejm
Fluorite | Level 6

Hi @novinosrin 

 

This works.

I can then merge it with my SQL output to get what I want.

 

Thank you.

 

J

novinosrin
Tourmaline | Level 20

Yes I personally love the nested construct and boolean expression. I learned this technique from Guru @hashman  in one of his threads by further requesting him to detail everything I need to know. He was gracious. Oh well, we all learn and share. Ah, at least the sane ones. 🙂
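
For readers new to the idiom, the boolean expression can be unpacked into an explicit IF block. This is only a sketch of an equivalent single-loop form, not the technique as originally taught; have_sorted and want_explicit are hypothetical names, and the PROC SORT is there because BY-group processing assumes sorted input.

```sas
/* have_sorted is a hypothetical sorted copy of have */
proc sort data=have out=have_sorted;
 by firmid indid;
run;

data want_explicit;
 do until(last.firmid);              /* one DATA step iteration per firm */
  set have_sorted;
  by firmid indid;
  if first.indid then do;            /* act only on each individual's first row */
   nworkers  = sum(nworkers, 1);     /* sum() treats the initial missing value as 0 */
   sumofages = sum(sumofages, age);
  end;
 end;
 keep firmid nworkers sumofages;     /* implicit output writes one row per firm */
run;
```

Because nworkers and sumofages are created by assignment, they are reset to missing at the start of each DATA step iteration, so the totals restart for every firm without an explicit reset.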

novinosrin
Tourmaline | Level 20

@ejarquejm   And if SQL is your comfort zone, here is a SQL solution

 

data have;
input Indid    firmid   age;
cards;
1            1          29
1            1          29
2            1          29
2            1          29
3            2          51
3            2          51
3            2          51
4            2          25
5            3          62
;

proc sql;
create table want as
select Firmid , count(distinct Indid) as nworkers, sum(age) as sumofages
from (select distinct firmid,indid,age from have) 
group by Firmid;
quit;
Ksharp
Super User

Pick out the observations you want and then calculate.

 

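/* Assumes have is sorted by firmid and Indid */
data temp;
 set have;
 by firmid Indid;
 if first.Indid;   /* keep one row per individual within each firm */
run;

proc summary data=temp;
 by firmid;
 var age;
 output out=want(drop=_type_ _freq_) n=nworkers sum=sumofages;
run;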
