The first three columns are the relevant part. They hold the individual ID, the firm ID, and the age of each individual. Because of the other variables, these rows are repeated.
Data:
Indid firmid age , other variables
1 1 29
1 1 29
2 1 34
2 1 34
3 2 51
3 2 51
3 2 51
4 2 25
5 3 62
I am using SQL and I would like to get
Firmid nworkers sumofages
1 2 63
2 2 76
3 1 62
Where of course 29+34 = 63 and 51 + 25 = 76.
Right now I can use count(distinct indid) to count the number of workers, but I have not found the syntax to avoid counting each individual's age more than once.
Thank you.
JE
Take the straightforward SQL solution, but there is some fun in a DATA step approach to this problem:
data have;
input Indid firmid age;
cards;
1 1 29
1 1 29
2 1 34
2 1 34
3 2 51
3 2 51
3 2 51
4 2 25
5 3 62
;
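data want;
  /* never executes; only puts the variables of HAVE first in the PDV */
  if 0 then set have;
  /* outer loop: one pass per individual, so nworkers ends as the head count of the firm */
  do nworkers=1 by 1 until(last.firmid);
    /* inner loop: read every row of the current individual */
    do until(last.indid);
      set have;
      by firmid indid;
      /* first.indid is 1 only on the individual's first row, so each age is added once */
      sumofages=sum(first.indid*age,sumofages);
    end;
  end;
  keep Firmid nworkers sumofages;
run;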
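One caveat on the DATA step above: the BY statement expects HAVE to be ordered by firmid and indid. The sample rows already are, but real data may not be. A minimal sketch of sorting first, assuming the table is still called HAVE:

```sas
/* Sort so that BY firmid indid processing works; */
/* without this, an unsorted HAVE would stop the  */
/* DATA step with a BY-variable order error.      */
proc sort data=have;
  by firmid indid;
run;
```

If the original row order matters elsewhere, writing to a copy with the OUT= option of PROC SORT and reading that copy instead avoids changing HAVE.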
Hi @ejarquejm
You can try this:
proc sql;
select firmid,
count(distinct Indid) as nworkers,
sum(distinct age) as sumofages
from have
group by firmid;
quit;
Best,
That may not work because it is not age that has to be distinct.
It is the ages of the distinct individuals I need.
I should have written an example where 2 individuals have the same age. Apologies for that.
Consider then
Data:
Indid firmid age , other variables
1 1 29
1 1 29
2 1 29
2 1 29
3 2 51
3 2 51
3 2 51
4 2 25
5 3 62
I am using SQL and I would like to get
Firmid nworkers sumofages
1 2 58
2 2 76
3 1 62
J
Thank you.
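To make the problem concrete, here is a small sketch that runs sum(distinct age) on the revised data. The table name HAVE2 and the column name sum_distinct_age are only illustrative names, not anything from the real data set.

```sas
/* Revised example: individuals 1 and 2 share age 29 */
data have2;
  input Indid firmid age;
  cards;
1 1 29
1 1 29
2 1 29
2 1 29
3 2 51
3 2 51
3 2 51
4 2 25
5 3 62
;
run;

proc sql;
  /* sum(distinct age) removes duplicate AGE values, not duplicate individuals */
  select firmid,
         count(distinct Indid) as nworkers,
         sum(distinct age) as sum_distinct_age
  from have2
  group by firmid;
quit;
```

For firm 1 this reports 2 workers but an age sum of 29 rather than the wanted 58, because the two individuals' identical ages collapse into one value.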
data new ;
input inst firmid age;
cards;
1 1 29
1 1 29
2 1 29
2 1 29
3 2 16
3 2 32
4 2 21
4 2 21
;
run;
proc sql;
/* keep one row per distinct firmid / inst / age combination */
create table new1 as
select distinct firmid, inst as nwork, age as total_age from new;
quit;
proc sql;
/* aggregate the deduplicated rows, not the raw table */
create table new2 as
select firmid, count(nwork) as cnt, sum(total_age) as tot from new1
group by firmid;
quit;
output is
firmid cnt tot
1 2 58
2 3 69
Hope this helps.
Yes I personally love the nested construct and boolean expression. I learned this technique from Guru @hashman in one of his threads by further requesting him to detail everything I need to know. He was gracious. Oh well, we all learn and share. Ah, at least the sane ones. 🙂
@ejarquejm And if SQL is your comfort zone, here is a SQL solution
data have;
input Indid firmid age;
cards;
1 1 29
1 1 29
2 1 29
2 1 29
3 2 51
3 2 51
3 2 51
4 2 25
5 3 62
;
proc sql;
create table want as
select Firmid , count(distinct Indid) as nworkers, sum(age) as sumofages
from (select distinct firmid,indid,age from have)
group by Firmid;
quit;
Pick out the observations you want, then calculate on them.
data temp;
set have;
by firmid Indid;
if first.Indid;
run;
proc summary data=temp;
by firmid;
var age;
output out=want n=nworkers sum=sumofages;
run;