Hi,
I have a huge aggregate dataset that shows the number of people attending an event from different companies by year, age-group and gender. If no one from a particular age-group and gender attended then that row is missing. I want all the nested categories for each year with number attending = 0 for the ones missing here. Currently my data looks like:
I want it to look like below. For each company and year I want to have the complete set of age and gender combinations attended taking a value of 0 for those missing in the data above.
I will highly appreciate if I can be helped with this. Thank you.
SQL cross joins provide you with the cartesian product (all combinations) of categories:
proc sql;
create table want AS
select
a.company,
b.year,
c.age,
d.gender,
coalesce(e.attended, 0) as attended
from
(select distinct company from have) as a cross join
(select distinct year from have) as b cross join
(select distinct age from have) as c cross join
(select distinct gender from have) as d left join
have as e
on a.company=e.company and b.year=e.year and
c.age=e.age and d.gender=e.gender;
quit;
(untested)
This is relatively easy when you create a report or table. So, does it really have to be that this is done in a data set? Or can it be done when the report/table is created?
Can it be done in the data because I have to merge this data with another that has total number of employees in all companies for each year by complete age_group (shown here) and gender i.e.
and I have to merge by year age_group and gender to the attendance data by each company. There wouldn't be a problem if I have this total employees by company as well. Thanks
It can be done when the merge happens. Something like this:
data want;
merge dataset1 dataset2;
by year age gender;
if missing(attended) then attended=0;
run;
Always a good thing to mention all the facts of the situation in your first message, rather than not mentioning relevant facts until someone asks.
I am sorry merging them like you said does not give me the correct results. Please see below, I do not see company A 2013 19-29 female, company A 2013 40-49 male, company B 2012 19-29 female and company B 2012 40-49 female
Is there a way to have complete categories in attendance data before merging? Thank you.
Something like this should populate the missing company values:
data want;
merge dataset1 dataset2;
by year age gender;
if missing(attended) then attended=0;
run;
proc sort data = want;
by descending company year age gender;
run;
data want2;
set want;
by descending company year age gender;
retain Current_Company;
if first.company then Current_Company = company;
else if missing(company) then company = Current_Company;
run;
I am sorry about not providing the data earlier. Please see below;
This is the data that I have (its only the data from two companies and the original has a lot of companies):
data have;
infile datalines;
input company $ year age $ gender $ attended;
cards;
A 2012 19-29 Male 12
A 2012 19-29 Female 10
A 2012 30-39 Male 7
A 2012 30-39 Female 5
A 2012 40-49 Male 14
A 2012 40-49 Female 32
A 2012 50+ Male 11
A 2012 50+ Female 8
A 2013 19-29 Male 3
A 2013 30-39 Male 7
A 2013 30-39 Female 2
A 2013 40-49 Female 6
A 2013 50+ Male 3
A 2013 50+ Female 4
B 2012 19-29 Male 2
B 2012 30-39 Male 5
B 2012 30-39 Female 7
B 2012 40-49 Male 5
B 2013 30-39 Male 3
B 2013 30-39 Female 5
B 2013 40-49 Male 8
B 2013 40-49 Female 2
B 2013 50+ Male 6
B 2013 50+ Female 5
;
run;
And before doing anything else with it, I want to have all the year age and gender combinations for each company. This would give me a zero value for the combinations missing above. It should look like below:
A 2012 19-29 Male 12
A 2012 19-29 Female 10
A 2012 30-39 Male 7
A 2012 30-39 Female 5
A 2012 40-49 Male 14
A 2012 40-49 Female 32
A 2012 50+ Male 11
A 2012 50+ Female 8
A 2013 19-29 Male 3
A 2013 19-29 Female 0
A 2013 30-39 Male 7
A 2013 30-39 Female 2
A 2013 40-49 Male 0
A 2013 40-49 Female 6
A 2013 50+ Male 3
A 2013 50+ Female 4
B 2012 19-29 Male 2
A 2012 19-29 Female 0
B 2012 30-39 Male 5
B 2012 30-39 Female 7
B 2012 40-49 Male 5
B 2012 40-49 Female 0
B 2012 50+ Male 0
B 2012 50+ Female 0
B 2013 19-29 Male 0
B 2013 19-20 Male 0
B 2013 30-39 Male 3
B 2013 30-39 Female 5
B 2013 40-49 Male 8
B 2013 40-49 Female 2
B 2013 50+ Male 6
B 2013 50+ Female 5
I want it to look like above so that I can easily merge it with the following data that has complete combinations of year age and gender
data tomerge;
infile datalines;
input year age $ gender $ employees;
cards;
2012 19-29 Male 215
2012 19-29 Female 263
2012 30-39 Male 271
2012 30-39 Female 273
2012 40-49 Male 245
2012 40-49 Female 165
2012 50+ Male 237
2012 50+ Female 186
2013 19-29 Male 192
2013 19-29 Female 154
2013 30-39 Male 176
2013 30-39 Female 231
2013 40-49 Male 188
2013 40-49 Female 176
2013 50+ Male 134
2013 50+ Female 201
;
I hope it is more clear now.
Thank you.
SQL cross joins provide you with the cartesian product (all combinations) of categories:
proc sql;
create table want AS
select
a.company,
b.year,
c.age,
d.gender,
coalesce(e.attended, 0) as attended
from
(select distinct company from have) as a cross join
(select distinct year from have) as b cross join
(select distinct age from have) as c cross join
(select distinct gender from have) as d left join
have as e
on a.company=e.company and b.year=e.year and
c.age=e.age and d.gender=e.gender;
quit;
(untested)
@PGStats Thanks a lot. This is exactly what I was looking for. Could not have accomplished it without your help as I am pretty new to sas with zero experience of proc sql. Thanks again.
proc freq data=have noprint;
table company*year*gender / out=want sparse;
weight attended;
run;
Try adding age to the table statement in @Ksharp suggestion. It should work.
table company*year*age*gender / out=want sparse;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.