Hi,
I have a huge aggregate dataset that shows the number of people attending an event from different companies by year, age-group and gender. If no one from a particular age-group and gender attended then that row is missing. I want all the nested categories for each year with number attending = 0 for the ones missing here. Currently my data looks like:
I want it to look like below. For each company and year I want to have the complete set of age and gender combinations attended taking a value of 0 for those missing in the data above.
I will highly appreciate if I can be helped with this. Thank you.
SQL cross joins provide you with the cartesian product (all combinations) of categories:
proc sql;
create table want AS
select 
    a.company,
    b.year,
    c.age,
    d.gender,
    coalesce(e.attended, 0) as attended
from 
    (select distinct company from have) as a cross join
    (select distinct year from have) as b cross join
    (select distinct age from have) as c cross join
    (select distinct gender from have) as d left join
    have as e 
        on a.company=e.company and b.year=e.year and 
            c.age=e.age and d.gender=e.gender;
quit;(untested)
This is relatively easy when you create a report or table. So, does it really have to be that this is done in a data set? Or can it be done when the report/table is created?
Can it be done in the data because I have to merge this data with another that has total number of employees in all companies for each year by complete age_group (shown here) and gender i.e.
and I have to merge by year age_group and gender to the attendance data by each company. There wouldn't be a problem if I have this total employees by company as well. Thanks
It can be done when the merge happens. Something like this:
data want;
     merge dataset1 dataset2;
     by year age gender;
     if missing(attended) then attended=0;
run;Always a good thing to mention all the facts of the situation in your first message, rather than not mentioning relevant facts until someone asks.
I am sorry merging them like you said does not give me the correct results. Please see below, I do not see company A 2013 19-29 female, company A 2013 40-49 male, company B 2012 19-29 female and company B 2012 40-49 female
Is there a way to have complete categories in attendance data before merging? Thank you.
Something like this should populate the missing company values:
data want;
     merge dataset1 dataset2;
     by year age gender;
     if missing(attended) then attended=0;
run;
proc sort data = want;
  by descending company year age gender;
run;
data want2;
  set want;
  by descending company year age gender;
  retain Current_Company;
  if first.company then Current_Company = company;
  else if missing(company) then company = Current_Company;
run;I am sorry about not providing the data earlier. Please see below;
This is the data that I have (its only the data from two companies and the original has a lot of companies):
data have;
infile datalines;
input company $ year age $ gender $ attended;
cards;
A 2012 19-29 Male 12
A 2012 19-29 Female 10
A 2012 30-39 Male 7
A 2012 30-39 Female 5
A 2012 40-49 Male 14
A 2012 40-49 Female 32
A 2012 50+ Male 11
A 2012 50+ Female 8
A 2013 19-29 Male 3
A 2013 30-39 Male 7
A 2013 30-39 Female 2
A 2013 40-49 Female 6
A 2013 50+ Male 3
A 2013 50+ Female 4
B 2012 19-29 Male 2
B 2012 30-39 Male 5
B 2012 30-39 Female 7
B 2012 40-49 Male 5
B 2013 30-39 Male 3
B 2013 30-39 Female 5
B 2013 40-49 Male 8
B 2013 40-49 Female 2
B 2013 50+ Male 6
B 2013 50+ Female 5
;
run;
And before doing anything else with it, I want to have all the year age and gender combinations for each company. This would give me a zero value for the combinations missing above. It should look like below:
A 2012 19-29 Male 12
A 2012 19-29 Female 10
A 2012 30-39 Male 7
A 2012 30-39 Female 5
A 2012 40-49 Male 14
A 2012 40-49 Female 32
A 2012 50+ Male 11
A 2012 50+ Female 8
A 2013 19-29 Male 3
A 2013 19-29 Female 0
A 2013 30-39 Male 7
A 2013 30-39 Female 2
A 2013 40-49 Male 0
A 2013 40-49 Female 6
A 2013 50+ Male 3
A 2013 50+ Female 4
B 2012 19-29 Male 2
A 2012 19-29 Female 0
B 2012 30-39 Male 5
B 2012 30-39 Female 7
B 2012 40-49 Male 5
B 2012 40-49 Female 0
B 2012 50+ Male 0
B 2012 50+ Female 0
B 2013 19-29 Male 0
B 2013 19-20 Male 0
B 2013 30-39 Male 3
B 2013 30-39 Female 5
B 2013 40-49 Male 8
B 2013 40-49 Female 2
B 2013 50+ Male 6
B 2013 50+ Female 5
I want it to look like above so that I can easily merge it with the following data that has complete combinations of year age and gender
data tomerge;
infile datalines;
input year age $ gender $ employees;
cards;
2012 19-29 Male 215
2012 19-29 Female 263
2012 30-39 Male 271
2012 30-39 Female 273
2012 40-49 Male 245
2012 40-49 Female 165
2012 50+ Male 237
2012 50+ Female 186
2013 19-29 Male 192
2013 19-29 Female 154
2013 30-39 Male 176
2013 30-39 Female 231
2013 40-49 Male 188
2013 40-49 Female 176
2013 50+ Male 134
2013 50+ Female 201
;
I hope it is more clear now.
Thank you.
SQL cross joins provide you with the cartesian product (all combinations) of categories:
proc sql;
create table want AS
select 
    a.company,
    b.year,
    c.age,
    d.gender,
    coalesce(e.attended, 0) as attended
from 
    (select distinct company from have) as a cross join
    (select distinct year from have) as b cross join
    (select distinct age from have) as c cross join
    (select distinct gender from have) as d left join
    have as e 
        on a.company=e.company and b.year=e.year and 
            c.age=e.age and d.gender=e.gender;
quit;(untested)
@PGStats Thanks a lot. This is exactly what I was looking for. Could not have accomplished it without your help as I am pretty new to sas with zero experience of proc sql. Thanks again.
proc freq data=have noprint;
table company*year*gender / out=want sparse;
weight attended;
run;Try adding age to the table statement in @Ksharp suggestion. It should work.
table company*year*age*gender / out=want sparse;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
