Solved: Re: Adding the missing categories for which the attendance was 0

Novice_ · Posted 03-10-2022 01:34 PM

Hi,

I have a huge aggregate dataset that shows the number of people attending an event from different companies by year, age-group and gender. If no one from a particular age-group and gender attended then that row is missing. I want all the nested categories for each year with number attending = 0 for the ones missing here. Currently my data looks like:

I want it to look like below. For each company and year I want to have the complete set of age and gender combinations attended taking a value of 0 for those missing in the data above.

I will highly appreciate if I can be helped with this. Thank you.

PGStats · Posted 03-10-2022 09:34 PM

SQL cross joins provide you with the cartesian product (all combinations) of categories:

proc sql;
create table want AS
select 
    a.company,
    b.year,
    c.age,
    d.gender,
    coalesce(e.attended, 0) as attended
from 
    (select distinct company from have) as a cross join
    (select distinct year from have) as b cross join
    (select distinct age from have) as c cross join
    (select distinct gender from have) as d left join
    have as e 
        on a.company=e.company and b.year=e.year and 
            c.age=e.age and d.gender=e.gender;
quit;

(untested)

PG

View solution in original post

PaigeMiller · Posted 03-10-2022 01:36 PM

This is relatively easy when you create a report or table. So, does it really have to be that this is done in a data set? Or can it be done when the report/table is created?

--
Paige Miller

Novice_ · Posted 03-10-2022 02:34 PM

Can it be done in the data because I have to merge this data with another that has total number of employees in all companies for each year by complete age_group (shown here) and gender i.e.

and I have to merge by year age_group and gender to the attendance data by each company. There wouldn't be a problem if I have this total employees by company as well. Thanks

PaigeMiller · Posted 03-10-2022 02:37 PM

It can be done when the merge happens. Something like this:

data want;
     merge dataset1 dataset2;
     by year age gender;
     if missing(attended) then attended=0;
run;

Always a good thing to mention all the facts of the situation in your first message, rather than not mentioning relevant facts until someone asks.

--
Paige Miller

Novice_ · Posted 03-10-2022 07:48 PM

I am sorry merging them like you said does not give me the correct results. Please see below, I do not see company A 2013 19-29 female, company A 2013 40-49 male, company B 2012 19-29 female and company B 2012 40-49 female

Is there a way to have complete categories in attendance data before merging? Thank you.

SASKiwi · Posted 03-10-2022 08:22 PM

Something like this should populate the missing company values:

data want;
     merge dataset1 dataset2;
     by year age gender;
     if missing(attended) then attended=0;
run;

proc sort data = want;
  by descending company year age gender;
run;

data want2;
  set want;
  by descending company year age gender;
  retain Current_Company;
  if first.company then Current_Company = company;
  else if missing(company) then company = Current_Company;
run;

Novice_ · Posted 03-10-2022 09:23 PM

@SASKiwi thanks a lot for the code but unfortunately it is not working. I am getting the following:

SASKiwi · Posted 03-10-2022 09:48 PM

@Novice_ - Since you didn't provide any data in a suitable form (you can't copy and paste screenshots) I'm unable to test a solution.

@PGStats method is better but you need to provide test data using a DATA step with DATALINES for a working solution to be provided.

Novice_ · Posted 03-10-2022 10:38 PM

I am sorry about not providing the data earlier. Please see below;

This is the data that I have (its only the data from two companies and the original has a lot of companies):

data have;
infile datalines;
input company $ year age $ gender $ attended;
cards;
A 2012 19-29 Male 12
A 2012 19-29 Female 10
A 2012 30-39 Male 7
A 2012 30-39 Female 5
A 2012 40-49 Male 14
A 2012 40-49 Female 32
A 2012 50+ Male 11
A 2012 50+ Female 8
A 2013 19-29 Male 3
A 2013 30-39 Male 7
A 2013 30-39 Female 2
A 2013 40-49 Female 6
A 2013 50+ Male 3
A 2013 50+ Female 4
B 2012 19-29 Male 2
B 2012 30-39 Male 5
B 2012 30-39 Female 7
B 2012 40-49 Male 5
B 2013 30-39 Male 3
B 2013 30-39 Female 5
B 2013 40-49 Male 8
B 2013 40-49 Female 2
B 2013 50+ Male 6
B 2013 50+ Female 5
;
run;

And before doing anything else with it, I want to have all the year age and gender combinations for each company. This would give me a zero value for the combinations missing above. It should look like below:

A 2012 19-29 Male 12
A 2012 19-29 Female 10
A 2012 30-39 Male 7
A 2012 30-39 Female 5
A 2012 40-49 Male 14
A 2012 40-49 Female 32
A 2012 50+ Male 11
A 2012 50+ Female 8
A 2013 19-29 Male 3
A 2013 19-29 Female 0
A 2013 30-39 Male 7
A 2013 30-39 Female 2
A 2013 40-49 Male 0
A 2013 40-49 Female 6
A 2013 50+ Male 3
A 2013 50+ Female 4
B 2012 19-29 Male 2
A 2012 19-29 Female 0
B 2012 30-39 Male 5
B 2012 30-39 Female 7
B 2012 40-49 Male 5
B 2012 40-49 Female 0
B 2012 50+ Male 0
B 2012 50+ Female 0
B 2013 19-29 Male 0
B 2013 19-20 Male 0
B 2013 30-39 Male 3
B 2013 30-39 Female 5
B 2013 40-49 Male 8
B 2013 40-49 Female 2
B 2013 50+ Male 6
B 2013 50+ Female 5

I want it to look like above so that I can easily merge it with the following data that has complete combinations of year age and gender

data tomerge;
infile datalines;
input year age $ gender $ employees;
cards;
2012 19-29 Male 215
2012 19-29 Female 263
2012 30-39 Male 271
2012 30-39 Female 273
2012 40-49 Male 245
2012 40-49 Female 165
2012 50+ Male 237
2012 50+ Female 186
2013 19-29 Male 192
2013 19-29 Female 154
2013 30-39 Male 176
2013 30-39 Female 231
2013 40-49 Male 188
2013 40-49 Female 176
2013 50+ Male 134
2013 50+ Female 201
;

I hope it is more clear now.

Thank you.

PGStats · Posted 03-10-2022 09:34 PM

SQL cross joins provide you with the cartesian product (all combinations) of categories:

proc sql;
create table want AS
select 
    a.company,
    b.year,
    c.age,
    d.gender,
    coalesce(e.attended, 0) as attended
from 
    (select distinct company from have) as a cross join
    (select distinct year from have) as b cross join
    (select distinct age from have) as c cross join
    (select distinct gender from have) as d left join
    have as e 
        on a.company=e.company and b.year=e.year and 
            c.age=e.age and d.gender=e.gender;
quit;

(untested)

PG

Novice_ · Posted 03-10-2022 10:43 PM

@PGStats Thanks a lot. This is exactly what I was looking for. Could not have accomplished it without your help as I am pretty new to sas with zero experience of proc sql. Thanks again.

Ksharp · Posted 03-11-2022 06:59 AM

proc freq data=have noprint;
table company*year*gender / out=want sparse;
weight attended;
run;

Novice_ · Posted 03-11-2022 11:31 AM

Unfortunately, this dose not give me the required output.

PGStats · Posted 03-11-2022 12:44 PM

Try adding age to the table statement in @Ksharp suggestion. It should work.

table company*year*age*gender / out=want sparse;

PG

Novice_ · Posted 03-11-2022 12:58 PM

OMG it actually did. Thank you 🙂

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away