BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Novice_
Fluorite | Level 6

Hi,

I have a huge aggregate dataset that shows the number of people attending an event from different companies by year, age-group and gender. If no one from a particular age-group and gender attended then that row is missing. I want all the nested categories for each year with number attending = 0 for the ones missing here. Currently my data looks like:

 

Novice__0-1646936948998.png

I want it to look like below. For each company and year I want to have the complete set of age and gender combinations attended taking a value of 0 for those missing in the data above.

Novice__1-1646937126568.png

I will highly appreciate if I can be helped with this. Thank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

SQL cross joins provide you with the cartesian product (all combinations) of categories:

 

proc sql;
create table want AS
select 
    a.company,
    b.year,
    c.age,
    d.gender,
    coalesce(e.attended, 0) as attended
from 
    (select distinct company from have) as a cross join
    (select distinct year from have) as b cross join
    (select distinct age from have) as c cross join
    (select distinct gender from have) as d left join
    have as e 
        on a.company=e.company and b.year=e.year and 
            c.age=e.age and d.gender=e.gender;
quit;

(untested)

PG

View solution in original post

14 REPLIES 14
PaigeMiller
Diamond | Level 26

This is relatively easy when you create a report or table. So, does it really have to be that this is done in a data set? Or can it be done when the report/table is created?

--
Paige Miller
Novice_
Fluorite | Level 6

Can it be done in the data because I have to merge this data with another that has total number of employees in all companies for each year by complete age_group (shown here) and gender i.e.

Novice__0-1646940667005.png

and I have to merge by year age_group and gender to the attendance data by each company. There wouldn't be a problem if I have this total employees by company as well. Thanks 

PaigeMiller
Diamond | Level 26

It can be done when the merge happens. Something like this:

 

data want;
     merge dataset1 dataset2;
     by year age gender;
     if missing(attended) then attended=0;
run;

Always a good thing to mention all the facts of the situation in your first message, rather than not mentioning relevant facts until someone asks. 

--
Paige Miller
Novice_
Fluorite | Level 6

I am sorry merging them like you said does not give me the correct results. Please see below, I do not see company A 2013 19-29 female, company A 2013 40-49 male, company B 2012 19-29 female and company B 2012 40-49 female 

Novice__0-1646959367707.png

Is there a way to have complete categories in attendance data before merging? Thank you.

SASKiwi
PROC Star

Something like this should populate the missing company values:

data want;
     merge dataset1 dataset2;
     by year age gender;
     if missing(attended) then attended=0;
run;

proc sort data = want;
  by descending company year age gender;
run;

data want2;
  set want;
  by descending company year age gender;
  retain Current_Company;
  if first.company then Current_Company = company;
  else if missing(company) then company = Current_Company;
run;
Novice_
Fluorite | Level 6

@SASKiwi thanks a lot for the code but unfortunately it is not working. I am getting the following:

Novice__0-1646965417757.png

 

SASKiwi
PROC Star

@Novice_ - Since you didn't provide any data in a suitable form (you can't copy and paste screenshots) I'm unable to test a solution. 

 

@PGStats method is better but you need to provide test data using a DATA step with DATALINES for a working solution to be provided.

Novice_
Fluorite | Level 6

I am sorry about not providing the data earlier. Please see below;

This is the data that I have (its only the data from two companies and the original has a lot of companies):

data have;
infile datalines;
input company $ year age $ gender $ attended;
cards;
A 2012 19-29 Male 12
A 2012 19-29 Female 10
A 2012 30-39 Male 7
A 2012 30-39 Female 5
A 2012 40-49 Male 14
A 2012 40-49 Female 32
A 2012 50+ Male 11
A 2012 50+ Female 8
A 2013 19-29 Male 3
A 2013 30-39 Male 7
A 2013 30-39 Female 2
A 2013 40-49 Female 6
A 2013 50+ Male 3
A 2013 50+ Female 4
B 2012 19-29 Male 2
B 2012 30-39 Male 5
B 2012 30-39 Female 7
B 2012 40-49 Male 5
B 2013 30-39 Male 3
B 2013 30-39 Female 5
B 2013 40-49 Male 8
B 2013 40-49 Female 2
B 2013 50+ Male 6
B 2013 50+ Female 5
;
run;

And before doing anything else with it, I want to have all the year age and gender combinations for each company. This would give me a zero value for the combinations missing above. It should look like below:

A 2012 19-29 Male 12
A 2012 19-29 Female 10
A 2012 30-39 Male 7
A 2012 30-39 Female 5
A 2012 40-49 Male 14
A 2012 40-49 Female 32
A 2012 50+ Male 11
A 2012 50+ Female 8
A 2013 19-29 Male 3
A 2013 19-29 Female 0
A 2013 30-39 Male 7
A 2013 30-39 Female 2
A 2013 40-49 Male 0
A 2013 40-49 Female 6
A 2013 50+ Male 3
A 2013 50+ Female 4
B 2012 19-29 Male 2
A 2012 19-29 Female 0
B 2012 30-39 Male 5
B 2012 30-39 Female 7
B 2012 40-49 Male 5
B 2012 40-49 Female 0
B 2012 50+ Male 0
B 2012 50+ Female 0
B 2013 19-29 Male 0
B 2013 19-20 Male 0
B 2013 30-39 Male 3
B 2013 30-39 Female 5
B 2013 40-49 Male 8
B 2013 40-49 Female 2
B 2013 50+ Male 6
B 2013 50+ Female 5

I want it to look like above so that I can easily merge it with the following data that has complete combinations of year age and gender

data tomerge;
infile datalines;
input year age $ gender $ employees;
cards;
2012 19-29 Male 215
2012 19-29 Female 263
2012 30-39 Male 271
2012 30-39 Female 273
2012 40-49 Male 245
2012 40-49 Female 165
2012 50+ Male 237
2012 50+ Female 186
2013 19-29 Male 192
2013 19-29 Female 154
2013 30-39 Male 176
2013 30-39 Female 231
2013 40-49 Male 188
2013 40-49 Female 176
2013 50+ Male 134
2013 50+ Female 201
;

I hope it is more clear now.

Thank you.

PGStats
Opal | Level 21

SQL cross joins provide you with the cartesian product (all combinations) of categories:

 

proc sql;
create table want AS
select 
    a.company,
    b.year,
    c.age,
    d.gender,
    coalesce(e.attended, 0) as attended
from 
    (select distinct company from have) as a cross join
    (select distinct year from have) as b cross join
    (select distinct age from have) as c cross join
    (select distinct gender from have) as d left join
    have as e 
        on a.company=e.company and b.year=e.year and 
            c.age=e.age and d.gender=e.gender;
quit;

(untested)

PG
Novice_
Fluorite | Level 6

@PGStats Thanks a lot. This is exactly what I was looking for. Could not have accomplished it without your help as I am pretty new to sas with zero experience of proc sql. Thanks again. 

Ksharp
Super User
proc freq data=have noprint;
table company*year*gender / out=want sparse;
weight attended;
run;
Novice_
Fluorite | Level 6
Unfortunately, this dose not give me the required output.
PGStats
Opal | Level 21

Try adding age to the table statement in @Ksharp suggestion. It should work.

 

table company*year*age*gender / out=want sparse;

 

PG
Novice_
Fluorite | Level 6
OMG it actually did. Thank you 🙂

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 2190 views
  • 6 likes
  • 5 in conversation