BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Novice_
Fluorite | Level 6

Hi,

I have a huge aggregate dataset that shows the number of people attending an event from different companies by year, age-group and gender. If no one from a particular age-group and gender attended then that row is missing. I want all the nested categories for each year with number attending = 0 for the ones missing here. Currently my data looks like:

 

Novice__0-1646936948998.png

I want it to look like below. For each company and year I want to have the complete set of age and gender combinations attended taking a value of 0 for those missing in the data above.

Novice__1-1646937126568.png

I will highly appreciate if I can be helped with this. Thank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

SQL cross joins provide you with the cartesian product (all combinations) of categories:

 

proc sql;
create table want AS
select 
    a.company,
    b.year,
    c.age,
    d.gender,
    coalesce(e.attended, 0) as attended
from 
    (select distinct company from have) as a cross join
    (select distinct year from have) as b cross join
    (select distinct age from have) as c cross join
    (select distinct gender from have) as d left join
    have as e 
        on a.company=e.company and b.year=e.year and 
            c.age=e.age and d.gender=e.gender;
quit;

(untested)

PG

View solution in original post

14 REPLIES 14
PaigeMiller
Diamond | Level 26

This is relatively easy when you create a report or table. So, does it really have to be that this is done in a data set? Or can it be done when the report/table is created?

--
Paige Miller
Novice_
Fluorite | Level 6

Can it be done in the data because I have to merge this data with another that has total number of employees in all companies for each year by complete age_group (shown here) and gender i.e.

Novice__0-1646940667005.png

and I have to merge by year age_group and gender to the attendance data by each company. There wouldn't be a problem if I have this total employees by company as well. Thanks 

PaigeMiller
Diamond | Level 26

It can be done when the merge happens. Something like this:

 

data want;
     merge dataset1 dataset2;
     by year age gender;
     if missing(attended) then attended=0;
run;

Always a good thing to mention all the facts of the situation in your first message, rather than not mentioning relevant facts until someone asks. 

--
Paige Miller
Novice_
Fluorite | Level 6

I am sorry merging them like you said does not give me the correct results. Please see below, I do not see company A 2013 19-29 female, company A 2013 40-49 male, company B 2012 19-29 female and company B 2012 40-49 female 

Novice__0-1646959367707.png

Is there a way to have complete categories in attendance data before merging? Thank you.

SASKiwi
PROC Star

Something like this should populate the missing company values:

data want;
     merge dataset1 dataset2;
     by year age gender;
     if missing(attended) then attended=0;
run;

proc sort data = want;
  by descending company year age gender;
run;

data want2;
  set want;
  by descending company year age gender;
  retain Current_Company;
  if first.company then Current_Company = company;
  else if missing(company) then company = Current_Company;
run;
Novice_
Fluorite | Level 6

@SASKiwi thanks a lot for the code but unfortunately it is not working. I am getting the following:

Novice__0-1646965417757.png

 

SASKiwi
PROC Star

@Novice_ - Since you didn't provide any data in a suitable form (you can't copy and paste screenshots) I'm unable to test a solution. 

 

@PGStats method is better but you need to provide test data using a DATA step with DATALINES for a working solution to be provided.

Novice_
Fluorite | Level 6

I am sorry about not providing the data earlier. Please see below;

This is the data that I have (its only the data from two companies and the original has a lot of companies):

data have;
infile datalines;
input company $ year age $ gender $ attended;
cards;
A 2012 19-29 Male 12
A 2012 19-29 Female 10
A 2012 30-39 Male 7
A 2012 30-39 Female 5
A 2012 40-49 Male 14
A 2012 40-49 Female 32
A 2012 50+ Male 11
A 2012 50+ Female 8
A 2013 19-29 Male 3
A 2013 30-39 Male 7
A 2013 30-39 Female 2
A 2013 40-49 Female 6
A 2013 50+ Male 3
A 2013 50+ Female 4
B 2012 19-29 Male 2
B 2012 30-39 Male 5
B 2012 30-39 Female 7
B 2012 40-49 Male 5
B 2013 30-39 Male 3
B 2013 30-39 Female 5
B 2013 40-49 Male 8
B 2013 40-49 Female 2
B 2013 50+ Male 6
B 2013 50+ Female 5
;
run;

And before doing anything else with it, I want to have all the year age and gender combinations for each company. This would give me a zero value for the combinations missing above. It should look like below:

A 2012 19-29 Male 12
A 2012 19-29 Female 10
A 2012 30-39 Male 7
A 2012 30-39 Female 5
A 2012 40-49 Male 14
A 2012 40-49 Female 32
A 2012 50+ Male 11
A 2012 50+ Female 8
A 2013 19-29 Male 3
A 2013 19-29 Female 0
A 2013 30-39 Male 7
A 2013 30-39 Female 2
A 2013 40-49 Male 0
A 2013 40-49 Female 6
A 2013 50+ Male 3
A 2013 50+ Female 4
B 2012 19-29 Male 2
A 2012 19-29 Female 0
B 2012 30-39 Male 5
B 2012 30-39 Female 7
B 2012 40-49 Male 5
B 2012 40-49 Female 0
B 2012 50+ Male 0
B 2012 50+ Female 0
B 2013 19-29 Male 0
B 2013 19-20 Male 0
B 2013 30-39 Male 3
B 2013 30-39 Female 5
B 2013 40-49 Male 8
B 2013 40-49 Female 2
B 2013 50+ Male 6
B 2013 50+ Female 5

I want it to look like above so that I can easily merge it with the following data that has complete combinations of year age and gender

data tomerge;
infile datalines;
input year age $ gender $ employees;
cards;
2012 19-29 Male 215
2012 19-29 Female 263
2012 30-39 Male 271
2012 30-39 Female 273
2012 40-49 Male 245
2012 40-49 Female 165
2012 50+ Male 237
2012 50+ Female 186
2013 19-29 Male 192
2013 19-29 Female 154
2013 30-39 Male 176
2013 30-39 Female 231
2013 40-49 Male 188
2013 40-49 Female 176
2013 50+ Male 134
2013 50+ Female 201
;

I hope it is more clear now.

Thank you.

PGStats
Opal | Level 21

SQL cross joins provide you with the cartesian product (all combinations) of categories:

 

proc sql;
create table want AS
select 
    a.company,
    b.year,
    c.age,
    d.gender,
    coalesce(e.attended, 0) as attended
from 
    (select distinct company from have) as a cross join
    (select distinct year from have) as b cross join
    (select distinct age from have) as c cross join
    (select distinct gender from have) as d left join
    have as e 
        on a.company=e.company and b.year=e.year and 
            c.age=e.age and d.gender=e.gender;
quit;

(untested)

PG
Novice_
Fluorite | Level 6

@PGStats Thanks a lot. This is exactly what I was looking for. Could not have accomplished it without your help as I am pretty new to sas with zero experience of proc sql. Thanks again. 

Ksharp
Super User
proc freq data=have noprint;
table company*year*gender / out=want sparse;
weight attended;
run;
Novice_
Fluorite | Level 6
Unfortunately, this dose not give me the required output.
PGStats
Opal | Level 21

Try adding age to the table statement in @Ksharp suggestion. It should work.

 

table company*year*age*gender / out=want sparse;

 

PG
Novice_
Fluorite | Level 6
OMG it actually did. Thank you 🙂

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3452 views
  • 6 likes
  • 5 in conversation