BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AlG
Quartz | Level 8 AlG
Quartz | Level 8

Hi Everyone,

I have a dataset which includes weekly information about several companies from 2014 to 2018 (Identifier variables are: CompanyID, Year, Week).

This dataset is unbalanced, in the sense that for some of the companies there is no observation (and hence, no row) for some of the weeks of some of the years. I need to transform this dataset into a balanced one. In other words, I need to drop the year/week observations that do not exist for all of the firms. How can I do this in SAS?

Thanks so much in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

This query does it:

 

proc sql;
create table want as
select 
    a.*
from 
    have as a inner join
    (select year, week from have group by year, week having count(distinct companyID) = 
        (select count(distinct companyID) from have)) as b
            on a.year=b.year and a.week=b.week;
quit;
PG

View solution in original post

3 REPLIES 3
ed_sas_member
Meteorite | Level 14

Hi @AlG 

Here is an approach to achieve this, using a PROC FREQ and the SPARSE option to report all possible combinations of the variable values, even if a combination does not occur in the data. 

data have;
	input CompanyID $ Year Week;
	datalines;
A 2014 1
A 2014 2
A 2014 3
A 2015 1
A 2015 2
A 2015 3
A 2017 1
A 2017 2
B 2014 1
B 2014 3
B 2016 3
B 2017 1
B 2017 2
;

/* Retrieve list of couples Year-Week where at list a company as no observation */
proc freq data=have noprint;
	table CompanyID*Year*Week / sparse out=have_freq (drop=percent);
run;

proc sort data=have_freq (where=(count=0) drop=CompanyID) out=list_couples (drop=count) nodupkey;
	by Year Week;
run;

/* Create table want */
proc sql;
	create table want as
	select a.*
	from have as a 
		 right join
		 (select Year, Week from have
		 except
		 select Year, Week from list_couples) as b
	on a.Year=b.Year and a.Week = b.Week
	order by CompanyID, Year, Week;
quit;

Best,

Patrick
Opal | Level 21

If you've got SAS/ETS licensed then you could use Proc Expand and generate the missing data points instead of throwing away actual data points.

 

Below one way to get what you've asked for.

data have;
	input CompanyID $ Year Week;
	datalines;
A 2014 1
A 2014 2
A 2014 3
A 2015 1
A 2015 2
A 2015 3
A 2017 1
A 2017 2
B 2014 1
B 2014 3
B 2016 3
B 2017 1
B 2017 2
;

proc sql;
  create table want as
  select
    CompanyID,
    Year,
    Week
  from
  (
    select 
      *, 
      count(*) as n_obs_perYearWeek
    from have
    group by Year,Week
  )
  group by CompanyID
  having max(n_obs_perYearWeek)=n_obs_perYearWeek
  order by CompanyID, year, week
  ;
quit;
PGStats
Opal | Level 21

This query does it:

 

proc sql;
create table want as
select 
    a.*
from 
    have as a inner join
    (select year, week from have group by year, week having count(distinct companyID) = 
        (select count(distinct companyID) from have)) as b
            on a.year=b.year and a.week=b.week;
quit;
PG

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1075 views
  • 3 likes
  • 4 in conversation