BookmarkSubscribeRSS Feed
mhoolsema
Calcite | Level 5

I have a panel dataset that is unbalanced.  There are not currently observations for every subject in every year.  I would like to create a balanced dataset that has a row for every subject at every year, but have missing observations for the other variables where the data is currently missing.  This is easy to do in STATA using the tsfill command, but I can't find a way to easily do this in SAS.  I'm assuming I need to create some kind of loop in a data step, but I'm not sure how to apply the loop only to time periods that are currently missing.

 

My file has about 100,000 records before expanding to the balanced panel, so I need to be able to do this programmatically and not looking at each individual case.  

 

data have;
  infile datalines ;
  input id:$8. Year:8. Department:$8. Program_Code:$8.;
 	datalines;
	 PersonA 2000 DeptA Prog20
	 PersonA 2001 DeptA Prog20
	 PersonA 2002 DeptA Prog20
	 PersonA 2003 DeptA Prog20
	 PersonB 2002 DeptB Prog15
	 PersonB 2003 DeptB Prog15
	 PersonB 2004 DeptB Prog15
	 PersonC 2001 DeptC Prog05
	 PersonC 2002 DeptC Prog05
	 PersonC 2003 DeptC Prog05
	 PersonC 2004 DeptD Prog25
	 PersonC 2005 DeptD Prog25
	 ;
 run;

 
data want;
   infile datalines ;
   input id:$8. Year:8. Department:$8. Program_Code:$8.;
	 datalines;
	 PersonA 2000 DeptA Prog20
	 PersonA 2001 DeptA Prog20
	 PersonA 2002 DeptA Prog20
	 PersonA 2003 DeptA Prog20
	 PersonA 2004 . .
	 PersonA 2005 . .

	 PersonB 2000 . .
	 PersonB 2001 . .
	 PersonB 2002 DeptB Prog15
	 PersonB 2003 DeptB Prog15
	 PersonB 2004 DeptB Prog15
	 PersonB 2005 . .

	 PersonC 2000 . .
	 PersonC 2001 DeptC Prog05
	 PersonC 2002 DeptC Prog05
	 PersonC 2003 DeptC Prog05
	 PersonC 2004 DeptD Prog25
	 PersonC 2005 DeptD Prog25
	 ;
 run;

The post at https://communities.sas.com/t5/General-SAS-Programming/How-to-balance-the-quot-unbalanced-data-quot-... said this can be done with PROC TRANSPOSE, but I need the data in the long format, and not the wide format.  I have tried PROC Expand, but because it is character data I haven't been able to get that to work.

 

 

 

3 REPLIES 3
PGStats
Opal | Level 21

SAS/SQL should be efficient enough for this :

 

proc sql;
select 
    a.id,
    b.year,
    c.Department,
    c.program_Code
from
    (select unique id from have as a) cross join
    (select unique year from have as b) left join
    have as c on a.id=c.id and b.year=c.year;
quit;
PG
mhoolsema
Calcite | Level 5

When I use this I get the following note in the log:

 

NOTE: The execution of this query involves performing one or more Cartesian product joins that
can not be optimized.

 

Is this something to be worried about, or something that can be ignored?  

PGStats
Opal | Level 21

No worry. This note informs you that you are doing exactly what you want when performing the CROSS JOIN, i.e. forming the cartesian product of every ID with every YEAR.

PG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 3032 views
  • 0 likes
  • 2 in conversation