Thank you so much for your responses! My apologies for any lack of clarity; I was trying to be concise and missed!

This is a long-standing project (10-year grant), and I'm revising and extending macros written by another statistician. We're looking at changes in cost for 78 medical conditions from 1999-2012. The data sets that I'm trying to identify and merge are separate data sets containing cost estimates, prevalence, and coefficient estimates for each replicate (indexed 1-100 for the reps, and by a non-continuous index for the strata that contain a third PSU) for each year of observation. I'm merging the cost, prevalence, and beta estimates to create a merged set of data sets, so that I can examine the extent to which changes in medical spending are attributable to changes in the cost of treatment versus changes in the prevalence of the health condition over time.

All of the replicates for each year are in separate SAS libraries by year: est99.replicate2_&j, est00.replicate2_&j, and so on. The problem I have is that the second replicate for the strata having 3 PSUs isn't available consistently in all years. I somehow need to create indices for both year and stratum number. I've used indices for variables and for data sets in the same library (e.g. imputed data sets), but I don't know how to do this for data sets across libraries. I also don't know how to do this for a non-continuous index that changes across years. The first set of replicates was very straightforward (do j = 1 to 100).

My existing syntax is run by stratum number (the other statistician didn't make it far enough to deal with this). For example, I want to identify and merge together all of the cost estimates for stratum 3 for the years when this stratum had 3 PSUs. I've done the easy case where the stratum has 3 PSUs for all years. I'm now trying to figure out how to do it for the remaining strata without incredibly laborious and error-prone hard coding.
Existing code looked like this. The names in the macro call are the data sets containing means for each replicate (final_est_replicate2_&j for all 13 years). I use another macro to run this for all years and strata.

    %macro cgar_from_year_a (year=,rep=,master=,coeff=,cost=,
        name1=,name2=,name3=,name4=,name5=,name6=,name7=,......name13= );

    data final_estimates99;
        set est99.&name1;
        year = 1999;
    run;

    data final_estimates00;
        set est00.&name2;
        year = 2000;
    run;

    data final_estimates01;
        set est01.&name3;
        year = 2001;
    run;

This goes up to 2012. The problem now is that SAS of course stops when it encounters a year/library that doesn't contain a data set for that replicate (a stratum that doesn't have a third PSU for that year).

The existing syntax concatenates all of the cost data sets:

    data attrib_cost_long;
        set final_estimates99 final_estimates00 final_estimates01
            final_estimates02 final_estimates03 final_estimates04
            final_estimates05 final_estimates06 final_estimates07
            final_estimates08 final_estimates09 final_estimates10
            final_estimates11 final_estimates12;
    run;

I need to figure out how to get something similar when the replicate data sets only exist for a subset of the years of observation. At this point the only variable for the number of PSUs per stratum per year exists in another data set that I made. The names of the data sets are really the only indicator as to whether there was a third PSU in that stratum for the year (if rep2_&j exists in libraryYR).

Thank you so much for your responses! I've been asking around at my institution and no one has any idea of what to do (I'm more of a modeler...).
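To make the goal concrete, here is a rough sketch of the kind of thing I think I'm after. It is not tested against my real libraries. It assumes the libraries est99 through est12 are already assigned and that the data set has the same name in every year's library. The macro name stack_existing, its parameters name= and out=, and the data set name in the example call are all made up for illustration. It relies on the EXIST function, called through %SYSFUNC, to check whether the data set is present before reading it.

```sas
/* Sketch only: stack_existing, name= and out= are hypothetical names.  */
/* Assumes librefs est99, est00, ..., est12 are already assigned and    */
/* that the data set has the same name in each year's library.          */
%macro stack_existing(name=, out=attrib_cost_long);
  %local yrs i yy year found;
  %let yrs = 99 00 01 02 03 04 05 06 07 08 09 10 11 12;
  %let found = ;

  %do i = 1 %to %sysfunc(countw(&yrs));
    %let yy = %scan(&yrs, &i);

    /* Map the two-digit suffix to a calendar year: 99 -> 1999, 00-12 -> 2000-2012 */
    %if &yy = 99 %then %let year = 1999;
    %else %let year = %eval(2000 + &yy);

    /* EXIST returns 1 only if the data set is present in that library */
    %if %sysfunc(exist(est&yy..&name)) %then %do;
      data final_estimates&yy;
        set est&yy..&name;
        year = &year;
      run;
      /* Remember which yearly data sets were actually created */
      %let found = &found final_estimates&yy;
    %end;
    %else %put NOTE: est&yy..&name not found, skipping &year..;
  %end;

  /* Concatenate only the years that had the replicate */
  %if %length(&found) > 0 %then %do;
    data &out;
      set &found;
    run;
  %end;
  %else %put WARNING: no data set named &name found in any year library.;
%mend stack_existing;

/* Hypothetical call for the stratum 3 replicate */
%stack_existing(name=final_est_replicate2_3);
```

If something like this is reasonable, the list of years present would come from the libraries themselves rather than from hard-coded name1= to name13= parameters, and I could loop the call over the stratum numbers.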