Thanks for your answers, Tom. Here are my answers to your questions:

- What is the overall goal here? What is the meaning of the data and what is the analysis you are trying to do?
The data will be used for an ML model. A left outer join brings all of the tables together, and the resulting table is the dataset to be modeled. Each table holds measures from a different period P: P0 is the current period, P1 the previous period, P2 the period before P1, and so on until PX, the last period. That is why a loop is required:
1° P0 - P1
2° P1 - P2
3° P2 - P3
...
P(X-1) - PX

- What does this P0, P1, etc. suffix on the variable names mean?
Periods of time. For example, with yearly periods, it marks the measurement for a given year.

- Why is the data in multiple datasets? Why not just store all of the data in one dataset?
No problem, the datasets can be combined into one before the comparison step. In that case, the variable names have to keep the P0, P1, P2, etc. suffix at the end.

- Why do the variable names change between the datasets?
The values are computed period by period. Table Have_P0 holds the results for the current year, Have_P1 the results for the previous year, Have_P2 the year before that, etc.

- Why not just have a separate variable with values like P0 or P1 (or perhaps a numeric variable with 0 and 1) to indicate which P value this observation is for?
That is a possibility.

- Do you always have the same 100 variables?
Yes, periods P0, P1, P2, ..., PX will always have the same variables.

- Do you have a SAS/IML license? Could you load the two datasets into matrices and just subtract them?
No.

- Why do you need to make so many different difference datasets?
For an ML model, to follow the evolution between periods. The difference tables are then combined with a left outer join.

If you have other questions, don't hesitate to ask. Your help is greatly appreciated.
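
To make the loop over consecutive periods more concrete, here is a rough sketch of what I have in mind. It is only an illustration under assumptions that are not stated above: the key variable `id`, the variable names `x1_P0` ... `x100_P0`, the output names `diff_P0_P1` and `d1_P0_P1`, and the macro name `period_diffs` are all hypothetical. Only the table names Have_P0, Have_P1, ... come from my data.

```sas
/* Sketch only: id, x1..x100, d1..d100 and diff_* are assumed names. */
%macro period_diffs(last=, nvars=);
  %local i j v;
  /* One pass per pair of consecutive periods: P0-P1, P1-P2, ... */
  %do i = 0 %to %eval(&last - 1);
    %let j = %eval(&i + 1);
    proc sql;
      create table diff_P&i._P&j as
      select a.id
        /* Generate one difference column per variable */
        %do v = 1 %to &nvars;
          , a.x&v._P&i - b.x&v._P&j as d&v._P&i._P&j
        %end;
      from Have_P&i as a
      left join Have_P&j as b
        on a.id = b.id;
    quit;
  %end;
%mend period_diffs;

/* Example call: periods P0 to P3, 100 variables per table */
%period_diffs(last=3, nvars=100);
```

With a left join, an `id` that is missing from the older period keeps its row and gets missing differences. The resulting `diff_*` tables could then be joined on `id` in the same way to build the modeling dataset.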