08-03-2012 07:56 AM
I am attaching an Excel sheet with two worksheets. The first worksheet has the original data (Original-data). The second worksheet (New-data) has the transformed data sheet that I want for processing. How do I convert them? Can any body help me in writing a program?
Please note that not all countries have data for all years or for all variables. The new data set must have all countries, all years, and all variables, but there will be missing data in the appropriate places.
You can also reply me at email@example.com.
Thank you for your help in advance. Have a good day!
08-03-2012 08:34 AM
try this code below, it will give you what is in the orignal data. to have all the countries and all the years and all the variables you first have to create a table that has all the country/year combinations as well as all the variables and then merge this with the data below
08-03-2012 08:53 AM
Like Bruno suggested, proc transpose requires the least coding to accomplish what you want. Conversely, it can also be done in a datastep (Note: when I imported your spreadsheet the year variables imported as _000 to _011):
data want (keep=country year v1-v4);
array thedata(12) _000-_011;
if first.country then do;
call missing(of years(*));
do i=1 to 12;
if last.country then do;
do i=1 to 12;
08-03-2012 09:50 AM
Both approaches are fine and will work. However the data step approach can become much more efficient in the following circumstances.
Case 1, if the initial file is very large and is possibly stored on cartridge / tape, and I am talking a few hundred million records or more.
In such a case it is possible that proc transpose may not work due to the machine memory limitations.
The sort process may become time consuming and the sort may not be needed later
If the input file is on cartridge then output may require large disk space or back to cartridge / tape which can slow the process
Case 2. Sparsely populated datasets
The second step allows the user to control what goes into the transposed file. So if the base data was sparsely populated with a large number of nulls or zeroes which were to be deleted later on, then the data step process can eliminate them with a simple IF statement before the output reducing the size of the dataset.