dear all,
my dataset has multiple variables each variable is repeated for multiple number of years,
Each column in he dataset represent a variable-year.
I have to transpose the data based on variable by creating separate column of each variable containing all the time series
the present format of the dataset is as follows
Company Name | PBDITA2010 | Depreciation2010 | sales2010 | PBDITA2011 | Depreciation2011 | sales2011 | …………….. | PBDITA2018 | Depreciation2018 | sales2018 |
20 Microns Ltd. | 238.7 | 47.5 | 1768.2 | 256.1 | 53.3 | 2349.8 | 545 | 91.1 | 3845.7 | |
3I Infotech Ltd. | 649.6 | 405 | 5195.3 | 3087.6 | 517.5 | 5479.2 | 1578.8 | 71.6 | 2368.2 | |
3M India Ltd. | 1648.8 | 169.5 | 10912.8 | 1727.9 | 173.7 | 11891.2 | 5308.7 | 418 | 25597.6 | |
3P Land Holdings Ltd. | 70.5 | 11.4 | 701.7 | 41.8 | 16.6 | 989.7 | 15.9 | 0.7 | 1.4 | |
A B B India Ltd. | 7455.1 | 485.1 | 62419.9 | 3618.5 | 516.6 | 62947.7 | 1534.7 | 221.6 | 2341 | |
A B C Gas (International) Ltd. | 1.8 | 0.1 | 136.3 | 3.6 | 0.1 | 239.9 | 3.1 | 0.9 | 100.8 | |
A B C India Ltd. | 84.5 | 24.1 | 1442.7 | 197.9 | 42 | 2039.4 | 11.6 | 6.3 | 13.4 | |
A B G Shipyard Ltd. | 6427.9 | 524.1 | 18076.7 | 6837.6 | 630.3 | 20770.9 | 156.7 | 8.4 | 504.9 |
the required format should be as follows
Company Name | year | PBDITA | Depreciation | sales |
20 Microns Ltd. | 2010 | 238.7 | 47.5 | 1768.2 |
3I Infotech Ltd. | 2010 | 649.6 | 405 | 5195.3 |
3M India Ltd. | 2010 | 1648.8 | 169.5 | 10912.8 |
3P Land Holdings Ltd. | 2010 | 70.5 | 11.4 | 701.7 |
A B B India Ltd. | 2010 | 7455.1 | 485.1 | 62419.9 |
A B C Gas (International) Ltd. | 2010 | 1.8 | 0.1 | 136.3 |
A B C India Ltd. | 2010 | 84.5 | 24.1 | 1442.7 |
A B G Shipyard Ltd. | 2010 | 6427.9 | 524.1 | 18076.7 |
20 Microns Ltd. | 2011 | 256.1 | 53.3 | 2349.8 |
3I Infotech Ltd. | 2011 | 3087.6 | 517.5 | 5479.2 |
3M India Ltd. | 2011 | 1727.9 | 173.7 | 11891.2 |
3P Land Holdings Ltd. | 2011 | 41.8 | 16.6 | 989.7 |
A B B India Ltd. | 2011 | 3618.5 | 516.6 | 62947.7 |
A B C Gas (International) Ltd. | 2011 | 3.6 | 0.1 | 239.9 |
A B C India Ltd. | 2011 | 197.9 | 42 | 2039.4 |
A B G Shipyard Ltd. | 2011 | 6837.6 | 630.3 | 20770.9 |
20 Microns Ltd. | 2018 | 545 | 91.1 | 3845.7 |
3I Infotech Ltd. | 2018 | 1578.8 | 71.6 | 2368.2 |
3M India Ltd. | 2018 | 5308.7 | 418 | 25597.6 |
3P Land Holdings Ltd. | 2018 | 15.9 | 0.7 | 1.4 |
A B B India Ltd. | 2018 | 1534.7 | 221.6 | 2341 |
A B C Gas (International) Ltd. | 2018 | 3.1 | 0.9 | 100.8 |
A B C India Ltd. | 2018 | 11.6 | 6.3 | 13.4 |
A B G Shipyard Ltd. | 2018 | 156.7 | 8.4 | 504.9 |
thanks in advance
Hi @srikanthyadav44 Pretty straight forward and has been asked many times in this forum
Keep it simple in 2 steps
data want;
set have;
array t(2010:2018,3) PBDITA2010--sales2018;
array j PBDITA Depreciation sales;
do year= 2010 to 2018;
do _n_=1 to dim(j);
j(_n_)=t(year,_n_);
end;
output;
end;
keep Company_Name year PBDITA Depreciation sales;
run;
proc sort data=want out=final_want;
by year;
run;
Alternatively, Should memory be very sufficient
data _null_ ;
if _n_=1 then do;
dcl hash H (ordered: "A",multidata:'y') ;
h.definekey ("year") ;
h.definedata ("Company_Name","year","PBDITA","Depreciation","sales") ;
h.definedone () ;
end;
set have end=z;
array t(2010:2018,3) PBDITA2010--sales2018;
array j PBDITA Depreciation sales;
do year= 2010 to 2018;
do _n_=1 to dim(j);
j(_n_)=t(year,_n_);
end;
rc=h.add();
end;
if z then h.output(dataset:'want');
run;
Hi @srikanthyadav44 Pretty straight forward and has been asked many times in this forum
Keep it simple in 2 steps
data want;
set have;
array t(2010:2018,3) PBDITA2010--sales2018;
array j PBDITA Depreciation sales;
do year= 2010 to 2018;
do _n_=1 to dim(j);
j(_n_)=t(year,_n_);
end;
output;
end;
keep Company_Name year PBDITA Depreciation sales;
run;
proc sort data=want out=final_want;
by year;
run;
Alternatively, Should memory be very sufficient
data _null_ ;
if _n_=1 then do;
dcl hash H (ordered: "A",multidata:'y') ;
h.definekey ("year") ;
h.definedata ("Company_Name","year","PBDITA","Depreciation","sales") ;
h.definedone () ;
end;
set have end=z;
array t(2010:2018,3) PBDITA2010--sales2018;
array j PBDITA Depreciation sales;
do year= 2010 to 2018;
do _n_=1 to dim(j);
j(_n_)=t(year,_n_);
end;
rc=h.add();
end;
if z then h.output(dataset:'want');
run;
dear Mr. novinosrin thanks a lot. your SAS code is workign excellently. it is very easy and simple.
You can perhaps just use two PROC TRANSPOSE steps. Then you might not need to know how many variables or how many years are represented in the input. As long as all of the variable names end with 4 digit year numbers.
proc transpose data=have out=step1;
by company ;
var _numeric_;
run;
data step2;
set step1 ;
year = input(substr(_name_,length(_name)-3),4.);
_name_ = substr(_name_,1,length(_name_)-4);
run;
proc sort data=step2;
by company year ;
run;
proc transpose data=step2 out=want ;
by company year;
id _name_;
var col1;
run;
Given your example data all three proposed methods should work. The following will also work, but doesn't have the limitations of the other methods. The method doesn't have any memory constraints, will work regardless of how your variable names are constructed (i.e., even if the contain prefixes, delimiters and or suffixes), and can work with any combination of numeric and character variables.
The macro can be downloaded from: https://github.com/gerhard1050/Untranspose-a-Wide-File
Once you have and compile (i.e., run) the macro, the code you would need to solve your problem would be:
%untranspose(data=have, out=want, by=Company, id=year,
var=PBDITA Depreciation sales)
proc sort data=want;
by year Company;
run;
Art, CEO, AnalystFinder.com
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.