I'm a novice programmer assisting research and trying to perform data manipulation using SAS. My data looks as follows:
KIS Stock Name Sum2000 Sum2001 Sum2002 Sum2003
036460 036460 AA 1 2 3 4
001471 033780 BB 5 6 7 8
004372 044820 CC 9 10 11 12
I want to manipulate the data to look like as follows:
KIS Stock Name Sum
001402 036460 AA 1
001402 036460 AA 2
001402 036460 AA 3
001402 036460 AA 4
001471 033780 BB 5
001471 033780 BB 6
001471 033780 BB 7
001471 033780 BB 8
004372 044820 CC 9
004372 044820 CC 10
004372 044820 CC 11
004372 044820 CC 12
In short words, I want the company AA's sum for years 2000-2003 as first 4 rows, then company BB's sum for years 2000-2003 as next 4 rows, etc.
Since there are thousands of firms, I would really love to get some sample code to start this data manipulation.
I'd highly appreciate any kind of help!
@JisangYu wrote:
I'm a novice programmer assisting research and trying to perform data manipulation using SAS. My data looks as follows:
KIS Stock Name Sum2000 Sum2001 Sum2002 Sum2003
036460 036460 AA 1 2 3 4
001471 033780 BB 5 6 7 8
004372 044820 CC 9 10 11 12
I want to manipulate the data to look like as follows:
KIS Stock Name Sum
001402 036460 AA 1
001402 036460 AA 2
001402 036460 AA 3
001402 036460 AA 4
001471 033780 BB 5
001471 033780 BB 6
001471 033780 BB 7
001471 033780 BB 8
004372 044820 CC 9
004372 044820 CC 10
004372 044820 CC 11
004372 044820 CC 12In short words, I want the company AA's sum for years 2000-2003 as first 4 rows, then company BB's sum for years 2000-2003 as next 4 rows, etc.
Since there are thousands of firms, I would really love to get some sample code to start this data manipulation.
I'd highly appreciate any kind of help!
Are you sure you don't want to keep the Year information?
This does what you show in the example:
data have; input KIS $ Stock $ Name $ Sum2000 Sum2001 Sum2002 Sum2003; datalines; 036460 036460 AA 1 2 3 4 001471 033780 BB 5 6 7 8 004372 044820 CC 9 10 11 12 ; data want; set have; array s (*) sum2000 - sum2003; do i=1 to dim(s); sum=s[i]; output; end; drop i sum2000 - sum2003; run;
The first data set is just to duplicate your example data, the second creates the desired output.
The Array with a Do loop is the typical way to do the same thing to multiple variables. The Output in the loop writes the record once for each value of the array. Drop removes variables.
If you wanted to keep the year information (strongly recommend)
data want; set have; array s (2000:2003) sum2000 - sum2003; do year=2000 to 2003; sum=s[year]; output; end; drop sum2000 - sum2003; run;
The difference here is the array definition sets up use of the year values 2000, 2001, 2002 and 2003 as the index value to reference the variable.
I would suggest not using SUM as a variable name. SAS will allow that but Sum is a key word for statistics in multiple procedures and can lead to hard to follow code.
@JisangYu wrote:
I'm a novice programmer assisting research and trying to perform data manipulation using SAS. My data looks as follows:
KIS Stock Name Sum2000 Sum2001 Sum2002 Sum2003
036460 036460 AA 1 2 3 4
001471 033780 BB 5 6 7 8
004372 044820 CC 9 10 11 12
I want to manipulate the data to look like as follows:
KIS Stock Name Sum
001402 036460 AA 1
001402 036460 AA 2
001402 036460 AA 3
001402 036460 AA 4
001471 033780 BB 5
001471 033780 BB 6
001471 033780 BB 7
001471 033780 BB 8
004372 044820 CC 9
004372 044820 CC 10
004372 044820 CC 11
004372 044820 CC 12In short words, I want the company AA's sum for years 2000-2003 as first 4 rows, then company BB's sum for years 2000-2003 as next 4 rows, etc.
Since there are thousands of firms, I would really love to get some sample code to start this data manipulation.
I'd highly appreciate any kind of help!
Are you sure you don't want to keep the Year information?
This does what you show in the example:
data have; input KIS $ Stock $ Name $ Sum2000 Sum2001 Sum2002 Sum2003; datalines; 036460 036460 AA 1 2 3 4 001471 033780 BB 5 6 7 8 004372 044820 CC 9 10 11 12 ; data want; set have; array s (*) sum2000 - sum2003; do i=1 to dim(s); sum=s[i]; output; end; drop i sum2000 - sum2003; run;
The first data set is just to duplicate your example data, the second creates the desired output.
The Array with a Do loop is the typical way to do the same thing to multiple variables. The Output in the loop writes the record once for each value of the array. Drop removes variables.
If you wanted to keep the year information (strongly recommend)
data want; set have; array s (2000:2003) sum2000 - sum2003; do year=2000 to 2003; sum=s[year]; output; end; drop sum2000 - sum2003; run;
The difference here is the array definition sets up use of the year values 2000, 2001, 2002 and 2003 as the index value to reference the variable.
I would suggest not using SUM as a variable name. SAS will allow that but Sum is a key word for statistics in multiple procedures and can lead to hard to follow code.
Thanks for your help!
I will certainly take your advice on not using Sum as variable name.
I highly appreciate your kind reply.
Can I just ask you one thing:
What is the role of (*) in the code "array s (*) sum2000-sum2003" and the role of "$" in the code "input KIS $ Stock $ Name"?
Thanks for your help again!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.