BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JisangYu
Calcite | Level 5
Spoiler
 

I'm a novice programmer assisting research and trying to perform data manipulation using SAS. My data looks as follows:
KIS         Stock   Name Sum2000 Sum2001 Sum2002 Sum2003
036460  036460   AA      1                2             3           4
001471  033780   BB      5               6             7            8
004372  044820   CC      9              10           11           12

 

I want to manipulate the data to look like as follows:
KIS       Stock    Name  Sum
001402 036460 AA       1
001402 036460 AA       2
001402 036460 AA       3
001402 036460 AA       4
001471 033780 BB      5
001471 033780 BB      6
001471 033780 BB      7
001471 033780 BB      8
004372 044820 CC      9
004372 044820 CC      10
004372 044820 CC      11
004372 044820 CC      12

In short words, I want the company AA's sum for years 2000-2003 as first 4 rows, then company BB's sum for years 2000-2003 as next 4 rows, etc.
Since there are thousands of firms, I would really love to get some sample code to start this data manipulation.
I'd highly appreciate any kind of help!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@JisangYu wrote:
Spoiler
 

I'm a novice programmer assisting research and trying to perform data manipulation using SAS. My data looks as follows:
KIS         Stock   Name Sum2000 Sum2001 Sum2002 Sum2003
036460  036460   AA      1                2             3           4
001471  033780   BB      5               6             7            8
004372  044820   CC      9              10           11           12

 

I want to manipulate the data to look like as follows:
KIS       Stock    Name  Sum
001402 036460 AA       1
001402 036460 AA       2
001402 036460 AA       3
001402 036460 AA       4
001471 033780 BB      5
001471 033780 BB      6
001471 033780 BB      7
001471 033780 BB      8
004372 044820 CC      9
004372 044820 CC      10
004372 044820 CC      11
004372 044820 CC      12

In short words, I want the company AA's sum for years 2000-2003 as first 4 rows, then company BB's sum for years 2000-2003 as next 4 rows, etc.
Since there are thousands of firms, I would really love to get some sample code to start this data manipulation.
I'd highly appreciate any kind of help!


Are you sure you don't want to keep the Year information?

This does what you show in the example:

data have;
   input KIS $ Stock $   Name $ Sum2000 Sum2001 Sum2002 Sum2003;
datalines;
036460 036460 AA 1 2 3 4
001471 033780 BB 5 6 7 8
004372 044820 CC 9 10 11 12
;

data want;
   set have;
   array s (*) sum2000 - sum2003;
   do i=1 to dim(s);
      sum=s[i];
      output;
   end;
   drop i sum2000 - sum2003;
run;

The first data set is just to duplicate your example data, the second creates the desired output.

The Array with a Do loop is the typical way to do the same thing to multiple variables. The Output in the loop writes the record once for each value of the array. Drop removes variables.

 

If you wanted to keep the year information (strongly recommend)

data want;
   set have;
   array s (2000:2003) sum2000 - sum2003;
   do year=2000 to 2003;
      sum=s[year];
      output;
   end;
   drop sum2000 - sum2003;
run;

The difference here is the array definition sets up use of the year values 2000, 2001, 2002 and 2003 as the index value to reference the variable.

 

I would suggest not using SUM as a variable name. SAS will allow that but Sum is a key word for statistics in multiple procedures and can lead to hard to follow code.

View solution in original post

3 REPLIES 3
ballardw
Super User

@JisangYu wrote:
Spoiler
 

I'm a novice programmer assisting research and trying to perform data manipulation using SAS. My data looks as follows:
KIS         Stock   Name Sum2000 Sum2001 Sum2002 Sum2003
036460  036460   AA      1                2             3           4
001471  033780   BB      5               6             7            8
004372  044820   CC      9              10           11           12

 

I want to manipulate the data to look like as follows:
KIS       Stock    Name  Sum
001402 036460 AA       1
001402 036460 AA       2
001402 036460 AA       3
001402 036460 AA       4
001471 033780 BB      5
001471 033780 BB      6
001471 033780 BB      7
001471 033780 BB      8
004372 044820 CC      9
004372 044820 CC      10
004372 044820 CC      11
004372 044820 CC      12

In short words, I want the company AA's sum for years 2000-2003 as first 4 rows, then company BB's sum for years 2000-2003 as next 4 rows, etc.
Since there are thousands of firms, I would really love to get some sample code to start this data manipulation.
I'd highly appreciate any kind of help!


Are you sure you don't want to keep the Year information?

This does what you show in the example:

data have;
   input KIS $ Stock $   Name $ Sum2000 Sum2001 Sum2002 Sum2003;
datalines;
036460 036460 AA 1 2 3 4
001471 033780 BB 5 6 7 8
004372 044820 CC 9 10 11 12
;

data want;
   set have;
   array s (*) sum2000 - sum2003;
   do i=1 to dim(s);
      sum=s[i];
      output;
   end;
   drop i sum2000 - sum2003;
run;

The first data set is just to duplicate your example data, the second creates the desired output.

The Array with a Do loop is the typical way to do the same thing to multiple variables. The Output in the loop writes the record once for each value of the array. Drop removes variables.

 

If you wanted to keep the year information (strongly recommend)

data want;
   set have;
   array s (2000:2003) sum2000 - sum2003;
   do year=2000 to 2003;
      sum=s[year];
      output;
   end;
   drop sum2000 - sum2003;
run;

The difference here is the array definition sets up use of the year values 2000, 2001, 2002 and 2003 as the index value to reference the variable.

 

I would suggest not using SUM as a variable name. SAS will allow that but Sum is a key word for statistics in multiple procedures and can lead to hard to follow code.

JisangYu
Calcite | Level 5

Thanks for your help!

I will certainly take your advice on not using Sum as variable name.

I highly appreciate your kind reply.

JisangYu
Calcite | Level 5

Can I just ask you one thing:

What is the role of (*) in the code "array s (*) sum2000-sum2003" and the role of "$" in the code "input KIS $ Stock $ Name"?

Thanks for your help again!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 336 views
  • 0 likes
  • 2 in conversation