I have data that is structured in a matrix format exactly like this (except with more rows and columns). I've copied the raw data as I am not even sure how to load this into SAS.
firm | 1001 | 1001 | 1001 | 1001 | 1001 | 1001 | 1001 | 1001 | 1001 | 1001 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | 1003 | |
year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | |
individual | event_year | |||||||||||||||||||
103307-86 | 2010 | 0.3 | 0.0 | 1.0 | 0.6 | 0.0 | 0.0 | 0.9 | 0.8 | 0.8 | 0.0 | 1.0 | 0.7 | 0.1 | 0.1 | 0.7 | 0.4 | 0.3 | 0.6 | 0.3 |
42185-26 | 2005 | 0.8 | 0.6 | 0.3 | 0.7 | 0.8 | 0.4 | 0.7 | 0.2 | 0.1 | 0.6 | 0.2 | 0.4 | 0.8 | 0.1 | 0.4 | 0.5 | 1.0 | 0.4 | 0.5 |
42185-26 | 2009 | 0.6 | 0.7 | 0.9 | 0.4 | 0.5 | 0.1 | 0.9 | 0.7 | 0.9 | 0.6 | 0.6 | 0.7 | 0.3 | 0.5 | 0.9 | 0.1 | 1.0 | 0.7 | 0.7 |
Across the columns are two headers, firm identifiers (e.g., 1001, 1003) and year identifiers for each firm (e.g., 2005 to 2014 for firm=1001 and 2010 to 2018 for firm=1003). Across the rows, I have identifiers for individuals and an event that occurs in a given year for that individual ("event_year"). What I'd like to do is transpose the data into the following format:
DATA want;
input individual & $char10. event_year firm year score;
length individual $10;
DATALINES;
103307-86 2010 1001 2005 0.3
103307-86 2010 1001 2006 0.0
103307-86 2010 1001 2007 1.0
103307-86 2010 1001 2008 0.6
103307-86 2010 1001 2009 0.0
103307-86 2010 1001 2010 0.0
103307-86 2010 1001 2011 0.9
103307-86 2010 1001 2012 0.8
103307-86 2010 1001 2013 0.8
103307-86 2010 1001 2014 0.0
103307-86 2010 1003 2010 1.0
103307-86 2010 1003 2011 0.7
103307-86 2010 1003 2012 0.1
103307-86 2010 1003 2013 0.1
103307-86 2010 1003 2014 0.7
103307-86 2010 1003 2015 0.4
103307-86 2010 1003 2016 0.3
103307-86 2010 1003 2017 0.6
103307-86 2010 1003 2018 0.3
42185-26 2005 1001 2005 0.8
42185-26 2005 1001 2006 0.6
42185-26 2005 1001 2007 0.3
42185-26 2005 1001 2008 0.7
42185-26 2005 1001 2009 0.8
42185-26 2005 1001 2010 0.4
42185-26 2005 1001 2011 0.7
42185-26 2005 1001 2012 0.2
42185-26 2005 1001 2013 0.1
42185-26 2005 1001 2014 0.6
42185-26 2005 1003 2010 0.2
42185-26 2005 1003 2011 0.4
42185-26 2005 1003 2012 0.8
42185-26 2005 1003 2013 0.1
42185-26 2005 1003 2014 0.4
42185-26 2005 1003 2015 0.5
42185-26 2005 1003 2016 1.0
42185-26 2005 1003 2017 0.4
42185-26 2005 1003 2018 0.5
42185-26 2009 1001 2005 0.6
42185-26 2009 1001 2006 0.7
42185-26 2009 1001 2007 0.9
42185-26 2009 1001 2008 0.4
42185-26 2009 1001 2009 0.5
42185-26 2009 1001 2010 0.1
42185-26 2009 1001 2011 0.9
42185-26 2009 1001 2012 0.7
42185-26 2009 1001 2013 0.9
42185-26 2009 1001 2014 0.6
42185-26 2009 1003 2010 0.6
42185-26 2009 1003 2011 0.7
42185-26 2009 1003 2012 0.3
42185-26 2009 1003 2013 0.5
42185-26 2009 1003 2014 0.9
42185-26 2009 1003 2015 0.1
42185-26 2009 1003 2016 1.0
42185-26 2009 1003 2017 0.7
42185-26 2009 1003 2018 0.7
;
RUN;
As you can see, I want to transpose the data into "vertical" format, where each row corresponds to a individual, event_year, firm, and year observation. I've created a variable called "score" that contains the value in the matrix. Thank you.
One simple way to solve the problem is as follows:
There could be more elegant methods but the approach used here is simple and basic - comprehensible with a basic knowledge of SAS.
/*Convert your source data into a tab separated file called your_datafile.
Give the full path and file name.
You can also use a csv file.
I have not used column headers. This keeps it simple.
Then import and create a dataset */
proc import datafile = your_datafile dbms=TAB out=A replace;
getnames=no;
run;
/*Separate the first two rows into a separate data a dataset*/
data b;
set a ;
if _n_<=2;
run;
/* Separate the next three rows into a separate dataset*/
data c;
set a;
if _n_ >2;
run;
/* transpose the first dataset from horizontal to vertical*/
proc transpose data=b out=bb(RENAME=(COL1=FIRM COL2=YEAR));
Var var2-Var21;
run;
/* transpose the second dataset from horizontal to vertical*/
proc transpose data=c out=cc(RENAME=(VAR1=INDIVIDUAL VAR2=EVENT_YEAR));
by Var1 Var2;
Var var3-Var21;
run;
/*sort them */
proc sort data=bb;
by _NAME_;
run;
proc sort data=cc;
by _NAME_;
run;
/* Get the data in the desired format into a dataset called desired
by merging bb and cc.
I am ordering the columns as shown in your question.
*/
data desired(drop=_NAME_);
retain INDIVIDUAl EVENT_YEAR FIRM YEAR SCORE);
merge cc bb;
by _NAME_;
run;
proc sort data=desired;
by INDIVIDUAL EVENT_YEAR;
run;
Everything in the code is simple and self explanatory.
One simple way to solve the problem is as follows:
There could be more elegant methods but the approach used here is simple and basic - comprehensible with a basic knowledge of SAS.
/*Convert your source data into a tab separated file called your_datafile.
Give the full path and file name.
You can also use a csv file.
I have not used column headers. This keeps it simple.
Then import and create a dataset */
proc import datafile = your_datafile dbms=TAB out=A replace;
getnames=no;
run;
/*Separate the first two rows into a separate data a dataset*/
data b;
set a ;
if _n_<=2;
run;
/* Separate the next three rows into a separate dataset*/
data c;
set a;
if _n_ >2;
run;
/* transpose the first dataset from horizontal to vertical*/
proc transpose data=b out=bb(RENAME=(COL1=FIRM COL2=YEAR));
Var var2-Var21;
run;
/* transpose the second dataset from horizontal to vertical*/
proc transpose data=c out=cc(RENAME=(VAR1=INDIVIDUAL VAR2=EVENT_YEAR));
by Var1 Var2;
Var var3-Var21;
run;
/*sort them */
proc sort data=bb;
by _NAME_;
run;
proc sort data=cc;
by _NAME_;
run;
/* Get the data in the desired format into a dataset called desired
by merging bb and cc.
I am ordering the columns as shown in your question.
*/
data desired(drop=_NAME_);
retain INDIVIDUAl EVENT_YEAR FIRM YEAR SCORE);
merge cc bb;
by _NAME_;
run;
proc sort data=desired;
by INDIVIDUAL EVENT_YEAR;
run;
Everything in the code is simple and self explanatory.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.