Transpose your analysis data with the %MAKELONG and %MAKEWIDE macro
- Article History
- RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content
This article introduces the macros %MAKEWIDE and %MAKELONG to transpose your data between different formats. The macros have been introduced with the SAS Press Book Data Preparation for Analytics Using SAS and have been used in the SAS training class "Building Analytic Data Marts".
Both macros are based on the TRANSPOSE procedure. The benefit of using the macros (compared to the TRANSPOSE procedure) is
- you can transpose more than one variable in one macro call
- you can write shorter code, especially when moving from a WIDE to a LONG data structure
What is a LONG or a WIDE dataset? Why should I transpose them?
Transposing data is a frequent task in data preparation. There are two main data structure in analytics and data science.
- The one-row-per-subject data structure, where all information about an analysis subject is stored in one record. This structure is also referred to as a WIDE structure.
- The multiple-row-per-subject or longitudinal data structure, where the information about an analysis subject is stored in multiple records. These multiple records often exist because of repeated measurements over time, or because of hierarchical relationships with several sub entries. This structure is also referred to as a LONG structure.
For different types of analyses, the data is needed in different structures. Machine learning techniques often need the data in a one-row-per subject structure. Time Series Forecasting techniques or the plotting of information over time usually needs a multiple-row-per-subject structure.
Changing the data structure between LONG <--> WIDE is called TRANSPOSING.
The sample data
This article uses the DOGS dataset as sample data. The data have repeated measurements over time. Histamine and Heamoglobin data for 16 dogs at 4 different timepoints (0,1,3,5) are available. This data has been provided in both data structures, WIDE and LONG.
The DOGS_WIDE dataset
H H H H
e e e e
H H H H a a a a
i i i i m m m m
D s s s s o o o o
e t t t t g g g g
p a a a a l l l l
D l m m m m o o o o
o D e i i i i b b b b
g r t n n n n i i i i
I u e e e e e n n n n
D g d 0 1 3 5 0 1 3 5
1 Morphine N 0.04 0.20 0.10 0.08 14.7 14.0 14.2 14.1
2 Morphine N 0.02 0.06 0.02 0.02 14.4 14.5 14.2 14.2
3 Morphine N 0.07 1.40 0.48 0.24 14.4 14.2 14.9 14.2
4 Morphine N 0.17 0.57 0.35 0.24 15.0 14.9 14.3 14.3
5 Morphine Y 0.10 0.09 0.13 0.14 14.5 14.7 14.0 14.2
6 Morphine Y 0.12 0.11 0.10 . 14.4 14.5 14.9 15.0
7 Morphine Y 0.07 0.07 0.06 0.07 14.3 14.5 14.0 14.1
8 Morphine Y 0.05 0.07 0.06 0.07 14.3 14.1 14.7 14.2
9 Trimethaphan N 0.03 0.62 0.31 0.22 14.1 14.0 14.1 14.4
10 Trimethaphan N 0.03 1.05 0.73 0.60 14.1 14.7 14.5 14.3
11 Trimethaphan N 0.07 0.83 1.07 0.80 14.6 15.0 14.2 14.0
12 Trimethaphan N 0.09 3.13 2.06 1.23 14.5 14.4 14.3 14.1
13 Trimethaphan Y 0.10 0.09 0.09 0.08 14.7 14.3 14.2 14.6
14 Trimethaphan Y 0.08 0.09 0.09 0.10 14.9 14.2 14.4 14.1
15 Trimethaphan Y 0.13 0.10 0.12 0.12 14.7 14.7 15.0 14.5
16 Trimethaphan Y 0.06 0.05 0.05 0.05 14.8 14.9 14.7 14.5
The DOGS_LONG Dataset (only dogs 1-3 shown)
Dog
ID Drug Depleted Histamine Measurement Heamoglobin
1 Morphine N 0.04 0 14.7
1 Morphine N 0.20 1 14.0
1 Morphine N 0.10 3 14.2
1 Morphine N 0.08 5 14.1
2 Morphine N 0.02 0 14.4
2 Morphine N 0.06 1 14.5
2 Morphine N 0.02 3 14.2
2 Morphine N 0.02 5 14.2
3 Morphine N 0.07 0 14.4
3 Morphine N 1.40 1 14.2
3 Morphine N 0.48 3 14.9
3 Morphine N 0.24 5 14.2
Transposing from LONG to WIDE
Using the TRANSPOSE procedure
The following code shows how you can use the TRANSPOSE procedure to create a WIDE dataset based on the LONG dataset.
PROC TRANSPOSE DATA = dogs_long
PREFIX =histamine
OUT = dogs_wide_hist1;
BY dogid drug depleted;
VAR Histamine;
ID Measurement;
RUN;
Note the following from the code:
- You use a BY statement for DogID to force transposing for each dog. As the static variables DRUG and DEPLETED are the same for each dog, you can transfer them to the output dataset by adding them to the BY statement. Note that the data must be sorted BY DogID.
- The prefix for the new variables names HISTAMINE is specified in with the PREFIX option.
- You use the VAR statement to specify the variable which contains the values that shall be transposed.
The output data (only 3 dogs are shown) looks as follows:
Dog
ID Drug Depleted _NAME_ histamine0 histamine1 histamine3 histamine5
1 Morphine N Histamine 0.04 0.20 0.10 0.08
2 Morphine N Histamine 0.02 0.06 0.02 0.02
3 Morphine N Histamine 0.07 1.40 0.48 0.24
Using the %MAKEWIDE macro
The %MAKEWIDE macro facilitates transposing data from a LONG to a WIDE structure. If you only transpose one variable, there is not a big different to the PROC TRANSPOSE code. The %MAKEWIDE macro, however, can transpose more than one variable on one go.
The macro has the following parameters:
- DATA and OUT: The names of the input and output data sets, respectively.
- ID: The name of the ID variable that identifies the subject.
- COPY: A list of variables that occur repeatedly with each observation for a subject and will be copied to the resulting data set. Note that the COPY variable(s) must not be used in the COPY statement of PROC TRANSPOSE for our purposes, but are listed in the BY statement after the ID variable. We assume here that COPY variables have the same values within one ID.
- VAR: The variables that holds the values to be transposed.
- TIME: The variable that numerates the repeated measurements.
You use the following code to transpose both variables, HISTAMINE and HEAMOGLOBIN to a WIDE structure with the %MAKELONG macro.
%MAKEWIDE(DATA=dogs_long,
OUT=dogs_wide_both2,
ID=dogid,
COPY=drug depleted,
VAR=heamoglobin Histamine,
TIME=Measurement);
Technically the %MAKEWIDE macro scans the list of variables and performs a transpose for each variable and then merges the results together.
The output for 3 dogs is shown below, note that both variables, HISTAMINE and HEAMOGLOBIN, are now transposed.
Dog
ID Drug Depleted heamoglobin0 heamoglobin1 heamoglobin3 heamoglobin5 Histamine0 Histamine1 Histamine3 Histamine5
1 Morphine N 14.7 14.0 14.2 14.1 0.04 0.20 0.10 0.08
2 Morphine N 14.4 14.5 14.2 14.2 0.02 0.06 0.02 0.02
3 Morphine N 14.4 14.2 14.9 14.2 0.07 1.40 0.48 0.24
Transposing from WIDE to LONG
Using the TRANSPOSE procedure
You can also use the TRANSPOSE procedure to transpose the data from a WIDE to LONG structure.
proc transpose data=dogs_wide Name=_measure
out=dogs_long_1var(rename=(col1=Histamine) where=(_measure contains "Histamine")) ;
by dogid drug depleted;
run;
- Again, a BY statement is used where you can copy variables, which do not differ by DOGID variables to the output dataset.
- Note that the variable that contains the names the former variables that where transposed can be specified with the NAME option. The value _MEASURE is used here.
- As both variables, HISTAMINE and HEAMOGLOBIN are transposed here and aligned in rows underneath each other, a WHERE clause is used to only show the HISTAMINE values. You might want to keep both variables in the output dataset, however then you need to treat them separately if you want to retrieve the measurement numbers and you cannot use the datastep as shown below.
After you run this code, you are almost done. You see the data in a LONG structure. However you need to extract the measurement values 0,1,3,5 manually from the _MEASURE values.
Dog
ID Drug Depleted _measure Histamine
1 Morphine N Histamine0 0.04
1 Morphine N Histamine1 0.20
1 Morphine N Histamine3 0.10
1 Morphine N Histamine5 0.08
2 Morphine N Histamine0 0.02
2 Morphine N Histamine1 0.06
2 Morphine N Histamine3 0.02
2 Morphine N Histamine5 0.02
3 Morphine N Histamine0 0.07
3 Morphine N Histamine1 1.40
3 Morphine N Histamine3 0.48
3 Morphine N Histamine5 0.24
You can use a datastep for this task. Here you use the TRANWRD function to replace the "HISTAMINE" string with BLANK and you use the INPUT function to convert the numbers 0,1,3,5 from character to numeric format.
data dogs_long_1var;
set dogs_long_1var;
format Measurement 8.;
Measurement = input(tranwrd(_measure,"Histamine",""),$8.);
drop _measure;
run;
Finally you end up, with the desired data structure.
Dog
ID Drug Depleted Histamine Measurement
1 Morphine N 0.04 0
1 Morphine N 0.20 1
1 Morphine N 0.10 3
1 Morphine N 0.08 5
2 Morphine N 0.02 0
2 Morphine N 0.06 1
2 Morphine N 0.02 3
2 Morphine N 0.02 5
3 Morphine N 0.07 0
3 Morphine N 1.40 1
3 Morphine N 0.48 3
3 Morphine N 0.24 5
Using the %MAKELONG macro
The %MAKELONG macro performs the TRANSPOSE step and the datastep for you in one go, without having to code it extra. Also you can transpose more than one variable, as shown in this example.
%MAKELONG(DATA=dogs_wide,
OUT=dogs_long_both2,
ID=Dogid,
COPY=drug depleted,
ROOT=heamoglobin Histamine,
MEASUREMENT=Measurement);
The macro transposed each variable separately and merges the output datasets.
Dog
ID Drug Depleted heamoglobin Measurement Histamine
1 Morphine N 14.7 0 0.04
1 Morphine N 14.0 1 0.20
1 Morphine N 14.2 3 0.10
1 Morphine N 14.1 5 0.08
2 Morphine N 14.4 0 0.02
2 Morphine N 14.5 1 0.06
2 Morphine N 14.2 3 0.02
2 Morphine N 14.2 5 0.02
3 Morphine N 14.4 0 0.07
3 Morphine N 14.2 1 1.40
3 Morphine N 14.9 3 0.48
3 Morphine N 14.2 5 0.24
Using the %UNTRANSPOSE macro
In 2018, Arthur S. Tabachneck, posted a paper on SAS Global Forum where he introduced a much more detailed version of a macro to perform a wide-to-long transposing. His work extends the %MAKEWIDE macro by providing more options and considering more special cases.
Some years ago one of the current paper’s authors realized this and offered the MAKELONG macro to the SAS community (Svolba, G. 2008, 2014). Dr. Svolba’s macro was definitely a step in the right direction, as it ran the necessary PROC TRANSPOSE and merge steps without requiring the user to do such things as redundantly type variable names, assign unique names for all of the temporary files that might have to be created, write code to extract ID values from the transposed variable names, or create a data step to merge the resulting files. Dr. Svolba’s macro did everything that was needed as long as the input and ID variables were all numeric, and the wide file’s transposed variable names didn’t include such strings as prefixes, delimiters, and suffixes.
Feature Engineering or Transposing?
- Feature Engineering #1 - Using Correlation Analysis to Describe Behavior over Time
- Feature Engineering #3 – Describing the Trend over Time
- 3 ways to consider movable holidays in SAS
Further Reading
This example has been taken from my SAS Press book Data Preparation for Analytics Using SAS see chapter 14 and chapter 15
Further books of the author in SAS Press:
(48°13'01''N - 16°21'29''O)
- Mark as Read
- Mark as New
- Bookmark
- Permalink
- Report Inappropriate Content
See also :
Switching Things Up With the TRANSPOSE Procedure
Posted 05-16-2021 02:34 PM | by krisp18
https://communities.sas.com/t5/SAS-Global-Forum-Proceedings/Switching-Things-Up-With-the-TRANSPOSE-P...
- Mark as Read
- Mark as New
- Bookmark
- Permalink
- Report Inappropriate Content
Also see my paper: Transpose data by MERGE
http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf