BookmarkSubscribeRSS Feed

Transpose your analysis data with the %MAKELONG and %MAKEWIDE macro

Started ‎01-23-2022 by
Modified ‎04-01-2022 by
Views 5,001

This article introduces the macros %MAKEWIDE and %MAKELONG to transpose your data between different formats. The macros have been introduced with the SAS Press Book  Data Preparation for Analytics Using SAS and have been used in the SAS training class "Building Analytic Data Marts".

Both macros are based on the TRANSPOSE procedure. The benefit of using the macros (compared to the TRANSPOSE procedure) is

  • you can transpose more than one variable in one macro call
  • you can write shorter code, especially when moving from a WIDE to a LONG data structure 

 

What is a LONG or a WIDE dataset? Why should I transpose them?

 

Transposing data is a frequent task in data preparation. There are two main data structure in analytics and data science. 

  • The one-row-per-subject data structure, where all information about an analysis subject is stored in one record. This structure is also referred to as a WIDE structure.
  • The multiple-row-per-subject or longitudinal data structure, where the information about an analysis subject is stored in multiple records. These multiple records often exist because of repeated measurements over time, or because of hierarchical relationships with several sub entries. This structure is also referred to as a LONG structure.

 

For different types of analyses, the data is needed in different structures. Machine learning techniques often need the data in a one-row-per subject structure. Time Series Forecasting techniques or the plotting of information over time usually needs a multiple-row-per-subject structure.

 

Changing the data structure between LONG <--> WIDE is called TRANSPOSING.

 

gsvolba_1-1642976861967.png

 

The sample data

 

This article uses the DOGS dataset as sample data. The data have repeated measurements over time. Histamine and Heamoglobin data for 16 dogs at 4 different timepoints (0,1,3,5) are available. This data has been provided in both data structures, WIDE and LONG. 

 

The DOGS_WIDE dataset

 

                                                                             H       H       H       H
                                                                             e       e       e       e
                                             H       H       H       H       a       a       a       a
                                             i       i       i       i       m       m       m       m
                                      D      s       s       s       s       o       o       o       o
                                      e      t       t       t       t       g       g       g       g
                                      p      a       a       a       a       l       l       l       l
                 D                    l      m       m       m       m       o       o       o       o
                 o         D          e      i       i       i       i       b       b       b       b
                 g         r          t      n       n       n       n       i       i       i       i
                 I         u          e      e       e       e       e       n       n       n       n
                 D         g          d      0       1       3       5       0       1       3       5

                 1    Morphine        N    0.04    0.20    0.10    0.08    14.7    14.0    14.2    14.1
                 2    Morphine        N    0.02    0.06    0.02    0.02    14.4    14.5    14.2    14.2
                 3    Morphine        N    0.07    1.40    0.48    0.24    14.4    14.2    14.9    14.2
                 4    Morphine        N    0.17    0.57    0.35    0.24    15.0    14.9    14.3    14.3
                 5    Morphine        Y    0.10    0.09    0.13    0.14    14.5    14.7    14.0    14.2
                 6    Morphine        Y    0.12    0.11    0.10     .      14.4    14.5    14.9    15.0
                 7    Morphine        Y    0.07    0.07    0.06    0.07    14.3    14.5    14.0    14.1
                 8    Morphine        Y    0.05    0.07    0.06    0.07    14.3    14.1    14.7    14.2
                 9    Trimethaphan    N    0.03    0.62    0.31    0.22    14.1    14.0    14.1    14.4
                10    Trimethaphan    N    0.03    1.05    0.73    0.60    14.1    14.7    14.5    14.3
                11    Trimethaphan    N    0.07    0.83    1.07    0.80    14.6    15.0    14.2    14.0
                12    Trimethaphan    N    0.09    3.13    2.06    1.23    14.5    14.4    14.3    14.1
                13    Trimethaphan    Y    0.10    0.09    0.09    0.08    14.7    14.3    14.2    14.6
                14    Trimethaphan    Y    0.08    0.09    0.09    0.10    14.9    14.2    14.4    14.1
                15    Trimethaphan    Y    0.13    0.10    0.12    0.12    14.7    14.7    15.0    14.5
                16    Trimethaphan    Y    0.06    0.05    0.05    0.05    14.8    14.9    14.7    14.5

 

 The DOGS_LONG Dataset (only dogs 1-3 shown)

 

                         Dog
                          ID      Drug      Depleted    Histamine    Measurement    Heamoglobin

                          1     Morphine       N           0.04              0          14.7   
                          1     Morphine       N           0.20              1          14.0   
                          1     Morphine       N           0.10              3          14.2   
                          1     Morphine       N           0.08              5          14.1   
                          2     Morphine       N           0.02              0          14.4   
                          2     Morphine       N           0.06              1          14.5   
                          2     Morphine       N           0.02              3          14.2   
                          2     Morphine       N           0.02              5          14.2   
                          3     Morphine       N           0.07              0          14.4   
                          3     Morphine       N           1.40              1          14.2   
                          3     Morphine       N           0.48              3          14.9   
                          3     Morphine       N           0.24              5          14.2 

 

Transposing from LONG to WIDE 

 

Using the TRANSPOSE procedure

The following code shows how you can use the TRANSPOSE procedure to create a WIDE dataset based on the LONG dataset.

  

PROC TRANSPOSE DATA = dogs_long 
               PREFIX =histamine 
               OUT = dogs_wide_hist1;
   BY dogid drug depleted;
   VAR Histamine;
   ID Measurement;
RUN;

 

Note the following from the code: 

  • You use a BY statement for DogID to force transposing for each dog. As the static variables DRUG and DEPLETED are the same for each dog, you can transfer them to the output dataset by adding them to the BY statement. Note that the data must be sorted BY DogID.
  • The prefix for the new variables names HISTAMINE is specified in with the PREFIX option.
  • You use the VAR statement to specify the variable which contains the values that shall be transposed.

 

The output data (only 3 dogs are shown) looks as follows:

 

 Dog
  ID      Drug      Depleted     _NAME_      histamine0    histamine1    histamine3    histamine5

  1     Morphine       N        Histamine       0.04          0.20          0.10          0.08   
  2     Morphine       N        Histamine       0.02          0.06          0.02          0.02   
  3     Morphine       N        Histamine       0.07          1.40          0.48          0.24   

 

Using the %MAKEWIDE macro

The %MAKEWIDE macro facilitates transposing data from a LONG to a WIDE structure. If you only transpose one variable, there is not a big different to the PROC TRANSPOSE code. The %MAKEWIDE macro, however, can transpose more than one variable on one go.

 

The macro has the following parameters:

 

  • DATA and OUT: The names of the input and output data sets, respectively.
  • ID: The name of the ID variable that identifies the subject.
  • COPY: A list of variables that occur repeatedly with each observation for a subject and will be copied to the resulting data set. Note that the COPY variable(s) must not be used in the COPY statement of PROC TRANSPOSE for our purposes, but are listed in the BY statement after the ID variable. We assume here that COPY variables have the same values within one ID.
  • VAR: The variables that holds the values to be transposed. 
  • TIME: The variable that numerates the repeated measurements.

 

You use the following code to transpose both variables, HISTAMINE and HEAMOGLOBIN to a WIDE structure with the %MAKELONG macro. 

 

%MAKEWIDE(DATA=dogs_long,
          OUT=dogs_wide_both2,
          ID=dogid,
          COPY=drug depleted,
          VAR=heamoglobin Histamine,
          TIME=Measurement);

Technically the %MAKEWIDE macro scans the list of variables and performs a transpose for each variable and then merges the results together.

The output for 3 dogs is shown below, note that both variables, HISTAMINE and HEAMOGLOBIN, are now transposed.

 

 Dog
  ID   Drug   Depleted heamoglobin0 heamoglobin1 heamoglobin3 heamoglobin5 Histamine0 Histamine1 Histamine3 Histamine5

  1  Morphine    N         14.7         14.0         14.2         14.1        0.04       0.20       0.10       0.08   
  2  Morphine    N         14.4         14.5         14.2         14.2        0.02       0.06       0.02       0.02   
  3  Morphine    N         14.4         14.2         14.9         14.2        0.07       1.40       0.48       0.24

 

Transposing from WIDE to LONG

 

Using the TRANSPOSE procedure

 

You can also use the TRANSPOSE procedure to transpose the data from a WIDE to LONG structure.

 

 

proc transpose data=dogs_wide Name=_measure
               out=dogs_long_1var(rename=(col1=Histamine) where=(_measure contains "Histamine")) ;
 by dogid drug depleted;
run;

 

  • Again, a BY statement is used where you can copy variables, which do not differ by DOGID variables to the output dataset.
  • Note that the variable that contains the names the former variables that where transposed can be specified with the NAME option. The value _MEASURE is used here.
  • As both variables, HISTAMINE and HEAMOGLOBIN are transposed here and aligned in rows underneath each other, a WHERE clause is used to only show the HISTAMINE values. You might want to keep both variables in the output dataset, however then you need to treat them separately if you want to retrieve the measurement numbers and you cannot use the datastep as shown below.

After you run this code, you are almost done. You see the data in a LONG structure. However you need to extract the measurement values 0,1,3,5 manually from the _MEASURE values.

 

 

                                Dog
                                  ID      Drug      Depleted     _measure     Histamine

                                  1     Morphine       N        Histamine0       0.04  
                                  1     Morphine       N        Histamine1       0.20  
                                  1     Morphine       N        Histamine3       0.10  
                                  1     Morphine       N        Histamine5       0.08  
                                  2     Morphine       N        Histamine0       0.02  
                                  2     Morphine       N        Histamine1       0.06  
                                  2     Morphine       N        Histamine3       0.02  
                                  2     Morphine       N        Histamine5       0.02  
                                  3     Morphine       N        Histamine0       0.07  
                                  3     Morphine       N        Histamine1       1.40  
                                  3     Morphine       N        Histamine3       0.48  
                                  3     Morphine       N        Histamine5       0.24

 

You can use a datastep for this task. Here you use the TRANWRD function to replace the "HISTAMINE" string with BLANK and you use the INPUT function to convert the numbers 0,1,3,5 from character to numeric format.

 

 

data dogs_long_1var;
 set dogs_long_1var;
 format Measurement 8.;
 Measurement = input(tranwrd(_measure,"Histamine",""),$8.);
 drop _measure;
run;

 

Finally you end up, with the desired data structure.

 

 

                                Dog
                                 ID      Drug      Depleted    Histamine    Measurement

                                 1     Morphine       N           0.04              0  
                                 1     Morphine       N           0.20              1  
                                 1     Morphine       N           0.10              3  
                                 1     Morphine       N           0.08              5  
                                 2     Morphine       N           0.02              0  
                                 2     Morphine       N           0.06              1  
                                 2     Morphine       N           0.02              3  
                                 2     Morphine       N           0.02              5  
                                 3     Morphine       N           0.07              0  
                                 3     Morphine       N           1.40              1  
                                 3     Morphine       N           0.48              3  
                                 3     Morphine       N           0.24              5  

 

Using the %MAKELONG macro

 

The %MAKELONG macro performs the TRANSPOSE step and the datastep for you in one go, without having to code it extra. Also you can transpose more than one variable, as shown in this example.

 

 

%MAKELONG(DATA=dogs_wide,
          OUT=dogs_long_both2,
          ID=Dogid,
          COPY=drug depleted,
          ROOT=heamoglobin Histamine,
          MEASUREMENT=Measurement);

 

The macro transposed each variable separately and merges the output datasets.

 

 

                         Dog
                          ID      Drug      Depleted    heamoglobin    Measurement    Histamine

                          1     Morphine       N            14.7               0         0.04  
                          1     Morphine       N            14.0               1         0.20  
                          1     Morphine       N            14.2               3         0.10  
                          1     Morphine       N            14.1               5         0.08  
                          2     Morphine       N            14.4               0         0.02  
                          2     Morphine       N            14.5               1         0.06  
                          2     Morphine       N            14.2               3         0.02  
                          2     Morphine       N            14.2               5         0.02  
                          3     Morphine       N            14.4               0         0.07  
                          3     Morphine       N            14.2               1         1.40  
                          3     Morphine       N            14.9               3         0.48  
                          3     Morphine       N            14.2               5         0.24  

 

Using the %UNTRANSPOSE macro

In 2018, Arthur S. Tabachneck, posted a paper on SAS Global Forum where he introduced a much more detailed version of a macro to perform a wide-to-long transposing. His work extends the %MAKEWIDE macro by providing more options and considering more special cases.

Some years ago one of the current paper’s authors realized this and offered the MAKELONG macro to the SAS community (Svolba, G. 2008, 2014). Dr. Svolba’s macro was definitely a step in the right direction, as it ran the necessary PROC TRANSPOSE and merge steps without requiring the user to do such things as redundantly type variable names, assign unique names for all of the temporary files that might have to be created, write code to extract ID values from the transposed variable names, or create a data step to merge the resulting files. Dr. Svolba’s macro did everything that was needed as long as the input and ID variables were all numeric, and the wide file’s transposed variable names didn’t include such strings as prefixes, delimiters, and suffixes.

 

Feature Engineering or Transposing?

 
In some cases it is important to have the original data just in another structure. Here, transposing is the correct task. In other cases you want to derive the content from the longitudinal data and describe the behavior of the analysis subjects, rather than having the original data. Here Feature Engineering (creating derived variables) is more important. Articles on feature engineering can be found here:
Comments

See also :

Switching Things Up With the TRANSPOSE Procedure
Posted 05-16-2021 02:34 PM | by krisp18
https://communities.sas.com/t5/SAS-Global-Forum-Proceedings/Switching-Things-Up-With-the-TRANSPOSE-P...

 

 

Also see my paper: Transpose data by MERGE 

 

http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf

 

Version history
Last update:
‎04-01-2022 07:04 PM
Updated by:
Contributors

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags