BookmarkSubscribeRSS Feed
jcis7
Pyrite | Level 9

G'day!

 

I hope you can help!

 

I have the following sample data in this structure and format (but lots more spanning more years):

 

 

 


DATA WORK.TEST_0000;
    LENGTH
        LocID              8
        Number             8
        Pct_HpA            8
        Pct_Bh             8
        Pct_Menin          8
        Pct_Rot            8
        Pct_Ap             8
        yr                 8 ;
    FORMAT
        LocID            BEST12.
        Number           BEST12.
        Pct_HpA          BEST12.
        Pct_Bh           BEST12.
        Pct_Menin        BEST12.
        Pct_Rot          BEST12.
        Pct_Ap           BEST12.
        yr               BEST12. ;
    INFORMAT
        LocID            BEST12.
        Number           BEST12.
        Pct_HpA          BEST12.
        Pct_Bh           BEST12.
        Pct_Menin        BEST12.
        Pct_Rot          BEST12.
        Pct_Ap           BEST12.
        yr               BEST12. ;
    INFILE DATALINES4
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        LocID            : BEST32.
        Number           : BEST32.
        Pct_HpA          : BEST32.
        Pct_Bh           : BEST32.
        Pct_Menin        : BEST32.
        Pct_Rot          : BEST32.
        Pct_Ap           : BEST32.
        yr               : BEST32. ;
DATALINES4;
12187689.494.794.396.698.220082009
2213594.810099.299.310020082009
3244990.596.395.598.198.820082009
4475489.194.693.997.498.220082009
5252981.787.289.395.495.820082009
12155687.192.193.394.195.720092010
2214289.594.793.99494.720092010
3240893.498.697.499.310020092010
447869195.894.797.298.420092010
5252785.691.690.897.398.320092010
12170888.394.393.595.397.220102011
2213586.591.790.99191.720102011
3241988.794.391.495.998.320102011
4482590.495.894.297.298.220102011
5257584.691.688.194.997.620102011
;;;;

 

 

 

 

 

And I need it to look like this with:

1.  the variables starting with 'Pct' in a new column variable called 'Names'

2.  the associated values of the  Pct_HpA, Pct_Bh, etc in a new column called 'Pcts'

3.  and their associated values per year,  LocID, and Number listed per line as below:

LocID

Number

Names

Pcts

Yr

1

21876

Pct_HpA

89.4

20082009

1

21876

Pct_Bh

94.7

20082009

1

21876

Pct_Menin

94.3

20082009

1

21876

Pct_Rot

96.6

20082009

1

21876

Pct_Ap

98.2

20082009

2

2135

Pct_HpA

94.8

20082009

2

2135

Pct_Bh

100

20082009

2

2135

Pct_Menin

99.2

20082009

2

2135

Pct_Rot

99.3

20082009

2

2135

Pct_Ap

100

20082009

3

2449

Pct_HpA

89.1

20082009

3

2449

Pct_Bh

94.6

20082009

3

2449

Pct_Menin

93.9

20082009

3

2449

Pct_Rot

97.4

20082009

3

2449

Pct_Ap

98.2

20082009

4

4754

Pct_HpA

89.1

20082009

4

4754

Pct_Bh

94.6

20082009

4

4754

Pct_Menin

93.9

20082009

4

4754

Pct_Rot

97.4

20082009

4

4754

Pct_Ap

98.2

20082009

5

2529

Pct_HpA

81.7

20082009

5

2529

Pct_Bh

87.2

20082009

5

2529

Pct_Menin

89.3

20082009

5

2529

Pct_Rot

95.4

20082009

5

2529

Pct_Ap

95.8

20082009

1

21556

Pct_HpA

87.1

20092010

1

21556

Pct_Bh

92.1

20092010

1

21556

Pct_Menin

93.3

20092010

1

21556

Pct_Rot

94.1

20092010

1

21556

Pct_Ap

95.7

20092010

2

2142

Pct_HpA

89.5

20092010

2

2142

Pct_Bh

94.7

20092010

2

2142

Pct_Menin

93.9

20092010

2

2142

Pct_Rot

94

20092010

2

2142

Pct_Ap

94.7

20092010

3

2408

Pct_HpA

93.4

20092010

3

2408

Pct_Bh

98.6

20092010

3

2408

Pct_Menin

97.4

20092010

3

2408

Pct_Rot

99.3

20092010

3

2408

Pct_Ap

100

20092010

4

4786

Pct_HpA

91

20092010

4

4786

Pct_Bh

95.8

20092010

4

4786

Pct_Menin

94.7

20092010

4

4786

Pct_Rot

97.2

20092010

4

4786

Pct_Ap

98.4

20092010

5

2527

Pct_HpA

85.6

20092010

5

2527

Pct_Bh

91.6

20092010

5

2527

Pct_Menin

90.8

20092010

5

2527

Pct_Rot

97.3

20092010

5

2527

Pct_Ap

98.3

20092010

1

21708

Pct_HpA

88.3

20102011

1

21708

Pct_Bh

94.3

20102011

1

21708

Pct_Menin

93.5

20102011

1

21708

Pct_Rot

95.3

20102011

1

21708

Pct_Ap

97.2

20102011

2

2135

Pct_HpA

86.5

20102011

2

2135

Pct_Bh

91.7

20102011

2

2135

Pct_Menin

90.9

20102011

2

2135

Pct_Rot

91

20102011

2

2135

Pct_Ap

91.7

20102011

3

2419

Pct_HpA

88.7

20102011

3

2419

Pct_Bh

94.3

20102011

3

2419

Pct_Menin

91.4

20102011

3

2419

Pct_Rot

95.9

20102011

3

2419

Pct_Ap

98.3

20102011

4

4825

Pct_HpA

90.4

20102011

4

4825

Pct_Bh

95.8

20102011

4

4825

Pct_Menin

94.2

20102011

4

4825

Pct_Rot

97.2

20102011

4

4825

Pct_Ap

98.2

20102011

5

2575

Pct_HpA

84.6

20102011

5

2575

Pct_Bh

91.6

20102011

5

2575

Pct_Menin

88.1

20102011

5

2575

Pct_Rot

94.9

20102011

5

2575

Pct_Ap

97.6

20102011

  

 

I'm not getting the results I'm needing using:  https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/ 

 

Any help you can give will be so very much appreciated!  Thank you!

8 REPLIES 8
Reeza
Super User

Post the code you tried, proc transpose is the correct approach here.

 

Spoiler

@jcis7 wrote:

G'day!

 

I hope you can help!

 

I have the following sample data in this structure and format (but lots more spanning more years):

 

 

 


DATA WORK.TEST_0000;
    LENGTH
        LocID              8
        Number             8
        Pct_HpA            8
        Pct_Bh             8
        Pct_Menin          8
        Pct_Rot            8
        Pct_Ap             8
        yr                 8 ;
    FORMAT
        LocID            BEST12.
        Number           BEST12.
        Pct_HpA          BEST12.
        Pct_Bh           BEST12.
        Pct_Menin        BEST12.
        Pct_Rot          BEST12.
        Pct_Ap           BEST12.
        yr               BEST12. ;
    INFORMAT
        LocID            BEST12.
        Number           BEST12.
        Pct_HpA          BEST12.
        Pct_Bh           BEST12.
        Pct_Menin        BEST12.
        Pct_Rot          BEST12.
        Pct_Ap           BEST12.
        yr               BEST12. ;
    INFILE DATALINES4
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        LocID            : BEST32.
        Number           : BEST32.
        Pct_HpA          : BEST32.
        Pct_Bh           : BEST32.
        Pct_Menin        : BEST32.
        Pct_Rot          : BEST32.
        Pct_Ap           : BEST32.
        yr               : BEST32. ;
DATALINES4;
12187689.494.794.396.698.220082009
2213594.810099.299.310020082009
3244990.596.395.598.198.820082009
4475489.194.693.997.498.220082009
5252981.787.289.395.495.820082009
12155687.192.193.394.195.720092010
2214289.594.793.99494.720092010
3240893.498.697.499.310020092010
447869195.894.797.298.420092010
5252785.691.690.897.398.320092010
12170888.394.393.595.397.220102011
2213586.591.790.99191.720102011
3241988.794.391.495.998.320102011
4482590.495.894.297.298.220102011
5257584.691.688.194.997.620102011
;;;;

 

 

 

 

 

And I need it to look like this with:

1.  the variables starting with 'Pct' in a new column variable called 'Names'

2.  the associated values of the  Pct_HpA, Pct_Bh, etc in a new column called 'Pcts'

3.  and their associated values per year,  LocID, and Number listed per line as below:

LocID

Number

Names

Pcts

Yr

1

21876

Pct_HpA

89.4

20082009

1

21876

Pct_Bh

94.7

20082009

1

21876

Pct_Menin

94.3

20082009

1

21876

Pct_Rot

96.6

20082009

1

21876

Pct_Ap

98.2

20082009

2

2135

Pct_HpA

94.8

20082009

2

2135

Pct_Bh

100

20082009

2

2135

Pct_Menin

99.2

20082009

2

2135

Pct_Rot

99.3

20082009

2

2135

Pct_Ap

100

20082009

3

2449

Pct_HpA

89.1

20082009

3

2449

Pct_Bh

94.6

20082009

3

2449

Pct_Menin

93.9

20082009

3

2449

Pct_Rot

97.4

20082009

3

2449

Pct_Ap

98.2

20082009

4

4754

Pct_HpA

89.1

20082009

4

4754

Pct_Bh

94.6

20082009

4

4754

Pct_Menin

93.9

20082009

4

4754

Pct_Rot

97.4

20082009

4

4754

Pct_Ap

98.2

20082009

5

2529

Pct_HpA

81.7

20082009

5

2529

Pct_Bh

87.2

20082009

5

2529

Pct_Menin

89.3

20082009

5

2529

Pct_Rot

95.4

20082009

5

2529

Pct_Ap

95.8

20082009

1

21556

Pct_HpA

87.1

20092010

1

21556

Pct_Bh

92.1

20092010

1

21556

Pct_Menin

93.3

20092010

1

21556

Pct_Rot

94.1

20092010

1

21556

Pct_Ap

95.7

20092010

2

2142

Pct_HpA

89.5

20092010

2

2142

Pct_Bh

94.7

20092010

2

2142

Pct_Menin

93.9

20092010

2

2142

Pct_Rot

94

20092010

2

2142

Pct_Ap

94.7

20092010

3

2408

Pct_HpA

93.4

20092010

3

2408

Pct_Bh

98.6

20092010

3

2408

Pct_Menin

97.4

20092010

3

2408

Pct_Rot

99.3

20092010

3

2408

Pct_Ap

100

20092010

4

4786

Pct_HpA

91

20092010

4

4786

Pct_Bh

95.8

20092010

4

4786

Pct_Menin

94.7

20092010

4

4786

Pct_Rot

97.2

20092010

4

4786

Pct_Ap

98.4

20092010

5

2527

Pct_HpA

85.6

20092010

5

2527

Pct_Bh

91.6

20092010

5

2527

Pct_Menin

90.8

20092010

5

2527

Pct_Rot

97.3

20092010

5

2527

Pct_Ap

98.3

20092010

1

21708

Pct_HpA

88.3

20102011

1

21708

Pct_Bh

94.3

20102011

1

21708

Pct_Menin

93.5

20102011

1

21708

Pct_Rot

95.3

20102011

1

21708

Pct_Ap

97.2

20102011

2

2135

Pct_HpA

86.5

20102011

2

2135

Pct_Bh

91.7

20102011

2

2135

Pct_Menin

90.9

20102011

2

2135

Pct_Rot

91

20102011

2

2135

Pct_Ap

91.7

20102011

3

2419

Pct_HpA

88.7

20102011

3

2419

Pct_Bh

94.3

20102011

3

2419

Pct_Menin

91.4

20102011

3

2419

Pct_Rot

95.9

20102011

3

2419

Pct_Ap

98.3

20102011

4

4825

Pct_HpA

90.4

20102011

4

4825

Pct_Bh

95.8

20102011

4

4825

Pct_Menin

94.2

20102011

4

4825

Pct_Rot

97.2

20102011

4

4825

Pct_Ap

98.2

20102011

5

2575

Pct_HpA

84.6

20102011

5

2575

Pct_Bh

91.6

20102011

5

2575

Pct_Menin

88.1

20102011

5

2575

Pct_Rot

94.9

20102011

5

2575

Pct_Ap

97.6

20102011

  

 

I'm not getting the results I'm needing using:  https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/ 

 

Any help you can give will be so very much appreciated!  Thank you!


ballardw
Super User

I am going to guess that your order for specifying the variables in your by statement had issues.

Since you show starting data that is not "sorted" by LOCID, then you cannot start a "by locid" in proc transpose.

 

So consider

 by notsorted locid number yr ; 

to get all the values you need on all of the output records.

The Name= option on proc transpose will set the name of the variable that has the original variables that transposed and Prefix sets the start of the name for the transposed values (hint: will get at least a 1 suffix)

and you can RENAME variables on any output data set where there is something like Out=mydatset (rename=(thisvar = thatvar))

jcis7
Pyrite | Level 9

Got It : - ) !!!  Do you have a good link to spell out the notsorted option?  

 

 

proc transpose data=TEST_0000 out=long3a;

  by notsorted locid number yr;

run;

 

data long4;

  set long3a (rename=(COL1=Pcts _NAME_=Names));

  label Names='Names';

run;


Thank you so very much : - )!  You all are so awesome and helpful!!!

Reeza
Super User

https://documentation.sas.com/?docsetId=lestmtsref&docsetTarget=p0yeyftk8ftuckn1o5qzy53284gz.htm&doc...

 

NOTSORTED

specifies that observations with the same BY value are grouped together but are not necessarily sorted in alphabetical or numeric order.

Restriction You cannot use the NOTSORTED option with the MERGE and UPDATE statements.
Tips The NOTSORTED option can appear anywhere in the BY statement.
Using the NOTSORTED option is useful if you have data that falls into other logical groupings such as chronological order or categories.
BY-Group Processing with Nonsorted Data
art297
Opal | Level 21

I think a little extra code will get you even closer to what you want. i.e.,

proc transpose data=TEST_0000 out=long3a(rename=(COL1=Pcts)) name=Names;
  by notsorted locid number yr;
run;

data long4;
  set long3a;
  label Names='Names';
  Names=substr(Names,5);
run;

However, you will get an undesirable result if any adjacent records share the same values in their by values. e.g., the following input dataset won't result (i don't think) what you expect to achieve:

DATA WORK.TEST_0000;
    LENGTH
        LocID              8
        Number             8
        Pct_HpA            8
        Pct_Bh             8
        Pct_Menin          8
        Pct_Rot            8
        Pct_Ap             8
        yr                 8 ;
    FORMAT
        LocID            BEST12.
        Number           BEST12.
        Pct_HpA          BEST12.
        Pct_Bh           BEST12.
        Pct_Menin        BEST12.
        Pct_Rot          BEST12.
        Pct_Ap           BEST12.
        yr               BEST12. ;
    INFORMAT
        LocID            BEST12.
        Number           BEST12.
        Pct_HpA          BEST12.
        Pct_Bh           BEST12.
        Pct_Menin        BEST12.
        Pct_Rot          BEST12.
        Pct_Ap           BEST12.
        yr               BEST12. ;
    INFILE DATALINES4
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        LocID            : BEST32.
        Number           : BEST32.
        Pct_HpA          : BEST32.
        Pct_Bh           : BEST32.
        Pct_Menin        : BEST32.
        Pct_Rot          : BEST32.
        Pct_Ap           : BEST32.
        yr               : BEST32. ;
DATALINES4;
12187689.494.794.396.698.220082009
121876289.4294.7294.3296.6298.220082009
2213594.810099.299.310020082009
3244990.596.395.598.198.820082009
4475489.194.693.997.498.220082009
5252981.787.289.395.495.820082009
12155687.192.193.394.195.720092010
2214289.594.793.99494.720092010
3240893.498.697.499.310020092010
447869195.894.797.298.420092010
5252785.691.690.897.398.320092010
12170888.394.393.595.397.220102011
2213586.591.790.99191.720102011
3241988.794.391.495.998.320102011
4482590.495.894.297.298.220102011
5257584.691.688.194.997.620102011
;;;;
run;

However, the %untranspose macro will (I think) give you the desired result in both cases) and not need an extra data step. e.g.:

filename ut url 'http://tiny.cc/untranspose_macro';
%include ut ;
%untranspose(data=TEST_0000,
  out=want(rename=(_name_=names _value_=pct)),
  by=locid number yr,
  prefix=Pct_,
  var=HpA Bh Menin Rot Ap)

Art, CEO, AnalystFinder.com

 

 

RichardDeVen
Barite | Level 11

You are transposing BY row where the source data may NOT be ordered according BY variables that are surrogates for uniquely representing a row.  If the BY values are disordered, or repeated a simple BY statement will not work.

 

For the case of row-by-row transposing of uncertain data, I would recommend creating a VIEW that has an additional computed column ROWNUM that will ensure a distinct identification of a row when prefaced to your original BY group.  The VIEW can be deleted after the TRANSPOSE.

 

Example:

data have;
INPUT LocID Number Pct_HpA Pct_Bh Pct_Menin Pct_Rot Pct_Ap yr;
DATALINES;
1 21876 89.4 94.7 94.3 96.6 98.2 20082009
2 2135 94.8 100 99.2 99.3 100 20082009
3 2449 90.5 96.3 95.5 98.1 98.8 20082009
4 4754 89.1 94.6 93.9 97.4 98.2 20082009
5 2529 81.7 87.2 89.3 95.4 95.8 20082009
1 21556 87.1 92.1 93.3 94.1 95.7 20092010
2 2142 89.5 94.7 93.9 94 94.7 20092010
3 2408 93.4 98.6 97.4 99.3 100 20092010
4 4786 91 95.8 94.7 97.2 98.4 20092010
5 2527 85.6 91.6 90.8 97.3 98.3 20092010
;

data have_v / view=have_v;
  set have;
  rownum + 1;
run;

proc transpose 
  data=have_v 
  out=want ( rename=(_name_=Names col1=Pcts)
             drop=rownum
           )
;
  by rownum locid number yr;
  var Pct:;
  label Names=' ';
run;

proc delete data=have_v(mt=view);
run;

Result

RichardADeVenezia_0-1588911421163.png

 

 

 

 

jcis7
Pyrite | Level 9

Can you help me understand the use of the colon in the 3rd to last line in the full code below (var Pct:;)?

Thank you!

 

 

 


data have_v / view=have_v;
set have;
rownum + 1;
run;

proc transpose
data=have_v
out=want ( rename=(_name_=Names col1=Pcts)
drop=rownum
)
;
by rownum locid number yr;
var Pct:;
label Names=' ';
run;

RichardDeVen
Barite | Level 11

VAR PCT: ; is a statement that is using a SAS Variable List.  In this case PCT: means all variables whose name starts with PCT.

 

Learn more about Variable Lists in the documentation 

 

A SAS variable is an abbreviated method of referring to a list of variable names. SAS enables you to use the following variable lists:

  • numbered range lists
  • name range lists
  • name prefix lists
  • special SAS name lists

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 892 views
  • 5 likes
  • 5 in conversation