DATA Step, Macro, Functions and more

Long to wide conversion

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Long to wide conversion

Dear SAS Users,

How can I achieve the following task in an efficient way? Let's say that I have a long data that have two sales figures for different firms and I would like to make it wide as below. Also please note that the original dataset is a large one.

Thank you in advance for any help.

Have

FirmDateSales1Sales2
A20/2/201210050
A21/2/201220070
A22/2/201230090
B21/2/2012120200
B22/2/201230050
B23/2/2012300200
C19/2/20125040
C20/2/20127550
C21/2/20128030
C22/2/201290100


Want

dateA1A2B1B2C1C2
19/2/20125040
20/2/2012100507550
21/2/2012200701202008030
22/2/2012300903005090100
23/2/2012300200

Accepted Solutions
Solution
‎02-22-2012 11:49 AM
PROC Star
Posts: 7,467

Long to wide conversion

There is probably a less convoluted way to do this, but I think that the following achieves the desired result:

data Have;

  informat date ddmmyy10.;

  format date date9.;

  input Firm $ Date Sales1 Sales2;

  cards;

A          20/2/2012          100          50

A          21/2/2012          200          70

A          22/2/2012          300          90

B          21/2/2012          120          200

B          22/2/2012          300          50

B          23/2/2012          300          200

C          19/2/2012          50          40

C          20/2/2012          75          50

C          21/2/2012          80          30

C          22/2/2012          90          100

;

proc sort data=have;

  by date firm;

run;

data fortranspose(keep = date _name_ sales) / view=fortranspose;

   set have;

   by date;

   array ss(2) sales1-sales2;

   if first.date then obsnum = 0;

   do i = 1 to 2;

      _name_ = trim(firm) || put(i,1.);

      sales = ss(i);

      output;

   end;

run;

proc transpose data=fortranspose out=want(drop=_name_);

   by date;

   var sales;

run;

data want;

  retain date a1 a2 b1 b2 c1 c2;

  set want;

run;

View solution in original post


All Replies
Super User
Posts: 5,495

Long to wide conversion

JohnK,

In theory, it's not so hard.  In practice, though, you mention that your data set is long.  I suspect that "Firm" is not really as simple as A, B, and C, but takes on many more complex values that don't easily translate into unique variable names.  For example, if your data contained both of these Firms:

George Washington Carver High School North

George Washington Carver High School South

I doubt that you could come up with a good way to assign what the new variable names would be.

If you were willing to come up with a modified solution, it might look look like this.  Instead of variable names like A1, A2, B1, B2, use names like Sales0001_1, Sales0001_2, Sales0002_1, Sales0002_2, etc.  The first 4 digits of the new variables indicate which Firm it is, and you would need a separate method to translate from the arbitrary firm number to the firm name.  (This would not be difficult using a format.)

Does this sound like an acceptable solution?

Solution
‎02-22-2012 11:49 AM
PROC Star
Posts: 7,467

Long to wide conversion

There is probably a less convoluted way to do this, but I think that the following achieves the desired result:

data Have;

  informat date ddmmyy10.;

  format date date9.;

  input Firm $ Date Sales1 Sales2;

  cards;

A          20/2/2012          100          50

A          21/2/2012          200          70

A          22/2/2012          300          90

B          21/2/2012          120          200

B          22/2/2012          300          50

B          23/2/2012          300          200

C          19/2/2012          50          40

C          20/2/2012          75          50

C          21/2/2012          80          30

C          22/2/2012          90          100

;

proc sort data=have;

  by date firm;

run;

data fortranspose(keep = date _name_ sales) / view=fortranspose;

   set have;

   by date;

   array ss(2) sales1-sales2;

   if first.date then obsnum = 0;

   do i = 1 to 2;

      _name_ = trim(firm) || put(i,1.);

      sales = ss(i);

      output;

   end;

run;

proc transpose data=fortranspose out=want(drop=_name_);

   by date;

   var sales;

run;

data want;

  retain date a1 a2 b1 b2 c1 c2;

  set want;

run;

Super User
Posts: 5,495

Long to wide conversion

Art,

I like it.  So if you have a variable called _NAME_, is it implied that PROC TRANSPOSE should use it as an ID variable?

PROC Star
Posts: 7,467

Long to wide conversion

Posted in reply to Astounding

Astounding: from the documentation: If you do not use an ID statement, PREFIX= option, or the SUFFIX= option, then PROC TRANSPOSE looks for an input variable called _NAME_ to get the names of the transposed variables.

Occasional Contributor
Posts: 7

Long to wide conversion

Hi Art,

   Would it be not efficient to complete the task by using macros? Algorithm is something like this:

1. Collect the firm names from the input file by using proc freq into the file names.

2. Read the names.sas7bdat and assign each firm name to a macro, so the names will be &firm1, &firm2 etc.

3. By using a loop extract data for each firm from the input file. For example

%do i=1 %to &maxfirmno;

data &&firm&i;

set inputfile;

where firm="&&firm&i";

run;

%end;

4. merge all &&firm&i data files by date.

Occasional Contributor
Posts: 7

Re: Long to wide conversion

Art,

Two more questions if you don't mind?

What is the use of this line in your solution?

if first.date then obsnum=0

In the original data set (i.e., have), let's assume that we have another variable named temperature, which is the mean temperature for each date. How can I copy this variable to the output dataset?

Ksharp,

Can your code be modified the include this?

Thanks.

data test;

  informat date ddmmyy10.;

  format date date9.;

  input Firm $ Date Sales1 Sales2 temp;

  cards;

A          20/2/2012          100          50        40

A          21/2/2012          200          70       50

A          22/2/2012          300          90       60

B          21/2/2012          120          200      50

B          22/2/2012          300          50      60

B          23/2/2012          300          200      62

C          19/2/2012          50          40      30

C          20/2/2012          75          50      40

C          21/2/2012          80          30      50

C          22/2/2012          90          100      60

;

run;

J.

PROC Star
Posts: 7,467

Long to wide conversion

John,

The lines:

  if first.date then obsnum=0

  by date;

were not needed in that section of code.  They were artifacts of another approach I was trying.

As for adding one or more variables, probably the easiest one to code would be to include them with a 'copy' statement.  However, that also requires getting rid of extra lines that the procedure will produce as a result of including a copy statement.

I've shown update code, using the copy statement, below:

data Have;

  informat date ddmmyy10.;

  format date date9.;

  input Firm $ Date Sales1 Sales2 temp;

  cards;

A          20/2/2012          100         50  40

A          21/2/2012          200         70  50

A          22/2/2012          300         90  60

B          21/2/2012          120        200  50

B          22/2/2012          300         50  60

B          23/2/2012          300        200  62

C          19/2/2012          50          40  30

C          20/2/2012          75          50  40

C          21/2/2012          80          30  50

C          22/2/2012          90          100 60

;

proc sort data=have;

  by date firm;

run;

data fortranspose(keep = date _name_ sales temp)

    / view=fortranspose;

  set have;

  array ss(*) sales1-sales2;

  do i = 1 to dim(ss);

    _name_ = trim(firm) || put(i,1.);

    sales = ss(i);

    output;

  end;

run;

proc transpose data=fortranspose out=want(drop=_name_);

  by date;

  var sales;

  copy temp;

run;

data want;

  retain date a1 a2 b1 b2 c1 c2 temp;

  set want;

  by date;

  if first.date;

run;

Super User
Posts: 10,018

Long to wide conversion

data Have;
  informat date ddmmyy10.;
  format date date9.;
  input Firm $ Date Sales1 Sales2;
  cards;
A          20/2/2012          100          50
A          21/2/2012          200          70
A          22/2/2012          300          90
B          21/2/2012          120          200
B          22/2/2012          300          50
B          23/2/2012          300          200
C          19/2/2012          50          40
C          20/2/2012          75          50
C          21/2/2012          80          30
C          22/2/2012          90          100
;
run;
proc sql ;
 select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2))') from have;
 select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2))') into : list1 - : list&sqlobs.
  from have;
quit;


%macro merge;
data want(drop=firm);
 merge %do i=1 %to &sqlobs.;
       &&list&i
       %end;
       ;
 by date;
run;
%mend merge;
%merge


Ksharp

Super User
Posts: 10,018

Long to wide conversion

Easy.

data have;
  informat date ddmmyy10.;
  format date date9.;
  input Firm $ Date Sales1 Sales2 temp;
  cards;
A          20/2/2012          100          50        40
A          21/2/2012          200          70       50
A          22/2/2012          300          90       60
B          21/2/2012          120          200      50
B          22/2/2012          300          50      60
B          23/2/2012          300          200      62
C          19/2/2012          50          40      30
C          20/2/2012          75          50      40
C          21/2/2012          80          30      50
C          22/2/2012          90          100      60
;
run;


proc sql ;
 select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2 temp=',firm,'_temp))') from have;
 select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2 temp=',firm,'_temp))') into : list1 - : list&sqlobs.
  from have;
quit;


%macro merge;
data want(drop=firm);
 merge %do i=1 %to &sqlobs.;
       &&list&i
       %end;
       ;
 by date;
run;
%mend merge;
%merge


Ksharp

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 1189 views
  • 0 likes
  • 4 in conversation