BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JohnK
Calcite | Level 5

Dear SAS Users,

How can I achieve the following task in an efficient way? Let's say that I have a long data that have two sales figures for different firms and I would like to make it wide as below. Also please note that the original dataset is a large one.

Thank you in advance for any help.

Have

FirmDateSales1Sales2
A20/2/201210050
A21/2/201220070
A22/2/201230090
B21/2/2012120200
B22/2/201230050
B23/2/2012300200
C19/2/20125040
C20/2/20127550
C21/2/20128030
C22/2/201290100


Want

dateA1A2B1B2C1C2
19/2/20125040
20/2/2012100507550
21/2/2012200701202008030
22/2/2012300903005090100
23/2/2012300200
1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

There is probably a less convoluted way to do this, but I think that the following achieves the desired result:

data Have;

  informat date ddmmyy10.;

  format date date9.;

  input Firm $ Date Sales1 Sales2;

  cards;

A          20/2/2012          100          50

A          21/2/2012          200          70

A          22/2/2012          300          90

B          21/2/2012          120          200

B          22/2/2012          300          50

B          23/2/2012          300          200

C          19/2/2012          50          40

C          20/2/2012          75          50

C          21/2/2012          80          30

C          22/2/2012          90          100

;

proc sort data=have;

  by date firm;

run;

data fortranspose(keep = date _name_ sales) / view=fortranspose;

   set have;

   by date;

   array ss(2) sales1-sales2;

   if first.date then obsnum = 0;

   do i = 1 to 2;

      _name_ = trim(firm) || put(i,1.);

      sales = ss(i);

      output;

   end;

run;

proc transpose data=fortranspose out=want(drop=_name_);

   by date;

   var sales;

run;

data want;

  retain date a1 a2 b1 b2 c1 c2;

  set want;

run;

View solution in original post

9 REPLIES 9
Astounding
PROC Star

JohnK,

In theory, it's not so hard.  In practice, though, you mention that your data set is long.  I suspect that "Firm" is not really as simple as A, B, and C, but takes on many more complex values that don't easily translate into unique variable names.  For example, if your data contained both of these Firms:

George Washington Carver High School North

George Washington Carver High School South

I doubt that you could come up with a good way to assign what the new variable names would be.

If you were willing to come up with a modified solution, it might look look like this.  Instead of variable names like A1, A2, B1, B2, use names like Sales0001_1, Sales0001_2, Sales0002_1, Sales0002_2, etc.  The first 4 digits of the new variables indicate which Firm it is, and you would need a separate method to translate from the arbitrary firm number to the firm name.  (This would not be difficult using a format.)

Does this sound like an acceptable solution?

art297
Opal | Level 21

There is probably a less convoluted way to do this, but I think that the following achieves the desired result:

data Have;

  informat date ddmmyy10.;

  format date date9.;

  input Firm $ Date Sales1 Sales2;

  cards;

A          20/2/2012          100          50

A          21/2/2012          200          70

A          22/2/2012          300          90

B          21/2/2012          120          200

B          22/2/2012          300          50

B          23/2/2012          300          200

C          19/2/2012          50          40

C          20/2/2012          75          50

C          21/2/2012          80          30

C          22/2/2012          90          100

;

proc sort data=have;

  by date firm;

run;

data fortranspose(keep = date _name_ sales) / view=fortranspose;

   set have;

   by date;

   array ss(2) sales1-sales2;

   if first.date then obsnum = 0;

   do i = 1 to 2;

      _name_ = trim(firm) || put(i,1.);

      sales = ss(i);

      output;

   end;

run;

proc transpose data=fortranspose out=want(drop=_name_);

   by date;

   var sales;

run;

data want;

  retain date a1 a2 b1 b2 c1 c2;

  set want;

run;

Astounding
PROC Star

Art,

I like it.  So if you have a variable called _NAME_, is it implied that PROC TRANSPOSE should use it as an ID variable?

art297
Opal | Level 21

Astounding: from the documentation: If you do not use an ID statement, PREFIX= option, or the SUFFIX= option, then PROC TRANSPOSE looks for an input variable called _NAME_ to get the names of the transposed variables.

JohnK
Calcite | Level 5

Hi Art,

   Would it be not efficient to complete the task by using macros? Algorithm is something like this:

1. Collect the firm names from the input file by using proc freq into the file names.

2. Read the names.sas7bdat and assign each firm name to a macro, so the names will be &firm1, &firm2 etc.

3. By using a loop extract data for each firm from the input file. For example

%do i=1 %to &maxfirmno;

data &&firm&i;

set inputfile;

where firm="&&firm&i";

run;

%end;

4. merge all &&firm&i data files by date.

JohnK
Calcite | Level 5

Art,

Two more questions if you don't mind?

What is the use of this line in your solution?

if first.date then obsnum=0

In the original data set (i.e., have), let's assume that we have another variable named temperature, which is the mean temperature for each date. How can I copy this variable to the output dataset?

Ksharp,

Can your code be modified the include this?

Thanks.

data test;

  informat date ddmmyy10.;

  format date date9.;

  input Firm $ Date Sales1 Sales2 temp;

  cards;

A          20/2/2012          100          50        40

A          21/2/2012          200          70       50

A          22/2/2012          300          90       60

B          21/2/2012          120          200      50

B          22/2/2012          300          50      60

B          23/2/2012          300          200      62

C          19/2/2012          50          40      30

C          20/2/2012          75          50      40

C          21/2/2012          80          30      50

C          22/2/2012          90          100      60

;

run;

J.

art297
Opal | Level 21

John,

The lines:

  if first.date then obsnum=0

  by date;

were not needed in that section of code.  They were artifacts of another approach I was trying.

As for adding one or more variables, probably the easiest one to code would be to include them with a 'copy' statement.  However, that also requires getting rid of extra lines that the procedure will produce as a result of including a copy statement.

I've shown update code, using the copy statement, below:

data Have;

  informat date ddmmyy10.;

  format date date9.;

  input Firm $ Date Sales1 Sales2 temp;

  cards;

A          20/2/2012          100         50  40

A          21/2/2012          200         70  50

A          22/2/2012          300         90  60

B          21/2/2012          120        200  50

B          22/2/2012          300         50  60

B          23/2/2012          300        200  62

C          19/2/2012          50          40  30

C          20/2/2012          75          50  40

C          21/2/2012          80          30  50

C          22/2/2012          90          100 60

;

proc sort data=have;

  by date firm;

run;

data fortranspose(keep = date _name_ sales temp)

    / view=fortranspose;

  set have;

  array ss(*) sales1-sales2;

  do i = 1 to dim(ss);

    _name_ = trim(firm) || put(i,1.);

    sales = ss(i);

    output;

  end;

run;

proc transpose data=fortranspose out=want(drop=_name_);

  by date;

  var sales;

  copy temp;

run;

data want;

  retain date a1 a2 b1 b2 c1 c2 temp;

  set want;

  by date;

  if first.date;

run;

Ksharp
Super User
data Have;
  informat date ddmmyy10.;
  format date date9.;
  input Firm $ Date Sales1 Sales2;
  cards;
A          20/2/2012          100          50
A          21/2/2012          200          70
A          22/2/2012          300          90
B          21/2/2012          120          200
B          22/2/2012          300          50
B          23/2/2012          300          200
C          19/2/2012          50          40
C          20/2/2012          75          50
C          21/2/2012          80          30
C          22/2/2012          90          100
;
run;
proc sql ;
 select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2))') from have;
 select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2))') into : list1 - : list&sqlobs.
  from have;
quit;


%macro merge;
data want(drop=firm);
 merge %do i=1 %to &sqlobs.;
       &&list&i
       %end;
       ;
 by date;
run;
%mend merge;
%merge


Ksharp

Ksharp
Super User

Easy.

data have;
  informat date ddmmyy10.;
  format date date9.;
  input Firm $ Date Sales1 Sales2 temp;
  cards;
A          20/2/2012          100          50        40
A          21/2/2012          200          70       50
A          22/2/2012          300          90       60
B          21/2/2012          120          200      50
B          22/2/2012          300          50      60
B          23/2/2012          300          200      62
C          19/2/2012          50          40      30
C          20/2/2012          75          50      40
C          21/2/2012          80          30      50
C          22/2/2012          90          100      60
;
run;


proc sql ;
 select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2 temp=',firm,'_temp))') from have;
 select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2 temp=',firm,'_temp))') into : list1 - : list&sqlobs.
  from have;
quit;


%macro merge;
data want(drop=firm);
 merge %do i=1 %to &sqlobs.;
       &&list&i
       %end;
       ;
 by date;
run;
%mend merge;
%merge


Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2751 views
  • 0 likes
  • 4 in conversation