Dear SAS Users,
How can I achieve the following task in an efficient way? Let's say that I have a long data that have two sales figures for different firms and I would like to make it wide as below. Also please note that the original dataset is a large one.
Thank you in advance for any help.
Have
Firm | Date | Sales1 | Sales2 |
A | 20/2/2012 | 100 | 50 |
A | 21/2/2012 | 200 | 70 |
A | 22/2/2012 | 300 | 90 |
B | 21/2/2012 | 120 | 200 |
B | 22/2/2012 | 300 | 50 |
B | 23/2/2012 | 300 | 200 |
C | 19/2/2012 | 50 | 40 |
C | 20/2/2012 | 75 | 50 |
C | 21/2/2012 | 80 | 30 |
C | 22/2/2012 | 90 | 100 |
Want
date | A1 | A2 | B1 | B2 | C1 | C2 |
19/2/2012 | 50 | 40 | ||||
20/2/2012 | 100 | 50 | 75 | 50 | ||
21/2/2012 | 200 | 70 | 120 | 200 | 80 | 30 |
22/2/2012 | 300 | 90 | 300 | 50 | 90 | 100 |
23/2/2012 | 300 | 200 |
There is probably a less convoluted way to do this, but I think that the following achieves the desired result:
data Have;
informat date ddmmyy10.;
format date date9.;
input Firm $ Date Sales1 Sales2;
cards;
A 20/2/2012 100 50
A 21/2/2012 200 70
A 22/2/2012 300 90
B 21/2/2012 120 200
B 22/2/2012 300 50
B 23/2/2012 300 200
C 19/2/2012 50 40
C 20/2/2012 75 50
C 21/2/2012 80 30
C 22/2/2012 90 100
;
proc sort data=have;
by date firm;
run;
data fortranspose(keep = date _name_ sales) / view=fortranspose;
set have;
by date;
array ss(2) sales1-sales2;
if first.date then obsnum = 0;
do i = 1 to 2;
_name_ = trim(firm) || put(i,1.);
sales = ss(i);
output;
end;
run;
proc transpose data=fortranspose out=want(drop=_name_);
by date;
var sales;
run;
data want;
retain date a1 a2 b1 b2 c1 c2;
set want;
run;
JohnK,
In theory, it's not so hard. In practice, though, you mention that your data set is long. I suspect that "Firm" is not really as simple as A, B, and C, but takes on many more complex values that don't easily translate into unique variable names. For example, if your data contained both of these Firms:
George Washington Carver High School North
George Washington Carver High School South
I doubt that you could come up with a good way to assign what the new variable names would be.
If you were willing to come up with a modified solution, it might look look like this. Instead of variable names like A1, A2, B1, B2, use names like Sales0001_1, Sales0001_2, Sales0002_1, Sales0002_2, etc. The first 4 digits of the new variables indicate which Firm it is, and you would need a separate method to translate from the arbitrary firm number to the firm name. (This would not be difficult using a format.)
Does this sound like an acceptable solution?
There is probably a less convoluted way to do this, but I think that the following achieves the desired result:
data Have;
informat date ddmmyy10.;
format date date9.;
input Firm $ Date Sales1 Sales2;
cards;
A 20/2/2012 100 50
A 21/2/2012 200 70
A 22/2/2012 300 90
B 21/2/2012 120 200
B 22/2/2012 300 50
B 23/2/2012 300 200
C 19/2/2012 50 40
C 20/2/2012 75 50
C 21/2/2012 80 30
C 22/2/2012 90 100
;
proc sort data=have;
by date firm;
run;
data fortranspose(keep = date _name_ sales) / view=fortranspose;
set have;
by date;
array ss(2) sales1-sales2;
if first.date then obsnum = 0;
do i = 1 to 2;
_name_ = trim(firm) || put(i,1.);
sales = ss(i);
output;
end;
run;
proc transpose data=fortranspose out=want(drop=_name_);
by date;
var sales;
run;
data want;
retain date a1 a2 b1 b2 c1 c2;
set want;
run;
Art,
I like it. So if you have a variable called _NAME_, is it implied that PROC TRANSPOSE should use it as an ID variable?
Astounding: from the documentation: If you do not use an ID statement, PREFIX= option, or the SUFFIX= option, then PROC TRANSPOSE looks for an input variable called _NAME_ to get the names of the transposed variables.
Hi Art,
Would it be not efficient to complete the task by using macros? Algorithm is something like this:
1. Collect the firm names from the input file by using proc freq into the file names.
2. Read the names.sas7bdat and assign each firm name to a macro, so the names will be &firm1, &firm2 etc.
3. By using a loop extract data for each firm from the input file. For example
%do i=1 %to &maxfirmno;
data &&firm&i;
set inputfile;
where firm="&&firm&i";
run;
%end;
4. merge all &&firm&i data files by date.
Art,
Two more questions if you don't mind?
What is the use of this line in your solution?
if first.date then obsnum=0
In the original data set (i.e., have), let's assume that we have another variable named temperature, which is the mean temperature for each date. How can I copy this variable to the output dataset?
Ksharp,
Can your code be modified the include this?
Thanks.
data test;
informat date ddmmyy10.;
format date date9.;
input Firm $ Date Sales1 Sales2 temp;
cards;
A 20/2/2012 100 50 40
A 21/2/2012 200 70 50
A 22/2/2012 300 90 60
B 21/2/2012 120 200 50
B 22/2/2012 300 50 60
B 23/2/2012 300 200 62
C 19/2/2012 50 40 30
C 20/2/2012 75 50 40
C 21/2/2012 80 30 50
C 22/2/2012 90 100 60
;
run;
J.
John,
The lines:
if first.date then obsnum=0
by date;
were not needed in that section of code. They were artifacts of another approach I was trying.
As for adding one or more variables, probably the easiest one to code would be to include them with a 'copy' statement. However, that also requires getting rid of extra lines that the procedure will produce as a result of including a copy statement.
I've shown update code, using the copy statement, below:
data Have;
informat date ddmmyy10.;
format date date9.;
input Firm $ Date Sales1 Sales2 temp;
cards;
A 20/2/2012 100 50 40
A 21/2/2012 200 70 50
A 22/2/2012 300 90 60
B 21/2/2012 120 200 50
B 22/2/2012 300 50 60
B 23/2/2012 300 200 62
C 19/2/2012 50 40 30
C 20/2/2012 75 50 40
C 21/2/2012 80 30 50
C 22/2/2012 90 100 60
;
proc sort data=have;
by date firm;
run;
data fortranspose(keep = date _name_ sales temp)
/ view=fortranspose;
set have;
array ss(*) sales1-sales2;
do i = 1 to dim(ss);
_name_ = trim(firm) || put(i,1.);
sales = ss(i);
output;
end;
run;
proc transpose data=fortranspose out=want(drop=_name_);
by date;
var sales;
copy temp;
run;
data want;
retain date a1 a2 b1 b2 c1 c2 temp;
set want;
by date;
if first.date;
run;
data Have; informat date ddmmyy10.; format date date9.; input Firm $ Date Sales1 Sales2; cards; A 20/2/2012 100 50 A 21/2/2012 200 70 A 22/2/2012 300 90 B 21/2/2012 120 200 B 22/2/2012 300 50 B 23/2/2012 300 200 C 19/2/2012 50 40 C 20/2/2012 75 50 C 21/2/2012 80 30 C 22/2/2012 90 100 ; run; proc sql ; select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2))') from have; select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2))') into : list1 - : list&sqlobs. from have; quit; %macro merge; data want(drop=firm); merge %do i=1 %to &sqlobs.; &&list&i %end; ; by date; run; %mend merge; %merge
Ksharp
Easy.
data have; informat date ddmmyy10.; format date date9.; input Firm $ Date Sales1 Sales2 temp; cards; A 20/2/2012 100 50 40 A 21/2/2012 200 70 50 A 22/2/2012 300 90 60 B 21/2/2012 120 200 50 B 22/2/2012 300 50 60 B 23/2/2012 300 200 62 C 19/2/2012 50 40 30 C 20/2/2012 75 50 40 C 21/2/2012 80 30 50 C 22/2/2012 90 100 60 ; run; proc sql ; select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2 temp=',firm,'_temp))') from have; select distinct cats('have(where=(firm="',firm,'") rename=(sales1=',firm,'1 sales2=',firm,'2 temp=',firm,'_temp))') into : list1 - : list&sqlobs. from have; quit; %macro merge; data want(drop=firm); merge %do i=1 %to &sqlobs.; &&list&i %end; ; by date; run; %mend merge; %merge
Ksharp
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.