I am trying to simulate something similar as what I can do in Microsoft Access using Crosstab query. In MS crosstab, I can choose which one to be row, column and value.
In SAS, I can use proc freq tables to simulate that crosstab, but so far I can only simulate frequency count. Suppose I want to have other variables or calculations that I want to simulate, how can I achieve this.
My data is below: suppose I start with Class_Start, and after a while I migrate into Class_End. At the Class_Start I have a Volume. That is the original data. Using proc freq tables, I can easily make a migration count similar to crosstab. It means, from each Class_Start, how many counts have migrated into Class_End. Let's call this observation-weighted.
data test;
infile cards delimiter=',';
input Class_Start Class_End Volume;
cards;
1,1,100
1,1,200
1,2,300
1,3,400
1,7,600
2,1,50
2,2,300
2,2,500
2,3,100
2,7,100
3,1,20
3,2,100
3,3,500
3,4,300
3,5,100
3,7,100
4,1,100
4,2,200
4,3,100
4,4,300
4,5,200
4,6,100
;
run;
proc freq data=test;
tables class_start*class_end /nopercent norow nocol
;
quit;
Now I want to make a Volume-weighted migration crosstab, meaning, from Class_Start, how much of the Volume has migrated into each Class_End.
For that I made some calculations:
proc sql;
create table test2 as
select *,
sum(Volume) as Volume_End
from test
group by Class_Start, Class_End
;
quit;
proc sql;
create table test3 as
select *,
sum(Volume) as Volume_Start
from test2
group by Class_Start
;
quit;
proc sql;
create table test4 as
select *,
Volume_End/Volume_Start as Migrate_percent format=10.2
from test3
;
quit;
So here the Migrate_percent is the Volume-weighted migration number.
How can I make it into a cross table, with Class_Start in rows, Class_End in column, and value=Migrate_percent?
Furthermore, is there a very simple method to calculate this Migrate_percent into crosstab, without going through all those proc sql sum procedures above?
Thanks for your help.
Ken
Sounds like you are making this too hard. Don't you just want to run PROC FREQ with a WEIGHT statement and look at the row percentage?
proc freq data=test;
tables class_start*class_end /nofreq nocol nopercent ;
weight volume ;
run;
Produces this report.
The FREQ Procedure
Table of Class_Start by Class_End
Class_Start Class_End
Row Pct | 1| 2| 3| 4| 5| 6| 7| Total
--------+--------+--------+--------+--------+--------+--------+--------+
1 | 18.75 | 18.75 | 25.00 | 0.00 | 0.00 | 0.00 | 37.50 |
--------+--------+--------+--------+--------+--------+--------+--------+
2 | 4.76 | 76.19 | 9.52 | 0.00 | 0.00 | 0.00 | 9.52 |
--------+--------+--------+--------+--------+--------+--------+--------+
3 | 1.79 | 8.93 | 44.64 | 26.79 | 8.93 | 0.00 | 8.93 |
--------+--------+--------+--------+--------+--------+--------+--------+
4 | 10.00 | 20.00 | 10.00 | 30.00 | 20.00 | 10.00 | 0.00 |
--------+--------+--------+--------+--------+--------+--------+--------+
Total 470 1400 1100 600 300 100 800 4770
Just a shot from the hip: have you already looked at proc tabulate?
Never used that one. But thanks for the advice...I will take a look
Posting test data - in the form of a datastep, and what you want the output to look like would help.
The output should be exactly the same as you run these codes:
proc freq data=test;
tables class_start*class_end /nopercent norow nocol
;
quit;
except the it is not the frequency numbers, but the Migrate_percent in the table.
Table of Class_Start by Class_End | ||||||||
Class_Start | Class_End | |||||||
Frequency | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Total |
1 | 2 | 1 | 1 | 0 | 0 | 0 | 1 | 5 |
2 | 1 | 2 | 1 | 0 | 0 | 0 | 1 | 5 |
3 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 6 |
4 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 6 |
Total | 5 | 5 | 4 | 2 | 2 | 1 | 3 | 22 |
2 | 1 | 1 | 0 | 0 | 0 | 1 | 5 |
1 | 2 | 1 | 0 | 0 | 0 | 1 | 5 |
1 | 1 | 1 | 1 | 1 | 0 | 1 | 6 |
1 | 1 | 1 | 1 | 1 | 1 | 0 | 6 |
5 | 5 | 4 | 2 | 2 | 1 | 3 | 22 |
I have solved this problem in an extremely cumbersome way, so cumbersome that I can't believe that this is the solution. So I am very glad to hear if anyone has any elegant solution.
Basically, it is a proc freq tables in 2-way, but I want to opt out the frequency, and in with the Migrate_percent. But none of the proc freq or proc tabulate will work. So I have done the whole thing in the following way:
proc sql;
create table test5 as
select distinct Class_Start,
class_end,
Migrate_percent
from test4
group by class_start, class_end
;
quit;
proc sql;
create table test6 as
select *
from (select distinct class_start from test5) as a
cross join (select distinct class_end from test5) as b
order by class_start, class_end
;
run;
proc sql;
create table test7 as
select a.class_start,
a.class_end,
case
when not missing(b.Migrate_percent) then b.Migrate_percent
else 0
end as Migrate_percent format=10.2
from test6 as a
left join test5 as b
on a.class_start=b.class_start
and a.class_end=b.class_end
order by a.class_start, a.class_end
;
quit;
proc transpose data=test7 out=test8 (drop= _Name_);
by class_start;
id Class_End;
run;
The result of transpose is exactly what I want. But I can't believe it will take so many steps in between to do this, while proc freq can do in one go.
Class_Start 1 2 3 4 5 6 7
1 0.19 0.19 0.25 0.00 0.00 0.00 0.38
2 0.05 0.76 0.10 0.00 0.00 0.00 0.10
3 0.02 0.09 0.45 0.27 0.09 0.00 0.09
4 0.10 0.20 0.10 0.30 0.20 0.10 0.00
Thanks for any tips.
Ken
I have solved this problem in an extremely cumbersome way, so cumbersome that I can't believe that this is the solution. So I am very glad to hear if anyone has any elegant solution.
Basically, it is a proc freq tables in 2-way, but I want to opt out the frequency, and in with the Migrate_percent. But none of the proc freq or proc tabulate will work. So I have done the whole thing in the following way:
proc sql;
create table test5 as
select distinct Class_Start,
class_end,
Migrate_percent
from test4
group by class_start, class_end
;
quit;
proc sql;
create table test6 as
select *
from (select distinct class_start from test5) as a
cross join (select distinct class_end from test5) as b
order by class_start, class_end
;
run;
proc sql;
create table test7 as
select a.class_start,
a.class_end,
case
when not missing(b.Migrate_percent) then b.Migrate_percent
else 0
end as Migrate_percent format=10.2
from test6 as a
left join test5 as b
on a.class_start=b.class_start
and a.class_end=b.class_end
order by a.class_start, a.class_end
;
quit;
proc transpose data=test7 out=test8 (drop= _Name_);
by class_start;
id Class_End;
run;
The result of transpose is exactly what I want. But I can't believe it will take so many steps in between to do this, while proc freq can do in one go.
Thanks for any tips.
Ken
Sounds like you are making this too hard. Don't you just want to run PROC FREQ with a WEIGHT statement and look at the row percentage?
proc freq data=test;
tables class_start*class_end /nofreq nocol nopercent ;
weight volume ;
run;
Produces this report.
The FREQ Procedure
Table of Class_Start by Class_End
Class_Start Class_End
Row Pct | 1| 2| 3| 4| 5| 6| 7| Total
--------+--------+--------+--------+--------+--------+--------+--------+
1 | 18.75 | 18.75 | 25.00 | 0.00 | 0.00 | 0.00 | 37.50 |
--------+--------+--------+--------+--------+--------+--------+--------+
2 | 4.76 | 76.19 | 9.52 | 0.00 | 0.00 | 0.00 | 9.52 |
--------+--------+--------+--------+--------+--------+--------+--------+
3 | 1.79 | 8.93 | 44.64 | 26.79 | 8.93 | 0.00 | 8.93 |
--------+--------+--------+--------+--------+--------+--------+--------+
4 | 10.00 | 20.00 | 10.00 | 30.00 | 20.00 | 10.00 | 0.00 |
--------+--------+--------+--------+--------+--------+--------+--------+
Total 470 1400 1100 600 300 100 800 4770
Thanks Tom the code worked!!
If Volumn has decimal , Try proc tabulate:
data test;
infile cards delimiter=',';
input Class_Start Class_End Volume;
cards;
1,1,100
1,1,200
1,2,300
1,3,400
1,7,600
2,1,50
2,2,300
2,2,500
2,3,100
2,7,100
3,1,20
3,2,100
3,3,500
3,4,300
3,5,100
3,7,100
4,1,100
4,2,200
4,3,100
4,4,300
4,5,200
4,6,100
;
run;
proc tabulate data=test;
class Class_Start Class_End;
var Volume;
table Class_Start,Class_End*Volume=''*rowpctsum=''/misstext='0';
run;
Dear Kesheng, Thanks for the code and help. Ken
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.