01-14-2016 03:08 PM
I use proc freq to generate numbers. Then, in steading of print out the dataset and calculate later by myself, I want SAS to do extra calculation, to calculate the sum of number of each column, and generate another table, the percentage of each number divided by the sum*100. That is, to have two variables, sum and rate.
My sample data is attached
Would you please recommend a sas code?
01-14-2016 03:15 PM
Please show the desired output for the example data you show in the attachment.
Also, does the output have to be a data set or are you looking for a report table.
01-14-2016 03:25 PM
01-14-2016 03:27 PM
01-14-2016 04:17 PM
Also, if your example data is the result from proc freq it would probably be better to provide the raw data as the the types of sums and rates your are looking for are possible from various report procedures.
01-14-2016 04:06 PM
Proc freq generates percents, rows or columns for your data as well. The output should contain on of the numbers you're looking for, and you might be better off trying to manipulate that output rather than calculate information from scratch.
RTM for options on the TABLES statement.
01-14-2016 04:27 PM
Not very elegant but here is a solution:
infile cards dsd dlm=' ';
input obs condition$ year2010 year2011 year2012;
1 a 12 23 45
2 b 23 45 57
3 c 45 57 89
4 d 57 89 20
5 e 89 20 29
6 f 20 29 59
proc transpose data=have out=tran_have(rename=(_NAME_=Year));by obs condition;var year:;
create table sum as
select *,sum(col1) as sum
group by year
order by condition;
proc transpose data=sum out=tran_sum;by condition;id year;var sum;
create table want as
select a.year2010/b.year2010*100 as year2010,
a.year2011/b.year2011*100 as year2011,
a.year2012/b.year2012*100 as year2012
from have a left join
tran_sum b on
01-14-2016 05:26 PM
And another using data in the first post for input:
/* first get some data to manipulate*/ data junk; input obs condition $ Yr2010 Yr2011 Yr201Yr2 Yr2013 Yr2014 Yr2015 Yr2016; array y Yr2010 Yr2011 Yr201Yr2 Yr2013 Yr2014 Yr2015 Yr2016; /* this reshapes the data into something that proc tabulate likes better*/ do i= 1 to dim(y); year= 2009+i; value = y[i]; output; end; keep obs condition year value; datalines; 1 a 31 7 98 41 68 41 68 2 b 93 28 99 32 53 32 53 3 c 66 2 33 68 44 68 44 4 d 93 80 58 50 41 50 41 5 e 74 11 93 73 39 73 39 6 f 13 71 45 35 39 35 39 7 g 72 63 61 45 42 45 42 8 h 57 53 64 53 16 53 16 9 i 8 14 81 21 92 21 92 ; run; proc tabulate data=junk; class obs condition year; var value; table obs*condition all='Sum', year='' * value=''*sum*f=best8. year='' * value=''*colpctsum='Rate'*f=f16.8; run;
If the original data used for proc freq consisted of Condition Year and a variable being counted the above might be sufficient changing SUM to N and colpctsum to colpctn and the Value could be left out. Maybe.
01-15-2016 12:37 PM
ballardw, thank you.
I got the error message
ERROR: There are multiple analysis variables associated with a single table cell in the following
nesting : category * value * year * value * sum * f.
actually the real data is not year from 2010 to 2016, it is like female2010, female2011, male2010, male2011, trans2010, trans2011, unknow2010, unknow2011
Would you please provide an adjusted code. And there is no need for including ID. Thanks. It is ordered by frequency. that is already done prevoiusly
just keep the order
01-15-2016 10:31 AM
Thank you. What is there are some missing values in some blanks. Any adjustment for that?
is "DSD" for this purpose?
01-15-2016 11:14 AM
"Missing values in some blanks" implies data related issues but we need more infomation to see what you mean. Such the input data and the code applied.
DSD is only relavent when reading data from or writing to an external file. When reading a file that is delimited, when the delimiter occurs encolsed by quotation marks the delimiter is ignored and 2 consecutive delimiters indicate a missing value.
01-15-2016 11:43 AM
Thanks. I have several errors when i run this code.
1. at first, i did not set a variable name as "year"
your code is year2011 2012 etc
so the error was
ERROR: Variable YEAR not found.
2. Then I set my variable name as "year", although it is not perfect, because it is really not simple year. the first row is more about different catogies with different year. My task is like calculate "child2003" "child2004" "adult2003" "adult2004" "senior2003" "senior2004" so it is not a simple year but a mix of year and categories
my another variable is "category" instead of "condition",
then the error showed below
ERROR: Data set WORK.HAVE is not sorted in ascending sequence. The current BY group has Category = Pi
and the next BY group has Category = Is.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 3 observations read from the data set WORK.HAVE.