Hi
I have the below proc freq statement and partial result. It is only returning count and percent. How do I include things like row percent, column percent, cumulative percent?
Paul
proc freq data=s1AgeRanges noprint;
tables cnty_name*startyear*agecat4*exit*exitMonthCategory/out=s1age1;
run;
Albany 2006 1 XCA 6 1 0.0015617679
Albany 2006 1 XCA 12 1 0.0015617679
Albany 2006 1 XCA 24 3 0.0046853038
Albany 2006 1 XCA 30 3 0.0046853038
.........
Paul, I hadn't responded until now because (1) I had my real job to do and (2) I'm not sure if I understand your requirements.
However, that said, I also misunderstood what proc freq was using as the denominator for table percents, thus had to revise the code a bit.
You can easily amass the desired denominator by including a datastep with a double dow loop in it. Actually, the entire code could be wrapped within the same datastep but, honestly, I thought that would be too difficult to follow.
Regardless, you said that the code wasn't correctly producing want3. You had defined want3 as:
3. For each county, start year, exit and agecat, the cumulative number of kids who have are in a given exitMonthCateogory. This is just adding the numbers in each exitMonthCateogory across.
That is what want 3 is producing. Here are two lines from the results, the first showing want1 (the counts) and the second showing want3 (the cummulative counts across the row):
cnty_name | startyear | exit | agecat4 | DataElement | 1 Month | 2 Months | 3 Months | 4 Months | 5 Months | 6 Months | 7-12 Months | 1-1.5 Years | >1.51-1.99 Years | 2-2.5 Years | 2.51-2.99 Years | 3-3.5 Years | 3.51-3.99 Years | 4-4.5 Years | 4.51-4.99 Years | 5-5.5 Years | >5.5 Years |
County-CLPR | 2006 | XCA | 1 | want1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 4 | 6 | 6 | 5 | 8 | 2 | 0 | 0 | 0 |
County-CLPR | 2006 | XCA | 1 | want3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 7 | 13 | 19 | 24 | 32 | 34 | 34 | 34 | 34 |
If the above is not what you are seeking, then I don't understand the requirements. On the other hand, if it is what you wanted, then try the following revised code:
options validvarname=v7;
PROC IMPORT OUT= WORK.s1ageshort
DATAFILE= "D:\art\agerange-posting2.xlsx"
DBMS=XLSX REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;
proc sort data=s1ageshort;
by cnty_name startyear;
run;
proc freq data=s1ageshort noprint;
tables exit*agecat4*exitMonthCategory/
out=s1ageshorttest (drop=percent pct_col pct_tabl
rename=(
count=want1
pct_row=want2
))
outpct sparse;
by cnty_name startyear;
run;
proc transpose data=s1ageshorttest out=want (drop=_label_
rename=(_name_=DataElement));
by cnty_name startyear exit agecat4;
id exitMonthCategory;
run;
data want;
do until (last.startyear);
set want;
array row(17) _1--_72;
by cnty_name startyear;
if first.startyear then total=0;
if DataElement='want1' then total=sum(total,sum(of row(*)));
end;
do until (last.startyear);
set want;
by cnty_name startyear;
output;
end;
run;
data want (drop=tot: i);
set want;
by cnty_name startyear exit agecat4;
array row(17) _1--_72;
array tot(17);
do i=1 to 17;
if missing(row(i)) then row(i)=0;
end;
if first.startyear then do i=1 to 17;
tot(i)=row(i);
end;
else if DataElement eq 'want1' then do i=1 to 17;
tot(i)+row(i);
end;
output;
if DataElement eq 'want1' then do;
DataElement='want3';
do i=2 to 17;
row(i)+row(i-1);
end;
output;
DataElement='want4';
do i=1 to 17;
row(i)=100*row(i)/total;
end;
output;
end;
if last.startyear then do;
DataElement='want5';
do i=1 to 17;
row(i)=tot(i);
end;
call missing(exit);
call missing(agecat4);
output;
end;
run;
options validvarname=any;
data want;
set want (rename=(
_1='1 Month'n
_2='2 Months'n
_3='3 Months'n
_4='4 Months'n
_5='5 Months'n
_6='6 Months'n
_12='7-12 Months'n
_18='1-1.5 Years'n
_24='>1.51-1.99 Years'n
_30='2-2.5 Years'n
_36='2.51-2.99 Years'n
_42='3-3.5 Years'n
_48='3.51-3.99 Years'n
_54='4-4.5 Years'n
_60='4.51-4.99 Years'n
_66='5-5.5 Years'n
_72='>5.5 Years'n));
run;
Hi Paul. May be you can consider using ods output instead. HTH.
ods output CrossTabFreqs=freqs;
proc freq data=s1AgeRanges noprint;
tables cnty_name*startyear*agecat4*exit*exitMonthCategory;
run;
ods output close;
Paul, you only have to add one thing to the out= part of the statement, namely:
proc freq data=have noprint;
tables cnty_name*agecat4*exitMonthCategory/out=test outpct;
format exitMonthCategory timespan.;
run;
When I add both outpct and outcum I am only getting the three percents and not the cumulative percents.
Also, these percents on not based on the correct strata/disaggregation. I have 5 variables in the proc freq output with one being the one that is being counted. Is there a way to calculate all the percents based on how the data is stratified by the 4 variables?
Paul
Paul: I didn't suggest adding outcum as that option is only relevant for one-way tables.
You will have to show an example indicating how you want the percents calculated. The current output, using only the outpct option, includes the following calculations:
percent: which is the percent the cell comprises of all of the data
pct_tabl which is the percent the cell comprises of the data for a particular county
pct_row which is the percent the cell comprises of the row in which it is contained
pct_col which is the percent the cell comprises of a particular column for a particular county
If you need others, e.g. cumulative, they could probably be easily accomplished, from those percents and counts, within a fairly simple datastep.
Below is a link to a sorted version of the same sample data file I posted earlier. This one is sorted by county->startyear->exit->agecat. Also is indicated in shading what a denominator is for all these calculations: the combination of county/start year. Based on using this denominator, I am trying to calculate the following statistics using the freq table you found in the other posting:
http://www.nycourts.gov/surveys/cwcip/agerange-posting2-den.zip
1. For each county, start year, exit and agecat, the number of kids who have are in a given exitMonthCateogory (this is already done based on existing freq calculation).
2. For each county, start year, exit and agecat, the percent of kids who have are in a given exitMonthCateogory. The denominator for this is the total number who are in a particular startyear/county combination. The current percent calculation in the freq. calculation is not using the correct denominator.
3. For each county, start year, exit and agecat, the cumulative number of kids who have are in a given exitMonthCateogory. This is just adding the numbers in each exitMonthCateogory across.
4. For each county, start year, exit and agecat, the cumulative percent of kids who have are in a given exitMonthCateogory. The denominator for this is the total number who are in a particular startyear/county combination.
5. The total number of kids in each startyear/county combination.
I was thinking that each start/county needs to be a separate table in the freq calculation, but that would create a lot of tables since there are 21 counties and 6 startyears: 126 tables.
If there are any thoughts on how to determine the correct denominator for these stratafications, please let me know.
Paul
Paul: There is probably an easier way to do this, and you will have to check to see if I got all of your requirements correctly, but the following should come at least pretty close to doing what you want:
options validvarname=v7;
PROC IMPORT OUT= WORK.s1ageshort
DATAFILE= "D:\art\agerange-posting2.xlsx"
DBMS=XLSX REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;
proc sort data=s1ageshort;
by cnty_name startyear;
run;
proc freq data=s1ageshort noprint;
tables exit*agecat4*exitMonthCategory/
out=s1ageshorttest (drop=percent pct_col
rename=(
count=want1
pct_row=want2
pct_tabl=want4
))
outpct sparse;
by cnty_name startyear;
run;
proc transpose data=s1ageshorttest out=want (drop=_label_
rename=(_name_=DataElement));
by cnty_name startyear exit agecat4;
id exitMonthCategory;
run;
data want (drop=tot:);
set want;
by cnty_name startyear exit agecat4;
array row(17) _1--_72;
array tot(17);
if first.startyear then do i=1 to 17;
tot(i)=row(i);
end;
else if DataElement eq 'want1' then do i=1 to 17;
tot(i)+row(i);
end;
output;
if DataElement eq 'want1' then do;
DataElement='want3';
do i=2 to 17;
row(i)+row(i-1);
end;
output;
end;
if last.startyear then do;
DataElement='want5';
do i=1 to 17;
row(i)=tot(i);
end;
call missing(exit);
call missing(agecat4);
output;
end;
run;
options validvarname=any;
data want;
set want (rename=(
_1='1 Month'n
_2='2 Months'n
_3='3 Months'n
_4='4 Months'n
_5='5 Months'n
_6='6 Months'n
_12='7-12 Months'n
_18='1-1.5 Years'n
_24='>1.51-1.99 Years'n
_30='2-2.5 Years'n
_36='2.51-2.99 Years'n
_42='3-3.5 Years'n
_48='3.51-3.99 Years'n
_54='4-4.5 Years'n
_60='4.51-4.99 Years'n
_66='5-5.5 Years'n
_72='>5.5 Years'n));
run;
options validvarname=v7;
Hi Art
Thanks a lot, I am going through this today. I am also posting a few other questions on data step processing.
Paul
Hi Art
I ran the above code, while it does return percents, I don't think they are correct apparently. They are similar to running proc freq actually.
By using the below code (following using your original code), I was able to include the freq number (#1 above), the percent (#2 above), and the startyear/county combination number (#5 above) in one data set (http://www.nycourts.gov/surveys/cwcip/agerange1-11-13-12.zip). The problem is that I cannot get the cumulative number (#3 above) and cumulative percent (#4 above).
Do you or anyone know how to start at the top of a data file and cumulatively add the values of a column from 1 record to the next record and populate the cumulative number in a new column for each record? This calculation needs to be done situationally--when the previous and current record have the same county, year, exit values.
Paul
proc freq data=s1ageshort;
tables cnty_name*startyear/out=s1ageshortyearfreq /*this is the total per cnty_name,startyear combo*/;
run;
proc sort data=s1ageshortyearfreq out=s1ageshortyearfreqsort;
by cnty_name startyear;
run;
proc sort data=s1ageshorttest out=s1ageshorttestsort;
by cnty_name startyear;
run;
data s1ageyear1;
merge work.s1ageshortyearfreqsort (keep=count cnty_name startyear rename=(count=cohortyearcount)) work.s1ageshorttestsort (drop=percent);
by cnty_name startyear;
Percent1=count/cohortyearcount /*this is the percent*/;
run;
Paul, I hadn't responded until now because (1) I had my real job to do and (2) I'm not sure if I understand your requirements.
However, that said, I also misunderstood what proc freq was using as the denominator for table percents, thus had to revise the code a bit.
You can easily amass the desired denominator by including a datastep with a double dow loop in it. Actually, the entire code could be wrapped within the same datastep but, honestly, I thought that would be too difficult to follow.
Regardless, you said that the code wasn't correctly producing want3. You had defined want3 as:
3. For each county, start year, exit and agecat, the cumulative number of kids who have are in a given exitMonthCateogory. This is just adding the numbers in each exitMonthCateogory across.
That is what want 3 is producing. Here are two lines from the results, the first showing want1 (the counts) and the second showing want3 (the cummulative counts across the row):
cnty_name | startyear | exit | agecat4 | DataElement | 1 Month | 2 Months | 3 Months | 4 Months | 5 Months | 6 Months | 7-12 Months | 1-1.5 Years | >1.51-1.99 Years | 2-2.5 Years | 2.51-2.99 Years | 3-3.5 Years | 3.51-3.99 Years | 4-4.5 Years | 4.51-4.99 Years | 5-5.5 Years | >5.5 Years |
County-CLPR | 2006 | XCA | 1 | want1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 4 | 6 | 6 | 5 | 8 | 2 | 0 | 0 | 0 |
County-CLPR | 2006 | XCA | 1 | want3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 7 | 13 | 19 | 24 | 32 | 34 | 34 | 34 | 34 |
If the above is not what you are seeking, then I don't understand the requirements. On the other hand, if it is what you wanted, then try the following revised code:
options validvarname=v7;
PROC IMPORT OUT= WORK.s1ageshort
DATAFILE= "D:\art\agerange-posting2.xlsx"
DBMS=XLSX REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;
proc sort data=s1ageshort;
by cnty_name startyear;
run;
proc freq data=s1ageshort noprint;
tables exit*agecat4*exitMonthCategory/
out=s1ageshorttest (drop=percent pct_col pct_tabl
rename=(
count=want1
pct_row=want2
))
outpct sparse;
by cnty_name startyear;
run;
proc transpose data=s1ageshorttest out=want (drop=_label_
rename=(_name_=DataElement));
by cnty_name startyear exit agecat4;
id exitMonthCategory;
run;
data want;
do until (last.startyear);
set want;
array row(17) _1--_72;
by cnty_name startyear;
if first.startyear then total=0;
if DataElement='want1' then total=sum(total,sum(of row(*)));
end;
do until (last.startyear);
set want;
by cnty_name startyear;
output;
end;
run;
data want (drop=tot: i);
set want;
by cnty_name startyear exit agecat4;
array row(17) _1--_72;
array tot(17);
do i=1 to 17;
if missing(row(i)) then row(i)=0;
end;
if first.startyear then do i=1 to 17;
tot(i)=row(i);
end;
else if DataElement eq 'want1' then do i=1 to 17;
tot(i)+row(i);
end;
output;
if DataElement eq 'want1' then do;
DataElement='want3';
do i=2 to 17;
row(i)+row(i-1);
end;
output;
DataElement='want4';
do i=1 to 17;
row(i)=100*row(i)/total;
end;
output;
end;
if last.startyear then do;
DataElement='want5';
do i=1 to 17;
row(i)=tot(i);
end;
call missing(exit);
call missing(agecat4);
output;
end;
run;
options validvarname=any;
data want;
set want (rename=(
_1='1 Month'n
_2='2 Months'n
_3='3 Months'n
_4='4 Months'n
_5='5 Months'n
_6='6 Months'n
_12='7-12 Months'n
_18='1-1.5 Years'n
_24='>1.51-1.99 Years'n
_30='2-2.5 Years'n
_36='2.51-2.99 Years'n
_42='3-3.5 Years'n
_48='3.51-3.99 Years'n
_54='4-4.5 Years'n
_60='4.51-4.99 Years'n
_66='5-5.5 Years'n
_72='>5.5 Years'n));
run;
I don't understand your requirements fully, but think that proc tabulate might be a better option. Capture the table using ODS and then run through that summary to do your cumulative totals.
You've posted the data but I don't see a summary of what you want your solution to look like so just a guess on my part.
data s1ageshort;
set s1ageshort;
var=1;
run;
ods table table=summary1;
proc tabulate data=s1ageshort;
class cnty_name startyear agecat4 exit exitMonthCategory;
var var;
table cnty_name*startyear*agecat4*exitMonthCategory *(n='Number' var*colpctn), exit;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.