Solved: Re: Proc Freq: Percents

Paul_NYS · Posted 11-11-2012 04:27 PM

Hi

I have the below proc freq statement and partial result. It is only returning count and percent. How do I include things like row percent, column percent, cumulative percent?

Paul

proc freq data=s1AgeRanges noprint;

tables cnty_name*startyear*agecat4*exit*exitMonthCategory/out=s1age1;

run;

Albany 2006 1 XCA 6 1 0.0015617679

Albany 2006 1 XCA 12 1 0.0015617679

Albany 2006 1 XCA 24 3 0.0046853038

Albany 2006 1 XCA 30 3 0.0046853038

.........

art297 · Posted 11-13-2012 04:03 PM

Paul, I hadn't responded until now because (1) I had my real job to do and (2) I'm not sure if I understand your requirements.

However, that said, I also misunderstood what proc freq was using as the denominator for table percents, thus had to revise the code a bit.

You can easily amass the desired denominator by including a datastep with a double dow loop in it. Actually, the entire code could be wrapped within the same datastep but, honestly, I thought that would be too difficult to follow.

Regardless, you said that the code wasn't correctly producing want3. You had defined want3 as:

3. For each county, start year, exit and agecat, the cumulative number of kids who have are in a given exitMonthCateogory. This is just adding the numbers in each exitMonthCateogory across.

That is what want 3 is producing. Here are two lines from the results, the first showing want1 (the counts) and the second showing want3 (the cummulative counts across the row):

cnty_name	startyear	exit	agecat4	DataElement	1 Month	2 Months	3 Months	4 Months	5 Months	6 Months	7-12 Months	1-1.5 Years	>1.51-1.99 Years	2-2.5 Years	2.51-2.99 Years	3-3.5 Years	3.51-3.99 Years	4-4.5 Years	4.51-4.99 Years	5-5.5 Years	>5.5 Years
County-CLPR	2006	XCA	1	want1	0	0	0	0	0	0	1	2	4	6	6	5	8	2	0	0	0
County-CLPR	2006	XCA	1	want3	0	0	0	0	0	0	1	3	7	13	19	24	32	34	34	34	34

If the above is not what you are seeking, then I don't understand the requirements. On the other hand, if it is what you wanted, then try the following revised code:

options validvarname=v7;

PROC IMPORT OUT= WORK.s1ageshort

DATAFILE= "D:\art\agerange-posting2.xlsx"

DBMS=XLSX REPLACE;

GETNAMES=YES;

DATAROW=2;

RUN;

proc sort data=s1ageshort;

by cnty_name startyear;

run;

proc freq data=s1ageshort noprint;

tables exit*agecat4*exitMonthCategory/

out=s1ageshorttest (drop=percent pct_col pct_tabl

rename=(

count=want1

pct_row=want2

))

outpct sparse;

by cnty_name startyear;

run;

proc transpose data=s1ageshorttest out=want (drop=_label_

rename=(_name_=DataElement));

by cnty_name startyear exit agecat4;

id exitMonthCategory;

run;

data want;

do until (last.startyear);

set want;

array row(17) _1--_72;

by cnty_name startyear;

if first.startyear then total=0;

if DataElement='want1' then total=sum(total,sum(of row(*)));

end;

do until (last.startyear);

set want;

by cnty_name startyear;

output;

end;

run;

data want (drop=tot: i);

set want;

by cnty_name startyear exit agecat4;

array row(17) _1--_72;

array tot(17);

do i=1 to 17;

if missing(row(i)) then row(i)=0;

end;

if first.startyear then do i=1 to 17;

tot(i)=row(i);

end;

else if DataElement eq 'want1' then do i=1 to 17;

tot(i)+row(i);

end;

output;

if DataElement eq 'want1' then do;

DataElement='want3';

do i=2 to 17;

row(i)+row(i-1);

end;

output;

DataElement='want4';

do i=1 to 17;

row(i)=100*row(i)/total;

end;

output;

end;

if last.startyear then do;

DataElement='want5';

do i=1 to 17;

row(i)=tot(i);

end;

call missing(exit);

call missing(agecat4);

output;

end;

run;

options validvarname=any;

data want;

set want (rename=(

_1='1 Month'n

_2='2 Months'n

_3='3 Months'n

_4='4 Months'n

_5='5 Months'n

_6='6 Months'n

_12='7-12 Months'n

_18='1-1.5 Years'n

_24='>1.51-1.99 Years'n

_30='2-2.5 Years'n

_36='2.51-2.99 Years'n

_42='3-3.5 Years'n

_48='3.51-3.99 Years'n

_54='4-4.5 Years'n

_60='4.51-4.99 Years'n

_66='5-5.5 Years'n

_72='>5.5 Years'n));

run;

View solution in original post

Miracle · Posted 11-11-2012 05:38 PM

Hi Paul. May be you can consider using ods output instead. HTH.

ods output CrossTabFreqs=freqs;

proc freq data=s1AgeRanges noprint;

tables cnty_name*startyear*agecat4*exit*exitMonthCategory;

run;

ods output close;

art297 · Posted 11-11-2012 05:54 PM

Paul, you only have to add one thing to the out= part of the statement, namely:

proc freq data=have noprint;

tables cnty_name*agecat4*exitMonthCategory/out=test outpct;

format exitMonthCategory timespan.;

run;

Paul_NYS · Posted 11-12-2012 09:40 AM

When I add both outpct and outcum I am only getting the three percents and not the cumulative percents.

Also, these percents on not based on the correct strata/disaggregation. I have 5 variables in the proc freq output with one being the one that is being counted. Is there a way to calculate all the percents based on how the data is stratified by the 4 variables?

Paul

art297 · Posted 11-12-2012 10:41 AM

Paul: I didn't suggest adding outcum as that option is only relevant for one-way tables.

You will have to show an example indicating how you want the percents calculated. The current output, using only the outpct option, includes the following calculations:

percent: which is the percent the cell comprises of all of the data

pct_tabl which is the percent the cell comprises of the data for a particular county

pct_row which is the percent the cell comprises of the row in which it is contained

pct_col which is the percent the cell comprises of a particular column for a particular county

If you need others, e.g. cumulative, they could probably be easily accomplished, from those percents and counts, within a fairly simple datastep.

Paul_NYS · Posted 11-12-2012 12:48 PM

Below is a link to a sorted version of the same sample data file I posted earlier. This one is sorted by county->startyear->exit->agecat. Also is indicated in shading what a denominator is for all these calculations: the combination of county/start year. Based on using this denominator, I am trying to calculate the following statistics using the freq table you found in the other posting:

http://www.nycourts.gov/surveys/cwcip/agerange-posting2-den.zip

1. For each county, start year, exit and agecat, the number of kids who have are in a given exitMonthCateogory (this is already done based on existing freq calculation).

2. For each county, start year, exit and agecat, the percent of kids who have are in a given exitMonthCateogory. The denominator for this is the total number who are in a particular startyear/county combination. The current percent calculation in the freq. calculation is not using the correct denominator.

3. For each county, start year, exit and agecat, the cumulative number of kids who have are in a given exitMonthCateogory. This is just adding the numbers in each exitMonthCateogory across.

4. For each county, start year, exit and agecat, the cumulative percent of kids who have are in a given exitMonthCateogory. The denominator for this is the total number who are in a particular startyear/county combination.

5. The total number of kids in each startyear/county combination.

I was thinking that each start/county needs to be a separate table in the freq calculation, but that would create a lot of tables since there are 21 counties and 6 startyears: 126 tables.

If there are any thoughts on how to determine the correct denominator for these stratafications, please let me know.

Paul

art297 · Posted 11-12-2012 03:43 PM

Paul: There is probably an easier way to do this, and you will have to check to see if I got all of your requirements correctly, but the following should come at least pretty close to doing what you want:

options validvarname=v7;

PROC IMPORT OUT= WORK.s1ageshort

DATAFILE= "D:\art\agerange-posting2.xlsx"

DBMS=XLSX REPLACE;

GETNAMES=YES;

DATAROW=2;

RUN;

proc sort data=s1ageshort;

by cnty_name startyear;

run;

proc freq data=s1ageshort noprint;

tables exit*agecat4*exitMonthCategory/

out=s1ageshorttest (drop=percent pct_col

rename=(

count=want1

pct_row=want2

pct_tabl=want4

))

outpct sparse;

by cnty_name startyear;

run;

proc transpose data=s1ageshorttest out=want (drop=_label_

rename=(_name_=DataElement));

by cnty_name startyear exit agecat4;

id exitMonthCategory;

run;

data want (drop=tot:);

set want;

by cnty_name startyear exit agecat4;

array row(17) _1--_72;

array tot(17);

if first.startyear then do i=1 to 17;

tot(i)=row(i);

end;

else if DataElement eq 'want1' then do i=1 to 17;

tot(i)+row(i);

end;

output;

if DataElement eq 'want1' then do;

DataElement='want3';

do i=2 to 17;

row(i)+row(i-1);

end;

output;

end;

if last.startyear then do;

DataElement='want5';

do i=1 to 17;

row(i)=tot(i);

end;

call missing(exit);

call missing(agecat4);

output;

end;

run;

options validvarname=any;

data want;

set want (rename=(

_1='1 Month'n

_2='2 Months'n

_3='3 Months'n

_4='4 Months'n

_5='5 Months'n

_6='6 Months'n

_12='7-12 Months'n

_18='1-1.5 Years'n

_24='>1.51-1.99 Years'n

_30='2-2.5 Years'n

_36='2.51-2.99 Years'n

_42='3-3.5 Years'n

_48='3.51-3.99 Years'n

_54='4-4.5 Years'n

_60='4.51-4.99 Years'n

_66='5-5.5 Years'n

_72='>5.5 Years'n));

run;

options validvarname=v7;

Paul_NYS · Posted 11-13-2012 09:30 AM

Hi Art

Thanks a lot, I am going through this today. I am also posting a few other questions on data step processing.

Paul

Paul_NYS · Posted 11-13-2012 11:53 AM

Hi Art

I ran the above code, while it does return percents, I don't think they are correct apparently. They are similar to running proc freq actually.

By using the below code (following using your original code), I was able to include the freq number (#1 above), the percent (#2 above), and the startyear/county combination number (#5 above) in one data set (http://www.nycourts.gov/surveys/cwcip/agerange1-11-13-12.zip). The problem is that I cannot get the cumulative number (#3 above) and cumulative percent (#4 above).

Do you or anyone know how to start at the top of a data file and cumulatively add the values of a column from 1 record to the next record and populate the cumulative number in a new column for each record? This calculation needs to be done situationally--when the previous and current record have the same county, year, exit values.

Paul

proc freq data=s1ageshort;

tables cnty_name*startyear/out=s1ageshortyearfreq /*this is the total per cnty_name,startyear combo*/;

run;

proc sort data=s1ageshortyearfreq out=s1ageshortyearfreqsort;

by cnty_name startyear;

run;

proc sort data=s1ageshorttest out=s1ageshorttestsort;

by cnty_name startyear;

run;

data s1ageyear1;

merge work.s1ageshortyearfreqsort (keep=count cnty_name startyear rename=(count=cohortyearcount)) work.s1ageshorttestsort (drop=percent);

by cnty_name startyear;

Percent1=count/cohortyearcount /*this is the percent*/;

run;

art297 · Posted 11-13-2012 04:03 PM

Paul, I hadn't responded until now because (1) I had my real job to do and (2) I'm not sure if I understand your requirements.

However, that said, I also misunderstood what proc freq was using as the denominator for table percents, thus had to revise the code a bit.

You can easily amass the desired denominator by including a datastep with a double dow loop in it. Actually, the entire code could be wrapped within the same datastep but, honestly, I thought that would be too difficult to follow.

Regardless, you said that the code wasn't correctly producing want3. You had defined want3 as:

3. For each county, start year, exit and agecat, the cumulative number of kids who have are in a given exitMonthCateogory. This is just adding the numbers in each exitMonthCateogory across.

That is what want 3 is producing. Here are two lines from the results, the first showing want1 (the counts) and the second showing want3 (the cummulative counts across the row):

cnty_name	startyear	exit	agecat4	DataElement	1 Month	2 Months	3 Months	4 Months	5 Months	6 Months	7-12 Months	1-1.5 Years	>1.51-1.99 Years	2-2.5 Years	2.51-2.99 Years	3-3.5 Years	3.51-3.99 Years	4-4.5 Years	4.51-4.99 Years	5-5.5 Years	>5.5 Years
County-CLPR	2006	XCA	1	want1	0	0	0	0	0	0	1	2	4	6	6	5	8	2	0	0	0
County-CLPR	2006	XCA	1	want3	0	0	0	0	0	0	1	3	7	13	19	24	32	34	34	34	34

If the above is not what you are seeking, then I don't understand the requirements. On the other hand, if it is what you wanted, then try the following revised code:

options validvarname=v7;

PROC IMPORT OUT= WORK.s1ageshort

DATAFILE= "D:\art\agerange-posting2.xlsx"

DBMS=XLSX REPLACE;

GETNAMES=YES;

DATAROW=2;

RUN;

proc sort data=s1ageshort;

by cnty_name startyear;

run;

proc freq data=s1ageshort noprint;

tables exit*agecat4*exitMonthCategory/

out=s1ageshorttest (drop=percent pct_col pct_tabl

rename=(

count=want1

pct_row=want2

))

outpct sparse;

by cnty_name startyear;

run;

proc transpose data=s1ageshorttest out=want (drop=_label_

rename=(_name_=DataElement));

by cnty_name startyear exit agecat4;

id exitMonthCategory;

run;

data want;

do until (last.startyear);

set want;

array row(17) _1--_72;

by cnty_name startyear;

if first.startyear then total=0;

if DataElement='want1' then total=sum(total,sum(of row(*)));

end;

do until (last.startyear);

set want;

by cnty_name startyear;

output;

end;

run;

data want (drop=tot: i);

set want;

by cnty_name startyear exit agecat4;

array row(17) _1--_72;

array tot(17);

do i=1 to 17;

if missing(row(i)) then row(i)=0;

end;

if first.startyear then do i=1 to 17;

tot(i)=row(i);

end;

else if DataElement eq 'want1' then do i=1 to 17;

tot(i)+row(i);

end;

output;

if DataElement eq 'want1' then do;

DataElement='want3';

do i=2 to 17;

row(i)+row(i-1);

end;

output;

DataElement='want4';

do i=1 to 17;

row(i)=100*row(i)/total;

end;

output;

end;

if last.startyear then do;

DataElement='want5';

do i=1 to 17;

row(i)=tot(i);

end;

call missing(exit);

call missing(agecat4);

output;

end;

run;

options validvarname=any;

data want;

set want (rename=(

_1='1 Month'n

_2='2 Months'n

_3='3 Months'n

_4='4 Months'n

_5='5 Months'n

_6='6 Months'n

_12='7-12 Months'n

_18='1-1.5 Years'n

_24='>1.51-1.99 Years'n

_30='2-2.5 Years'n

_36='2.51-2.99 Years'n

_42='3-3.5 Years'n

_48='3.51-3.99 Years'n

_54='4-4.5 Years'n

_60='4.51-4.99 Years'n

_66='5-5.5 Years'n

_72='>5.5 Years'n));

run;

Reeza · Posted 11-13-2012 12:41 PM

I don't understand your requirements fully, but think that proc tabulate might be a better option. Capture the table using ODS and then run through that summary to do your cumulative totals.

You've posted the data but I don't see a summary of what you want your solution to look like so just a guess on my part.

data s1ageshort;

set s1ageshort;

var=1;

run;

ods table table=summary1;

proc tabulate data=s1ageshort;

class cnty_name startyear agecat4 exit exitMonthCategory;

var var;

table cnty_name*startyear*agecat4*exitMonthCategory *(n='Number' var*colpctn), exit;

run;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away