Help using Base SAS procedures

Proc Freq: Percents

Accepted Solution Solved
Reply
Regular Contributor
Posts: 216
Accepted Solution

Proc Freq: Percents

Hi

I have the below proc freq statement and partial result. It is only returning count and percent. How do I include things like row percent, column percent, cumulative percent?

Paul

proc freq data=s1AgeRanges noprint;

  tables cnty_name*startyear*agecat4*exit*exitMonthCategory/out=s1age1;

  run;

Albany    2006    1    XCA    6    1    0.0015617679

Albany    2006    1    XCA    12    1    0.0015617679

Albany    2006    1    XCA    24    3    0.0046853038

Albany    2006    1    XCA    30    3    0.0046853038

.........


Accepted Solutions
Solution
‎11-13-2012 04:03 PM
PROC Star
Posts: 7,363

Re: Proc Freq: Percents

Paul,  I hadn't responded until now because (1) I had my real job to do and (2) I'm not sure if I understand your requirements.

However, that said, I also misunderstood what proc freq was using as the denominator for table percents, thus had to revise the code a bit.

You can easily amass the desired denominator by including a datastep with a double dow loop in it.  Actually, the entire code could be wrapped within the same datastep but, honestly, I thought that would be too difficult to follow.

Regardless, you said that the code wasn't correctly producing want3.  You had defined want3 as:

3. For each county, start year, exit and agecat,  the cumulative number of kids who have are in a given exitMonthCateogory. This is just adding the numbers in each exitMonthCateogory across.

That is what want 3 is producing.  Here are two lines from the results, the first showing want1 (the counts) and the second showing want3 (the cummulative counts across the row):

cnty_namestartyearexitagecat4DataElement1 Month2 Months3 Months4 Months5 Months6 Months7-12 Months1-1.5 Years>1.51-1.99 Years2-2.5 Years2.51-2.99 Years3-3.5 Years3.51-3.99 Years4-4.5 Years4.51-4.99 Years5-5.5 Years>5.5 Years
County-CLPR2006XCA1want100000012466582000
County-CLPR2006XCA1want300000013713192432

34

343434

If the above is not what you are seeking, then I don't understand the requirements.  On the other hand, if it is what you wanted, then try the following revised code:

options validvarname=v7;

PROC IMPORT OUT= WORK.s1ageshort

            DATAFILE= "D:\art\agerange-posting2.xlsx"

            DBMS=XLSX REPLACE;

     GETNAMES=YES;

     DATAROW=2;

RUN;

proc sort data=s1ageshort;

  by cnty_name startyear;

run;

proc freq data=s1ageshort noprint;

  tables exit*agecat4*exitMonthCategory/

    out=s1ageshorttest (drop=percent pct_col pct_tabl

        rename=(

            count=want1

               pct_row=want2

                        ))

    outpct sparse;

  by cnty_name startyear;

run;

proc transpose data=s1ageshorttest out=want (drop=_label_

            rename=(_name_=DataElement));

  by cnty_name startyear exit agecat4;

  id exitMonthCategory;

run;

data want;

  do until (last.startyear);

    set want;

    array row(17) _1--_72;

    by cnty_name startyear;

    if first.startyear then total=0;

    if DataElement='want1' then total=sum(total,sum(of row(*)));

  end;

  do until (last.startyear);

    set want;

    by cnty_name startyear;

    output;

  end;

run;

data want (drop=tot: i);

  set want;

  by cnty_name startyear exit agecat4;

  array row(17) _1--_72;

  array tot(17);

  do i=1 to 17;

    if missing(row(i)) then row(i)=0;

  end;

  if first.startyear then do i=1 to 17;

    tot(i)=row(i);

  end;

  else if DataElement eq 'want1' then do i=1 to 17;

    tot(i)+row(i);

  end;

  output;

  if DataElement eq 'want1' then do;

    DataElement='want3';

    do i=2 to 17;

      row(i)+row(i-1);

    end;

    output;

    DataElement='want4';

    do i=1 to 17;

      row(i)=100*row(i)/total;

    end;

    output;

  end;

  if last.startyear then do;

    DataElement='want5';

    do i=1 to 17;

      row(i)=tot(i);

    end;

    call missing(exit);

    call missing(agecat4);

    output;

  end;

run;

options validvarname=any;

data want;

  set want (rename=(

         _1='1 Month'n

         _2='2 Months'n

         _3='3 Months'n

         _4='4 Months'n

         _5='5 Months'n

         _6='6 Months'n

         _12='7-12 Months'n

         _18='1-1.5 Years'n

         _24='>1.51-1.99 Years'n

         _30='2-2.5 Years'n

         _36='2.51-2.99 Years'n

         _42='3-3.5 Years'n

         _48='3.51-3.99 Years'n

         _54='4-4.5 Years'n

         _60='4.51-4.99 Years'n

         _66='5-5.5 Years'n

         _72='>5.5 Years'n));

run;

View solution in original post


All Replies
Regular Contributor
Posts: 216

Re: Proc Freq: Percents

Hi Paul. May be you can consider using ods output instead.  HTH.

ods output CrossTabFreqs=freqs;

proc freq data=s1AgeRanges noprint;

tables cnty_name*startyear*agecat4*exit*exitMonthCategory;

run;

ods output close;

PROC Star
Posts: 7,363

Re: Proc Freq: Percents

Paul, you only have to add one thing to the out= part of the statement, namely:

proc freq data=have noprint;

  tables cnty_name*agecat4*exitMonthCategory/out=test outpct;

  format exitMonthCategory timespan.;

run;

Regular Contributor
Posts: 216

Re: Proc Freq: Percents

When I add both outpct and outcum I am only getting the three percents and not the cumulative percents.

Also, these percents on not based on the correct strata/disaggregation. I have 5 variables in the proc freq output with one being the one that is being counted. Is there a way to calculate all the percents based on how the data is stratified by the 4 variables?

Paul

PROC Star
Posts: 7,363

Re: Proc Freq: Percents

Paul: I didn't suggest adding outcum as that option is only relevant for one-way tables.

You will have to show an example indicating how you want the percents calculated.  The current output, using only the outpct option, includes the following calculations:

percent: which is the percent the cell comprises of all of the data

pct_tabl which is  the percent the cell comprises of the data for a particular county

pct_row which is  the percent the cell comprises of the row in which it is contained

pct_col which is the percent the cell comprises of a particular column for a particular county

If you need others, e.g. cumulative, they could probably be easily accomplished, from those percents and counts, within a fairly simple datastep.

Regular Contributor
Posts: 216

Re: Proc Freq: Percents

Below is a link to a sorted version of the same sample data file I posted earlier. This one is sorted by county->startyear->exit->agecat. Also is indicated in shading what a denominator is for all these calculations: the combination of county/start year. Based on using this denominator, I am trying to calculate the following statistics using the freq table you found in the other posting:

http://www.nycourts.gov/surveys/cwcip/agerange-posting2-den.zip

1. For each county, start year, exit and agecat, the number of kids who have are in a given exitMonthCateogory (this is already done based on existing freq calculation).

2. For each county, start year, exit and agecat, the percent of kids who have are in a given exitMonthCateogory. The denominator for this is the total number who are in a particular startyear/county combination. The current percent calculation in the freq. calculation is not using the correct denominator.

3. For each county, start year, exit and agecat,  the cumulative number of kids who have are in a given exitMonthCateogory. This is just adding the numbers in each exitMonthCateogory across.

4. For each county, start year, exit and agecat,  the cumulative percent of kids who have are in a given exitMonthCateogory. The denominator for this is the total number who are in a particular startyear/county combination.

5. The total number of kids in each startyear/county combination.

I was thinking that each start/county needs to be a separate table in the freq calculation, but that would create a lot of tables since there are 21 counties and 6 startyears: 126 tables.

If there are any thoughts on how to determine the correct denominator for these stratafications, please let me know.

Paul

PROC Star
Posts: 7,363

Re: Proc Freq: Percents

Paul: There is probably an easier way to do this, and you will have to check to see if I got all of your requirements correctly, but the following should come at least pretty close to doing what you want:

options validvarname=v7;

PROC IMPORT OUT= WORK.s1ageshort

            DATAFILE= "D:\art\agerange-posting2.xlsx"

            DBMS=XLSX REPLACE;

     GETNAMES=YES;

     DATAROW=2;

RUN;

proc sort data=s1ageshort;

  by cnty_name startyear;

run;

proc freq data=s1ageshort noprint;

  tables exit*agecat4*exitMonthCategory/

    out=s1ageshorttest (drop=percent pct_col

        rename=(

            count=want1

               pct_row=want2

               pct_tabl=want4

                        ))

    outpct sparse;

  by cnty_name startyear;

run;

proc transpose data=s1ageshorttest out=want (drop=_label_

            rename=(_name_=DataElement));

  by cnty_name startyear exit agecat4;

  id exitMonthCategory;

run;

data want (drop=totSmiley Happy;

  set want;

  by cnty_name startyear exit agecat4;

  array row(17) _1--_72;

  array tot(17);

  if first.startyear then do i=1 to 17;

    tot(i)=row(i);

  end;

  else if DataElement eq 'want1' then do i=1 to 17;

    tot(i)+row(i);

  end;

  output;

  if DataElement eq 'want1' then do;

    DataElement='want3';

    do i=2 to 17;

      row(i)+row(i-1);

    end;

    output;

  end;

  if last.startyear then do;

    DataElement='want5';

    do i=1 to 17;

      row(i)=tot(i);

    end;

    call missing(exit);

    call missing(agecat4);

    output;

  end;

run;

options validvarname=any;

data want;

  set want (rename=(

         _1='1 Month'n

         _2='2 Months'n

         _3='3 Months'n

         _4='4 Months'n

         _5='5 Months'n

         _6='6 Months'n

         _12='7-12 Months'n

         _18='1-1.5 Years'n

         _24='>1.51-1.99 Years'n

         _30='2-2.5 Years'n

         _36='2.51-2.99 Years'n

         _42='3-3.5 Years'n

         _48='3.51-3.99 Years'n

         _54='4-4.5 Years'n

         _60='4.51-4.99 Years'n

         _66='5-5.5 Years'n

         _72='>5.5 Years'n));

run;

options validvarname=v7;

Regular Contributor
Posts: 216

Re: Proc Freq: Percents

Hi Art

Thanks a lot, I am going through this today. I am also posting a few other questions on data step processing.

Paul

Regular Contributor
Posts: 216

Re: Proc Freq: Percents

Hi Art

I ran the above code, while it does return percents, I don't think they are correct apparently. They are similar to running proc freq actually.

By using the below code (following using your original code), I was able to include the freq number (#1 above), the percent (#2 above), and the startyear/county combination number (#5 above) in one data set (http://www.nycourts.gov/surveys/cwcip/agerange1-11-13-12.zip). The problem is that I cannot get the cumulative number (#3 above) and cumulative percent (#4 above).

Do you or anyone know how to start at the top of a data file and cumulatively add the values of a column from 1 record to the next record and populate the cumulative number in a new column for each record? This calculation needs to be done situationally--when the previous and current record have the same county, year, exit values.

Paul

proc freq data=s1ageshort;

tables cnty_name*startyear/out=s1ageshortyearfreq /*this is the total per cnty_name,startyear combo*/;

run;

proc sort data=s1ageshortyearfreq out=s1ageshortyearfreqsort;

by cnty_name startyear;

run;

proc sort data=s1ageshorttest out=s1ageshorttestsort;

by cnty_name startyear;

run;

data s1ageyear1;

merge work.s1ageshortyearfreqsort (keep=count cnty_name startyear rename=(count=cohortyearcount)) work.s1ageshorttestsort (drop=percent);

by cnty_name startyear;

Percent1=count/cohortyearcount /*this is the percent*/;

run;

Solution
‎11-13-2012 04:03 PM
PROC Star
Posts: 7,363

Re: Proc Freq: Percents

Paul,  I hadn't responded until now because (1) I had my real job to do and (2) I'm not sure if I understand your requirements.

However, that said, I also misunderstood what proc freq was using as the denominator for table percents, thus had to revise the code a bit.

You can easily amass the desired denominator by including a datastep with a double dow loop in it.  Actually, the entire code could be wrapped within the same datastep but, honestly, I thought that would be too difficult to follow.

Regardless, you said that the code wasn't correctly producing want3.  You had defined want3 as:

3. For each county, start year, exit and agecat,  the cumulative number of kids who have are in a given exitMonthCateogory. This is just adding the numbers in each exitMonthCateogory across.

That is what want 3 is producing.  Here are two lines from the results, the first showing want1 (the counts) and the second showing want3 (the cummulative counts across the row):

cnty_namestartyearexitagecat4DataElement1 Month2 Months3 Months4 Months5 Months6 Months7-12 Months1-1.5 Years>1.51-1.99 Years2-2.5 Years2.51-2.99 Years3-3.5 Years3.51-3.99 Years4-4.5 Years4.51-4.99 Years5-5.5 Years>5.5 Years
County-CLPR2006XCA1want100000012466582000
County-CLPR2006XCA1want300000013713192432

34

343434

If the above is not what you are seeking, then I don't understand the requirements.  On the other hand, if it is what you wanted, then try the following revised code:

options validvarname=v7;

PROC IMPORT OUT= WORK.s1ageshort

            DATAFILE= "D:\art\agerange-posting2.xlsx"

            DBMS=XLSX REPLACE;

     GETNAMES=YES;

     DATAROW=2;

RUN;

proc sort data=s1ageshort;

  by cnty_name startyear;

run;

proc freq data=s1ageshort noprint;

  tables exit*agecat4*exitMonthCategory/

    out=s1ageshorttest (drop=percent pct_col pct_tabl

        rename=(

            count=want1

               pct_row=want2

                        ))

    outpct sparse;

  by cnty_name startyear;

run;

proc transpose data=s1ageshorttest out=want (drop=_label_

            rename=(_name_=DataElement));

  by cnty_name startyear exit agecat4;

  id exitMonthCategory;

run;

data want;

  do until (last.startyear);

    set want;

    array row(17) _1--_72;

    by cnty_name startyear;

    if first.startyear then total=0;

    if DataElement='want1' then total=sum(total,sum(of row(*)));

  end;

  do until (last.startyear);

    set want;

    by cnty_name startyear;

    output;

  end;

run;

data want (drop=tot: i);

  set want;

  by cnty_name startyear exit agecat4;

  array row(17) _1--_72;

  array tot(17);

  do i=1 to 17;

    if missing(row(i)) then row(i)=0;

  end;

  if first.startyear then do i=1 to 17;

    tot(i)=row(i);

  end;

  else if DataElement eq 'want1' then do i=1 to 17;

    tot(i)+row(i);

  end;

  output;

  if DataElement eq 'want1' then do;

    DataElement='want3';

    do i=2 to 17;

      row(i)+row(i-1);

    end;

    output;

    DataElement='want4';

    do i=1 to 17;

      row(i)=100*row(i)/total;

    end;

    output;

  end;

  if last.startyear then do;

    DataElement='want5';

    do i=1 to 17;

      row(i)=tot(i);

    end;

    call missing(exit);

    call missing(agecat4);

    output;

  end;

run;

options validvarname=any;

data want;

  set want (rename=(

         _1='1 Month'n

         _2='2 Months'n

         _3='3 Months'n

         _4='4 Months'n

         _5='5 Months'n

         _6='6 Months'n

         _12='7-12 Months'n

         _18='1-1.5 Years'n

         _24='>1.51-1.99 Years'n

         _30='2-2.5 Years'n

         _36='2.51-2.99 Years'n

         _42='3-3.5 Years'n

         _48='3.51-3.99 Years'n

         _54='4-4.5 Years'n

         _60='4.51-4.99 Years'n

         _66='5-5.5 Years'n

         _72='>5.5 Years'n));

run;

Super User
Posts: 17,819

Re: Proc Freq: Percents

I don't understand your requirements fully, but think that proc tabulate might be a better option. Capture the table using ODS and then run through that summary to do your cumulative totals.

You've posted the data but I don't see a summary of what you want your solution to look like so just a guess on my part.

data s1ageshort;

    set s1ageshort;

    var=1;

run;

ods table table=summary1;

proc tabulate data=s1ageshort;

    class cnty_name startyear agecat4 exit exitMonthCategory;

    var var;

    table cnty_name*startyear*agecat4*exitMonthCategory *(n='Number' var*colpctn), exit;

run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 497 views
  • 0 likes
  • 4 in conversation