BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Paul_NYS
Obsidian | Level 7

Hi

I am looking for a way to summarize the data in a data set. Is there a way to create another data set that summarizes records in the data set from another? In other words, I have a data set that contains the months it takes to achieve certain events and these are stored as columns, along with a county field and a unique record indicator. I would like to group the times it takes to achieve these events by month groupings (i.e. 0-6, 7-12, 13-18, etc.) and then summarize these by sums, percentages by county.

Is there a way to do this?

Paul

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Paul, As I had mentioned, earlier, you will get better responses if you provide an example dataset.

Without knowing what your data looks like, all any of us can do is guess.

Your output looks like a result that could easily be obtained using a combination of proc freq and proc transpose.  e.g., the following comes close:

proc format;

  value timespan

    low-0.084931=1

    0.0849310001-0.167123=2

    0.1671230001-0.252054=3

    other=4;

run;

data have;

  informat cnty_name $20.;

  informat description $30.;

  informat start_date date9.;

  format start_date date9.;

  input id cnty_name description age_range year start_date;

  time_in_program=yrdif(start_date, today(), 'AGE');

  cards;

1 Hopeland Outcome-2940 1 2006 8jan2012

2 Hopeland Outcome-2940 1 2006 8feb2012

3 Hopeland Outcome-2940 2 2006 8aug2012

4 Hopeland Outcome-2940 2 2006 8sep2012

5 Hopeland Outcome-2940 2 2006 8oct2012

1 Nohopeland Outcome-2940 1 2006 8jan2012

2 Nohopeland Outcome-2940 1 2006 8feb2012

3 Nohopeland Outcome-2940 3 2006 8aug2012

4 Nohopeland Outcome-2940 3 2006 8sep2012

5 Nohopeland Outcome-2940 3 2006 8oct2012

;

proc freq data=have noprint;

  tables cnty_name*age_range*time_in_program/out=test;

  format time_in_program timespan.;

run;

proc transpose data=test out=want (drop=_label_

            rename=(_name_=DataElement))prefix=months;

  by cnty_name age_range;

  idlabel time_in_program;

run;

View solution in original post

11 REPLIES 11
art297
Opal | Level 21

Paul,  sounds simple enough but it will help if you provide example have and want datasets, preferably in the form of datasteps.

Paul_NYS
Obsidian | Level 7

For this, I don't have much of a data step to do it actually. I am not even sure it can be done. The existing data table (s1) has the following columns (among many others, but the below are key):

entity_id (unique id)

age_category (1-4 indicator of the age range)

county (county)

ExitMonths (the time in months it takes to reach the court milestone)

ExitMonthCategory (either a 6, 12, 18, 24 indicator)

startYear (year child entered the court system)

What I would like to do is to export a new data set (ultimately to Excel) creating columns for month ranges (i.e. 0-6, 7-12, 13-18, etc.) and rows for summarized measures and dimensions (i.e. county, age range, number, percent, cumulative percent). The row dimensions would summarize the above ExitMonths column from the s1 data table.

I tried using Enterprise Guide to run a frequency over the s1 table, but could not get what I want.

Any ideas?

Paul

ballardw
Super User

What type of summary are you looking for, counts, percents, means or something else?

You may want to manually create and post an example of the final output desired appearance.

It may be that Proc Tabulate or Report would work as they allow many summarization options. The output can be set to Excel using ODS TAGSETS.EXCELXP and even result in semi-pretty tables.

Paul_NYS
Obsidian | Level 7

I am looking for number, cumulative number, percent, cumulative percent and total. I am attaching a link to an example of what the ultimate output from SAS would look like in Excel.

So I am I have a data table that has many child records per county and I need to summarize these records by county, and then by things like age range, year--all the dimensions that are in the link to the attached sample file.

I don't know how to summarize a data table though. I can do some things in the Enterprise Guide GUI with Procs, but I don't get everything and it is not as neatly organized as the linked example.

Any ideas on approach, etc. would be appreciated.

http://www.nycourts.gov/surveys/cwcip/metric1-Example.zip

Paul

art297
Opal | Level 21

Paul, As I had mentioned, earlier, you will get better responses if you provide an example dataset.

Without knowing what your data looks like, all any of us can do is guess.

Your output looks like a result that could easily be obtained using a combination of proc freq and proc transpose.  e.g., the following comes close:

proc format;

  value timespan

    low-0.084931=1

    0.0849310001-0.167123=2

    0.1671230001-0.252054=3

    other=4;

run;

data have;

  informat cnty_name $20.;

  informat description $30.;

  informat start_date date9.;

  format start_date date9.;

  input id cnty_name description age_range year start_date;

  time_in_program=yrdif(start_date, today(), 'AGE');

  cards;

1 Hopeland Outcome-2940 1 2006 8jan2012

2 Hopeland Outcome-2940 1 2006 8feb2012

3 Hopeland Outcome-2940 2 2006 8aug2012

4 Hopeland Outcome-2940 2 2006 8sep2012

5 Hopeland Outcome-2940 2 2006 8oct2012

1 Nohopeland Outcome-2940 1 2006 8jan2012

2 Nohopeland Outcome-2940 1 2006 8feb2012

3 Nohopeland Outcome-2940 3 2006 8aug2012

4 Nohopeland Outcome-2940 3 2006 8sep2012

5 Nohopeland Outcome-2940 3 2006 8oct2012

;

proc freq data=have noprint;

  tables cnty_name*age_range*time_in_program/out=test;

  format time_in_program timespan.;

run;

proc transpose data=test out=want (drop=_label_

            rename=(_name_=DataElement))prefix=months;

  by cnty_name age_range;

  idlabel time_in_program;

run;

Paul_NYS
Obsidian | Level 7

Hi Arthur

Two questions among many, but two to start:

1. How are you coming up with the month ranges along the top of the spreadsheet? From the code, I am not seeing how this is determined.

2. The first proc format does what?

Paul

art297
Opal | Level 21

Your second question answers your first question.  The proc format converts the age calculation into months.  I only did it for months 1, 2 and 3, and formatted all other "ages" to be 4, as I didn't know if this approach was what you were looking for or if your data was just dates like those I used as a sample dataset.

I calculated time in program with the statement: time_in_program=yrdif(start_date, today(), 'AGE');

Then, applied the format when I ran proc freq with the statement: format time_in_program timespan.;

Paul_NYS
Obsidian | Level 7

My data ranges (at top of the spreadsheet) are just individual months or 6 month groupings. How do they correspond to the below values (i.e. -0.084931)?

proc format;

  value timespan

    low-0.084931=1

    0.0849310001-0.167123=2

    0.1671230001-0.252054=3

    other=4;

run;

art297
Opal | Level 21

Depends on your actual desired cutoffs.  I was approximating 1/12th of a year.  Actually, it should have been 0.08333 for one month, 0.166667 for two months, 0.25 for 3 months, etc.

Paul_NYS
Obsidian | Level 7

1. what does 'low' refer to in proc format?

2. How are you automatically classifying the values in time in program as belonging to one of the month categories?

Paul

art297
Opal | Level 21

Paul, Low means the lowest value that SAS can represent (excluding missing values).

You didn't provide any example data, thus I built my example on the premise that you had a start date, which I called start_date.

I then calculated time_in_program as yrdif(start_date, today(), 'AGE');  What that calculated is the difference between start_date and today(), expressed as a real number, similar to how one would calculate someone's age.  If the difference was precisely one month, the resulting number would be 0.08333If the difference was precisely one year, the resulting number would be 1.0.


I created and applied a format to classify those calculations into time groups.


SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2079 views
  • 0 likes
  • 3 in conversation