Proc Univariate Histogram

08-23-2010 04:18 PM

Hi All,

I am trying to create a histogram of a continuous variable and also see the data behind it. If I understand correctly, I need to use PROC UNIVARIATE with the OUTHISTOGRAM= option. The SAS documentation shows this producing both a percentage of observations and a count for each bin, but I cannot get the count. My code is as follows:

proc univariate data=exposure1988;

histogram experience / endpoints = 1 to 50. by 1

outhistogram=print ;

run;

Any help is appreciated.

Jon

08-23-2010 05:23 PM

Hi:

Well, did you run PROC PRINT on the WORK.PRINT dataset, or look for the dataset in the SAS Explorer?

When I do this (note that I changed the name of the output dataset to avoid confusion with PROC PRINT being used to print the file):

[pre]

ods listing;

proc univariate data=sashelp.class;

histogram height / outhistogram=work.outhist ;

run;

proc print data=work.outhist;

title 'Histogram Data Output';

run;

[/pre]

Then, when I look in the LISTING window, I see the following results from the PROC PRINT:

[pre]

Histogram Data Output

Obs    _VAR_     _MIDPT_    _OBSPCT_    _COUNT_

  1    Height      52.5       5.2632       1
  2    Height      57.5      31.5789       6
  3    Height      62.5      31.5789       6
  4    Height      67.5      26.3158       5
  5    Height      72.5       5.2632       1

[/pre]

In my output the _OBSPCT_ is the percent and the _COUNT_ is the count. Other measurements can be requested (such as _CURVE_, _MAXPT_ and _MINPT_), as described here:

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cd...

cynthia

08-23-2010 05:39 PM

I can't seem to replicate this output, Cynthia. I have now changed mine to:

ods listing;

proc univariate data=exposure1988;

histogram experience / endpoints = 1 to 50. by 1

outhistogram=work.exp ;

run;

proc print data=work.exp;

title 'histogram data output';

run;

I still get the same result, whether I am looking at the new dataset (exp) or at the listing.

Any other suggestions?

Thanks again,

Jon

08-23-2010 06:02 PM

Hi:

What happens if you take off the ENDPOINTS= option?

In the doc it says:

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cd...

And if you look at the description of how ENDPOINTS works, it may be that what you're specifying is incompatible with your data: (below quoted from doc)

ENDPOINTS <=values | KEY | UNIFORM>

uses histogram bin endpoints as the tick mark values for the horizontal axis and determines how to compute the bin width of the histogram bars. The values specify both the left and right endpoint of each histogram interval. The width of the histogram bars is the difference between consecutive endpoints. The procedure uses the same values for all variables.

The range of endpoints must cover the range of the data. For example, if you specify

endpoints=2 to 10 by 2

then all of the observations must fall in the intervals [2,4) [4,6) [6,8) [8,10]. You also must use evenly spaced endpoints which you list in increasing order.

KEY

determines the endpoints for the data in the key cell. The initial number of endpoints is based on the number of observations in the key cell by using the method of Terrell and Scott (1985). The procedure extends the endpoint list for the key cell in either direction as necessary until it spans the data in the remaining cells.

UNIFORM

determines the endpoints by using all the observations as if there were no cells. In other words, the number of endpoints is based on the total sample size by using the method of Terrell and Scott (1985).

Neither KEY nor UNIFORM apply unless you use the CLASS statement.

If you omit ENDPOINTS, the procedure uses the histogram midpoints as horizontal axis tick values. If you specify ENDPOINTS, the procedure computes the endpoints by using an algorithm (Terrell and Scott; 1985) that is primarily applicable to continuous data that are approximately normally distributed.

If you specify both MIDPOINTS= and ENDPOINTS, the procedure issues a warning message and uses the endpoints.

If you specify RTINCLUDE, the procedure includes the right endpoint of each histogram interval in that interval instead of including the left endpoint.

If you use a CLASS statement and specify ENDPOINTS, the procedure uses ENDPOINTS=KEY as the default. However if the key cell is empty, then the procedure uses ENDPOINTS=UNIFORM.
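
The range requirement quoted above can be checked before running the histogram. The following is only a minimal sketch, assuming the dataset EXPOSURE1988 and variable EXPERIENCE from the original question; it has not been run against that data.

```sas
/* Sketch: confirm the data range before fixing the bin endpoints.       */
/* EXPOSURE1988 and EXPERIENCE are the names from the original question. */
proc means data=exposure1988 n nmiss min max;
  var experience;
run;

/* If MIN is at least 1 and MAX is at most 50, these endpoints span the data. */
proc univariate data=exposure1988;
  histogram experience / endpoints = 1 to 50 by 1
                         outhistogram = work.exp;
run;
```

If the reported minimum or maximum falls outside 1 to 50, the endpoint list would need to be widened to cover it.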

Are there any messages in the LOG? If I understand you correctly, you DO get an output dataset (WORK.EXP), but the _COUNT_ variable is NOT in the dataset?

cynthia

08-23-2010 06:20 PM

Correct. The WORK.EXP dataset is created, as is the listing, but it contains only three variables, with the observations in each bin reported as a percentage rather than a count.
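
When only the percentage is available, one possible workaround is to rebuild the counts from it. This is a sketch, not a confirmed fix: it assumes _OBSPCT_ is the percentage of nonmissing EXPERIENCE values falling in each bin, and the names N_NONMISS, WORK.EXP_COUNTS and BIN_COUNT are made up for illustration.

```sas
/* Count the nonmissing values of the analysis variable. */
proc sql noprint;
  select count(experience) into :n_nonmiss
  from exposure1988;
quit;

data work.exp_counts;   /* hypothetical output dataset name */
  set work.exp;
  /* Convert each bin percentage back to a count; ROUND absorbs
     rounding error in the stored percentage. */
  bin_count = round(_obspct_ * &n_nonmiss / 100);
run;

proc print data=work.exp_counts;
  title 'Histogram data with reconstructed counts';
run;
```

The same arithmetic is consistent with the SASHELP.CLASS output posted earlier, where the counts sum to 19: 31.5789 * 19 / 100 rounds to 6, matching _COUNT_ for that bin.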

08-24-2010 12:13 AM

Hi:

I'm stumped. I am running the following version of SAS:

[pre]

SAS Version is: 9.02.02M2P09012009

[/pre]

What version of SAS are you running? If you run -exactly- my code, using SASHELP.CLASS, do you get the same results as I posted? To find out your version of SAS, submit the following statement:

[pre]

%put SAS Version is: &sysvlong4;

[/pre]

Then look in the SAS log for the result. If my SASHELP.CLASS code does NOT give you the _COUNT_ variable, or if it does but your own data does NOT, your best resource is to open a track with Tech Support.

To open a track with Tech Support, fill out the form at this link:

http://support.sas.com/ctx/supportform/createForm

cynthia
