DATA Step, Macro, Functions and more

Presenting datasets

Reply
Occasional Contributor
Posts: 5

Presenting datasets

COUNTY

1

Texas

Collin

2

Texas

Dallas

3

Georgia

DeKalb

AGE_DISTRIBUTION

1

100

20

40

60

2

10

10

40

50

3

45

100

56

67

AGE_DISTRIBUTION_DESC

AGE_0_10

< 10 years

AGE_10_20

Between 10 and 20 years

AGE_20_40

Between 20 and 40 years

AGE_40_PLUS

> 40 years

Requirement 1: Write a SAS program (with ODS) to use the datasets given above to generate HTML and MS EXCEL file that will look like the following (the files must have the 3 tables given below):

County Name

Age Distribution

Category

Total Number

Collin

< 10 years

100

Between 10 and 20 years

20

Between 20 and 40 years

40

> 40 years

60

Dallas

< 10 years

10

Between 10 and 20 years

10

Between 20 and 40 years

40

> 40 years

50

DeKalb

< 10 years

45

Between 10 and 20 years

100

Between 20 and 40 years

56

> 40 years

67

Category

County

Collin

Dallas

DeKalb

< 10 years

100

10

45

Between 10 and 20 years

20

10

100

Between 20 and 40 years

40

40

56

> 40 years

60

50

67

Total

220

110

268

Category

State

Texas

Georgia

< 10 years

110

45

Between 10 and 20 years

30

100

Between 20 and 40 years

80

56

> 40 years

110

67

Total

330

268

My solution:

data mypro.county;

input county_id state_name $10. county_name $10.;

cards;

1 Texas Collin

2 Texas Dallas

3 Georgia DeKalb

;

run;

data mypro.age_distribution;

input county_id  age1 age2 age3 age4;

cards;

1 100 20 40 60

2 10 10 40 50

3 45 100 56 67

;

run;

data mypro.county1;

merge mypro.county mypro.age_distribution;

by county_id;

run;

proc tabulate data=mypro.county1;

class county_name;

var age1 age2 age3 age4;

tables county_name,(age1 age2 age3 age4 all);

run;

The tabulate procedure is not giving the desired result?

Super User
Posts: 11,343

Re: Presenting datasets

Posted in reply to johnsaini

First, print your County data set. You will see that as written that the data was read incorrectly. Hint: Specifying a format on input statement forces the whole width of the format to be used when reading character data.

Second one way to get meaningful text instead of variable names is to associate a label with a variable.

     label county_name='County'; Label state_name='State'; label age1='< 10 years';

Another way is to create a single value that has a format to indicate the data source.

Third, All in tabulate as presented in your example is counting County_name and compared to the first table unneeded.

The first table spanning header isn't going to come from proc tabulate without extreme amounts of data manipulation.

I think you might look into Proc Report to generate these reports without completely changing the data structures you have.

Occasional Contributor
Posts: 5

Re: Presenting datasets

Hi,

Do I need to merge county and age distribuiton data sets?

Super User
Posts: 11,343

Re: Presenting datasets

Posted in reply to johnsaini

What you did before looks reasonable as to display state and county with age they will need to be in the same data set or else you get involved with a very complicated bit of code, not impossible just likely to be 10 times as much work with high complexity.

Note that starting proc report using syntax such as:

proc report data=anydatasetname; run;

starts a windowing environment that allows picking variables and there rolls as row or columns, statistics, summaries and such.

Ask a Question
Discussion stats
  • 3 replies
  • 242 views
  • 0 likes
  • 2 in conversation