turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Data Management
- /
- Forum
- /
- Grouping by character variable

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-01-2014 05:28 AM

I have created the following dataset for reference :

data readin;

Input X$ Y;

cards;

D 1

B 2

A 2

A 3

A 3

B 3

A 3;

run;

I want to calculate count of X by Y variable. Since X is a character variable, the aggregation cannot be done using PROC MEANS and PROC TABULATE. I dont want to create X into numerical groups. I know this can be easily done via PROC SQL (Code shown below)

PROC SQL;

Create table readin1 as

Select Y, count(X) from

readin

group by Y;

quit;

Is it possible via data step or procedures other than PROC SQL?

Accepted Solutions

Solution

08-01-2014
03:37 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

08-01-2014 03:37 PM

where not missing(x);

class y;

tables y;

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

08-01-2014 05:46 AM

Well, first question would be that SQL does the job pretty well, why so reluctant to not use it? I suppose you could do it by creating a format, and then putting your character into that format so the underlying data would be numeric, and displayed as A or B. Then means would work. You could also use rank procedure to derive a value for each one:

data readin;

Input X$ Y;

cards;

D 1

B 2

A 2

A 3

A 3

B 3

A 3

;

run;

data want;

set readin;

numversion=rank(x);

run;

Finally way I can think of is to retain the last value each time:

proc sort data=readin;

by y x;

run;

data want;

set readin;

retain count;

by Y X;

if first.X then count=1;

else count=count+1;

if last.X then output;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

08-01-2014 06:31 AM

You could use proc means without a variable and take the _freq_ output value.

proc sort data=readin;

by y x;

run;

proc means data=readin;

by y x;

output out=want;

run;

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

08-01-2014 07:47 AM

Your PROC SQL is counting the non-missing values of X grouped by Y. That can also be done with PROC SUMMARY and a WHERE statement.

Input X$ Y;

cards;

D 1

B 2

A 2

A 3

A 3

. 3

B 3

A 3

;

Select Y, count(X) from readin

group by Y;

where not missing(x);

class y;

output out=freq;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

08-01-2014 12:30 PM

I would be interested in seeing the PROC Tabulate that didn't work as I count character variables often using Tabulate.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

08-01-2014 03:19 PM

@ballardw : Thank you for your email. The following PROC TABULATE code doesn't work for me as X is a character variable.

PROC TABULATE DATA = READIN;

CLASS Y;

VAR X;

TABLE X*N*Y;

RUN;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

08-01-2014 03:24 PM

Var variables in tabulate are expected to be numeric. Make X a class variable and it should work.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

08-01-2014 03:32 PM

If i make X a class variable, it will return cross tab. This is what i am not looking for. I want count of X grouped by Y.

Refer the code : -

PROC SQL;

Create table readin1 as

Select Y, count(X) from

readin

group by Y;

quit;

Solution

08-01-2014
03:37 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

08-01-2014 03:37 PM

where not missing(x);

class y;

tables y;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

08-01-2014 03:52 PM

proc tabulate data=readin out=want(where=(x = ' ') drop=_;

class x y;

table y,x all;

run;

proc print data=want;

run;