bncoxuk
Obsidian | Level 7
I just received a very large SAS data set. Most of the variables are categorical. I need a summary of the different categories for each of the categorical variables. How can I do this? I tried PROC CONTENTS, but it does not show the categories.
Ksharp
Super User
What categories do you mean?
If you can't find it in PROC CONTENTS, maybe the dictionary table dictionary.columns contains some of the information you need.
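As a minimal sketch of the dictionary-table idea, the query below lists the character variables of a data set. SASHELP.CLASS stands in for the real data here, which is an assumption of this example. Note that dictionary.columns describes variable attributes (name, type, length), not the values a variable takes.

```sas
/* List the character variables of SASHELP.CLASS from the dictionary table. */
/* LIBNAME and MEMNAME are stored in upper case in dictionary.columns.      */
proc sql;
  select name, type, length
  from dictionary.columns
  where libname = 'SASHELP' and memname = 'CLASS' and type = 'char';
quit;
```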


Ksharp
bncoxuk
Obsidian | Level 7
Suppose I have a variable called 'brands' with 8 categories (e.g. Kraft, Cadbury) and another variable called 'products' with 3 categories (e.g. Food). The data set has over 50 variables, each with its own set of categories.

I want to get a summary of the categories that each of the variables have.
ballardw
Super User
The quick, simple approach would be PROC FREQ.
If you leave out the TABLES statement, you'll get output for all variables in the data set.
Use the NLEVELS option, as in
proc freq data=yourdata nlevels; run;

This generates a table of variable names, labels and how many levels (categories) a variable may have. Large numbers would indicate things that aren't likely to be "categories" in the way you are thinking.
bncoxuk
Obsidian | Level 7
ballardw,

I tried, but this only gave me the number of levels each variable has. What I actually want to find out is the names of all the levels, not just how many levels there are. Thank you.
ballardw
Super User
> ballardw,
>
> I tried, but this only gave me the number of levels
> each variable has. In fact, what I want to find out
> is the names of all the levels, not just 'how many
> levels'. Thank you.

Use the list of variables that look likely to be categories in PROC FREQ.

You could just use proc freq data=yourdata; run;
BUT if you have identification variables (which are categorical), phone numbers and the like, as well as any continuous or pseudo-continuous variables (income, for example), and you don't want to see every one of those, you need a way to find the likely ones of interest. That's what the NLEVELS option does in this case: it gives you a starting point.

I'd be tempted to dump the NLEVELS output to a data set and filter on the number of levels to get a list of variables for use elsewhere.
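The filtering idea above could be sketched roughly as follows. SASHELP.CLASS, the data set name levels, the macro variable catvars, and the cutoff of 10 levels are all assumptions chosen for illustration; TableVar and NLevels are the column names PROC FREQ writes to the NLevels ODS table.

```sas
/* Capture the NLEVELS table in a data set.                          */
/* NOPRINT on TABLES suppresses the individual frequency tables.     */
ods output nlevels=levels;
proc freq data=sashelp.class nlevels;
  tables _all_ / noprint;
run;

/* Collect the variables with at most 10 levels (arbitrary cutoff)   */
/* into a space-separated macro variable.                            */
proc sql noprint;
  select TableVar into :catvars separated by ' '
  from levels
  where NLevels <= 10;
quit;

/* Frequency tables, and so the level names, for those variables only. */
proc freq data=sashelp.class;
  tables &catvars;
run;
```

If no variable passes the filter, catvars is never created and the last step fails, so the cutoff needs adjusting to the data at hand.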
Ksharp
Super User
Yes.
ballardw is right. Try it: the data set want_dataset contains the information you want.


[pre]
ods output nlevels=want_dataset;
proc freq data=sashelp.class nlevels;
tables _all_;
run;
[/pre]


Ksharp
bncoxuk
Obsidian | Level 7
Ksharp, I am looking for the names of all the levels, not just 'how many levels'.

Thanks for help.
Ksharp
Super User
Oh.
You need this.
[pre]
ods output onewayfreqs=want;
proc freq data=sashelp.class ;
tables _all_ ;
run;

[/pre]

Ksharp
bncoxuk
Obsidian | Level 7
Ksharp, I tried again, but the code you gave me only produces cumulative frequencies etc. for a continuous (numeric) variable. The results do not show anything about the categorical variables.

To put the question simply, I want a summary table that shows the names of the levels for all the categorical variables. For example, the variable Gender has 3 levels (M, F, O), and the variable AgeGroup has 5 levels (1, 2, 3, 4, 5). I want a table that shows the levels for all such categorical variables.

The question seems very easy, but in practice very difficult to get an easy solution.
SPR
Quartz | Level 8
Hello Bncoxuk,

This is my solution for SASHELP.CLASS:
[pre]
proc SQL;
/* Count the variables in SASHELP.CLASS */
select COUNT(distinct Name) as n into :n
from sashelp.vcolumn
where libname="SASHELP" and memname="CLASS";
%let n=%trim(&n);
/* Store the variable names in macro variables n1, n2, ... */
select distinct name as name into :n1-:n&n
from sashelp.vcolumn
where libname="SASHELP" and memname="CLASS";
;quit;
%macro a;
%do i=1 %to &n;
/* Distinct values of the i-th variable */
proc SQL;
create table _t as
select distinct &&n&i
from SASHELP.CLASS
;quit;
/* Place the value lists side by side in data set r (merge without BY) */
%if &i = 1 %then %do; data r; set _t; run; %end;
%else %do; data r; merge r _t; run; %end;
%end;
%mend a;
%a;
[/pre]
Sincerely,
SPR
bncoxuk
Obsidian | Level 7
SPR, your approach gave me all the variable names in the data set. So I used the one by Ksharp, which worked.

Thanks!
SPR
Quartz | Level 8
Hello Bncoxuk,

Dataset r in my program contains the information you requested and it is the same as in KSHARP's output but does not contain counts.

Sincerely,
SPR
Ksharp
Super User
Yes, I gave you an answer, but you need to process the output data set.
OK, let me give you a solution. Try this:

[pre]
ods output onewayfreqs=want(drop=frequency);
proc freq data=sashelp.class ;
tables _all_ /nopercent nocum nofreq ;
run;
data want;
set want;
variable_name=scan(table,2);
variable_value=scan(catx(' ',of name -- weight),1);
keep variable_:;
run;
[/pre]

Oops. NOTE: name is the first variable and weight is the last variable in the data set.
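A possible variant that does not depend on knowing the first and last variable names, and that keeps level names containing blanks intact, is sketched below. It relies on the F_ columns (the formatted values, stored as character) that PROC FREQ writes to the OneWayFreqs ODS table; the names freqs and levels_long are hypothetical, and SASHELP.CLASS again stands in for the real data.

```sas
/* Capture all one-way frequency tables in a single data set. */
ods output onewayfreqs=freqs;
proc freq data=sashelp.class;
  tables _all_;
run;

/* One row per variable/level pair. */
data levels_long;
  set freqs;
  length variable_name $32 variable_value $200;
  /* Table holds text such as "Table Sex"; the second word is the variable name. */
  variable_name  = scan(Table, 2);
  /* Only the F_ column belonging to the current variable is non-blank in each row. */
  variable_value = strip(coalescec(of F_:));
  keep variable_name variable_value Frequency;
run;
```

This assumes the input data set has no variables of its own whose names start with F_, since those would also be picked up by the F_: name prefix list.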
Ksharp
