DATA Step, Macro, Functions and more

Frequencies, Labels for all Variables into One Table

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Frequencies, Labels for all Variables into One Table

[ Edited ]

I work in an archive, and we spend a lot of time reviewing one-way frequencies for each variable in our datasets to check for consistency as well as potential confidentiality issues. I would like to streamline this a little bit by creating a table with the one-way frequencies for all variables, including columns for the variable name, the variable label, the raw value, the formatted/labeled value, and the frequency count for the value in question.

 

I am fairly new to SAS, but it seems to me that getting these tables the way that I want will require using PROC SQL, as the ODS output options for PROC FREQ create unique coluns for each value label. I can't think of a way in SQL to aggregate counts for the values of all variables at the same time, so I'm assuming that I'll need to use macros to pull a list of all the variables and then loop them into proc sql statements that will feed into the table.

 

So far, this is what I have for the frequencies table:

 

 

*Pulling all variable names from the dataset _DATASET_ into a list;
PROC SQL;
select name into :varlist separated by ' '
from dictionary.coulmns
where memname = '_DATASET_';
quit;

*Creating a table to store the query results (and dropping it if it already exists in memory);
proc sql;
drop table work.freqtab
create table work.freqtab
     (Name_Col char(75),
      Val_Name NUM(5),
      Count NUM(5));
quit;

*Attempting to query each variable for its values, aggregate counts, and a dummy column that just displays the variable's name;
*I still don't know how to return both the value code as well as the formatted value label, nor do I know how to retrieve the variable label;
*The results viewer shows labels/formats, but the actual results in the table viewer show the numeric value;
%macro sqlloop(data,vars); 
proc sql;
select %unquote(%str(%')%vars.%str(%')) format=$CHAR75. as Name_Col, &vars. as Val_Name, count(&vars.) as count
from &data.
group by Val_Name, Name_Col;
quit;
%mend;

%sqlloop(data=_dataset_,vars=varlist) 
*Trying to cycle the varlist into the macro doesn't work; I assume I need to loop this differently but I'm not sure how.

 

So, this leaves me with a few issues

1. I don't know how to feed the variables in varlist into the macro one by one. What am I missing?

2. I don't know how to pull formats (value and variable descriptions) into my tables. I'd be fine doing so through a join later on or directly in this query.

3. I need this to work for frequencies of Date and String values as well. Ideally, I could recode all values into string/char format for storing in the table, but I'm not sure how to do this without getting an error that a numeric operation in a query needs a numeric field in the table.

4. I'm not sure how efficient this is; it's possible there's another command or way to do this that is less resource intensive.

 

I'm on Sas 9.4. 

 


Accepted Solutions
Solution
‎03-12-2017 06:21 PM
Super User
Posts: 10,549

Re: Frequencies, Labels for all Variables into One Table

See if this gives you a good start. I made a couple of formats to apply to the variables in SASHELP.Class as well as variable labels and then created a dataset from SASHELP.CLASS.

The final want data set may have more than you want but I think it has the main pieces.

 

proc format library=work;
   value $sexclass
   F='Female'
   M='Male'
   ;
   value hrange
   0-<60='Under 5 ft'
   60-high='5 or more ft';
run;
data work.class;
   set sashelp.class;
   label
      name='Child name'
      sex ='Child gender'
      age ='Age in Years'
      height='Height (in)'
      weight='Weight (lbs)'
   ;
   format sex $sexclass. height hrange.;
run;


proc tabulate data=work.class out=classtab ;
   class name sex age height weight;
   table name sex age height weight,
         n
   ;
run;
options missing=' ';
data temp;
   set classtab (keep =  name sex age height weight n _type_);
   array names {5} $32 _temporary_  ("name" "sex" "age" "height" "weight");
   vname = names[index(_type_,'1')];
   val = catt(name, sex, age, height, weight);
   label vname='Variable name'
         val  ='Variable label'
         n    ='Frequency'
   ;
;
run;

proc sql;
   create table combined as
   select a.vname, a.val, a.n,b.label label='Variable Label',b.format 
   from temp as a left join 
      (select * from dictionary.columns 
         where LIBNAME='WORK' and Memname='CLASS') as b
    on upcase(a.vname)=upcase(b.name)
    ;
quit;

data want;
   set combined  ;
   if not missing(format) then do;
      if format =: "$" then formattedvalue = putc(val,format);
      else formattedvalue= putn(input(val,best32.),format);
   end;
   else formattedvalue=val;
run; 

Note that the OPTION MISSING =' '; is very important to building the value as CATT is an obnoxious way to combine numeric and character results (the values of the variables) but will place the missing character for missing values if it is anything except a blank.

 

View solution in original post


All Replies
Super User
Posts: 10,549

Re: Frequencies, Labels for all Variables into One Table

I am having a hard time visualizing what you want the output to look like. Could you provide an example for one of the SAS supplied data sets like SASHELP.Class? That doesn't have very many variables or rows of data so you may be able to do that by hand fairly quickly.

Occasional Contributor
Posts: 8

Re: Frequencies, Labels for all Variables into One Table

I won't do the full thing because it'd take up too much space, but pretend that the below is representative of every unique value for every variable in the SASHELP.Class dataset:

Name_ColVal_NameCount
NameAlfred1
NameAlice1
NameBarbara1
NameCarol1
NameHenry1
NameJames1
NameMary1
NameWilliam1
SexF9
SexM10
Age112
Age125
Height572
Height633
Weight1122
Weight1122

I'm not sure whether the Class dataset uses formats, but for variables like Sex we would usually have them coded as 1 or 2 with the labels "Male" and "Female", and I'd also like to capture that when applicable so that there would be an extra column displaying the value labels for formatted numeric items.
Solution
‎03-12-2017 06:21 PM
Super User
Posts: 10,549

Re: Frequencies, Labels for all Variables into One Table

See if this gives you a good start. I made a couple of formats to apply to the variables in SASHELP.Class as well as variable labels and then created a dataset from SASHELP.CLASS.

The final want data set may have more than you want but I think it has the main pieces.

 

proc format library=work;
   value $sexclass
   F='Female'
   M='Male'
   ;
   value hrange
   0-<60='Under 5 ft'
   60-high='5 or more ft';
run;
data work.class;
   set sashelp.class;
   label
      name='Child name'
      sex ='Child gender'
      age ='Age in Years'
      height='Height (in)'
      weight='Weight (lbs)'
   ;
   format sex $sexclass. height hrange.;
run;


proc tabulate data=work.class out=classtab ;
   class name sex age height weight;
   table name sex age height weight,
         n
   ;
run;
options missing=' ';
data temp;
   set classtab (keep =  name sex age height weight n _type_);
   array names {5} $32 _temporary_  ("name" "sex" "age" "height" "weight");
   vname = names[index(_type_,'1')];
   val = catt(name, sex, age, height, weight);
   label vname='Variable name'
         val  ='Variable label'
         n    ='Frequency'
   ;
;
run;

proc sql;
   create table combined as
   select a.vname, a.val, a.n,b.label label='Variable Label',b.format 
   from temp as a left join 
      (select * from dictionary.columns 
         where LIBNAME='WORK' and Memname='CLASS') as b
    on upcase(a.vname)=upcase(b.name)
    ;
quit;

data want;
   set combined  ;
   if not missing(format) then do;
      if format =: "$" then formattedvalue = putc(val,format);
      else formattedvalue= putn(input(val,best32.),format);
   end;
   else formattedvalue=val;
run; 

Note that the OPTION MISSING =' '; is very important to building the value as CATT is an obnoxious way to combine numeric and character results (the values of the variables) but will place the missing character for missing values if it is anything except a blank.

 

Occasional Contributor
Posts: 8

Re: Frequencies, Labels for all Variables into One Table

Thanks! I very quickly ran this with a copy of the class dataset in my work library, and the output is exactly what I'm looking for (although I'm getting some errors with the formats that I will take a closer look at in a bit).

 

However, I noticed that you seem to have the variables manually listed in the code, where as I was hoping to use a varlist or a macro since most of the datasets I work with have 800 to 1,200 variables each. Can use the varlist in my originally posted syntax, or some other trick, to automatically populate the sections of this code that require a list of variable names?

Super User
Posts: 10,549

Re: Frequencies, Labels for all Variables into One Table

Always get a process working for one dataset and variable list first before involving macro language or programming

Which is usually the hard part.

 

The list of variables from a data set can be pulled into a macro variable from the dictionary.columns table

proc sql noprint;
   select name into : varlist separated by ' '
   from dictionary.columns
   where Libname='SASHELP' and Memname='CLASS';
quit;

%put &varlist;

 

Building the quoted list such as "name" "sex" "age" "height" "weight

select quote(name) into:

Experimentation should take very little time to build two lists with one proc sql call.

 

And an exercise for the intereststed reader is to use Dictionary.Members to go through the members of a library.

Occasional Contributor
Posts: 8

Re: Frequencies, Labels for all Variables into One Table

Thanks, ballardw! I found some time to play around with concatenation commands and the into subcommand today, and got this working perfectly. I really appreciate the help!

 

I'll have to spend some time this week reading more about the various options, commands, and use of arrays within the data step, since that seems to be where you really brought things together.

Super User
Posts: 10,549

Re: Frequencies, Labels for all Variables into One Table

Arrays are a very powerful tool. Almost anytime you start thinking "I need to do the same thing to multiple variables" then array is likely going to be part of the solution.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 312 views
  • 3 likes
  • 2 in conversation