I'm pretty new to SAS and I'm just learning my way around the program. I need to check the codebook for my variables so that I can change a categorical variable to a dichotomous one. How do I look at the codebook for a single variable?
I'm pretty familiar with Stata, where you would just type "codebook variablename". Is it this simple in SAS?
If I understand you correctly, you want to be able to see more than just the variable names? Do you already have a SAS dataset? Let's assume that your SAS dataset has a variable called MARITALSTATUS. What would you want to see:
1) the variable type (character or numeric), the variable length in bytes, the variable label, etc -- information ABOUT the variable
2) would you want to see the variable VALUES, for example a count of the observations with Divorced, Married, Single or Widowed values for the MaritalStatus variable?
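For a single variable, both kinds of information can be pulled with something like the sketch below. It uses the SASHELP.CLASS sample dataset and its AGE variable as stand-ins for your own dataset and variable, and the KEEP= dataset option is just one way to limit the output to one variable.

```sas
/* 1) Information ABOUT one variable: type, length, label, format */
proc contents data=sashelp.class(keep=age);
  title 'Attributes of a Single Variable';
run;

/* 2) The VALUES of that variable, with a count of observations for each */
proc freq data=sashelp.class;
  title 'Values of a Single Variable';
  tables age / missing;
run;
title;
```

Together these two steps are probably the closest thing to what "codebook variablename" does in Stata.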
Thank you so much for responding. I'm sure all those suggestions are perfect. I have a SAS book, and it seems like the whole time it says "SAS can do this" but doesn't tell me where to look.
I'll clarify what I'm asking by posing an example. Say I wanted to recode something like race into a dichotomous variable where white=1 and non-white=0. The original race measure may be coded something like white=1 black=2 hispanic=3. I need to see the original codes so that I know how to recode them into a new variable.
Thank you so much for even reading my post! I'm so grateful to any of those that help, or even will just discuss this with me. I come from the Social Sciences so we aren't very good with SAS or even numbers in general!
Just curious, what is the book you have available?
In a SAS dataset, you might not see white=1 black=2 and hispanic=3. You would be more likely to see either values of white, black and hispanic or 1, 2 and 3.
If your data uses a FORMAT library or has permanent formats assigned, then you might see the type of equivalence that you describe if you printed out the contents of the format library. A user-defined format is the way that you can tell SAS how variable values should be displayed. So, instead of seeing 1, 2 and 3 when you display the data, you would see the format labels of White, Black or Hispanic.
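Here is a small sketch of that difference between stored values and formatted values. The RACEF format, the WORK.SURVEY dataset and its values are all made up for illustration; your own data will have different names and codes.

```sas
/* Hypothetical format and data, made up for illustration only */
proc format;
  value racef 1 = 'White'
              2 = 'Black'
              3 = 'Hispanic';
run;

data work.survey;
  input id race;
  format race racef.;
  datalines;
1 1
2 2
3 3
4 1
;
run;

/* With the format in effect, the table shows White, Black, Hispanic */
proc freq data=work.survey;
  title 'Formatted Values';
  tables race;
run;

/* A FORMAT statement with no format name removes the format for this step only, */
/* so the table shows the stored codes 1, 2 and 3 */
proc freq data=work.survey;
  title 'Stored Values';
  tables race;
  format race;
run;
title;
```

Comparing the two tables side by side gives you the code-to-label mapping without changing the dataset.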
Here is a chunk o' code. It uses SASHELP.CLASS -- a sample dataset that should be available with every SAS installation. If you are using SAS Enterprise Guide, there are point and click methods to allow you to perform these tasks. If you are in an environment where you have to submit SAS code using SAS Display Manager or via a batch program, then these statements should work for you "out of the box".
Because I wanted to preserve the program code indenting and spacing, I used the "pre-formatting" tags for posting. This means that you will need to cut and paste the code from the forum window into Microsoft Word (or generally any word processor that respects carriage returns/line feeds) and then cut and paste again from the word processor into SAS before you submit the code.
I put some comments in the code (comments are the lines that start with * (asterisk) and end with ; (semi-colon)).
Section #1 shows the basic PROC PRINT, PROC CONTENTS, PROC FREQ and PROC UNIVARIATE outputs using SASHELP.CLASS. Section #2 shows how to create some user-defined formats for the purpose of counting your variables in other groups based on your formatted values. (So, for example, Age 11-14 will get a value of either Non-Driver or 0, depending on what format you use. But neither format is permanently assigned to the data -- so you could use whichever format suited your purposes. Age will still be the actual age in the dataset.)
Section #3 shows how to make new variables if you really want to make new variables in a copy of the original dataset. I would never add new variables to SASHELP.CLASS --- because I believe it is better to preserve the integrity of the original data (and besides I used to work for lawyers and always had to prove to them that I did not tamper with the original data and variables-- only used it to make a new copy of the data with new variables per their instructions.)
Since the code is so long, I will not post results here. You should be able to run the code and review the results step by step, output by output -- since every step has a unique title. By examining the documentation for each procedure step and examining the documentation for how DATA step programs work, you should be able to figure out most of what is going on in this program. I've also put some other beginner/tutorial references below.
** 1) SASHELP.CLASS is a sample dataset with 19 observations -- each observation represents;
** a student. Variables in the dataset are: NAME SEX AGE HEIGHT WEIGHT;
** PROC PRINT will display all the observations and all the variables in the dataset;
** the OBS column shows the current sorted or original order of the observations.;
proc print data=sashelp.class;
  title 'List the Observations';
run;
** PROC CONTENTS will display Variable NAME and other variable and dataset-level information.;
proc contents data=sashelp.class;
  title 'What is the Variable and Dataset Information?';
run;
** PROC FREQ will perform COUNTS and give you FREQUENCY, PERCENT, CUM FREQUENCY and CUM PERCENT;
** by default. The NLEVELS option shows how many "levels" there are for the values of each variable.;
proc freq data=sashelp.class nlevels;
  title 'Perform Frequency Counts on Character Variables and AGE';
  tables name sex age;
run;
** PROC UNIVARIATE will reveal basic descriptive statistics for numeric variables, it will show;
** percentiles and extreme observations (5 highest and 5 lowest values) for each numeric variable.;
proc univariate data=sashelp.class;
  title 'Get Basic Statistics for Numeric Variables';
  var age height weight;
run;
** 2) Make a User-defined format for age;
proc format fmtlib;
  value agef 11-14 = 'Non-Driver'
             15-16 = 'Student Driver'
             other = 'Unknown';
  value drvf 11-14 = '0'
             15-16 = '1'
             other = '.';
run;
** Use the new format with PROC FREQ and the original dataset;
proc freq data=sashelp.class nlevels;
  title 'With a User-Defined Format for Age making category labels';
  tables age;
  format age agef.;
run;
proc freq data=sashelp.class nlevels;
  title 'With a User-Defined Format for Age setting only 0 or 1';
  tables age;
  format age drvf.;
run;
** With a user-defined format, you may not need to create new variables. But, if you;
** do decide you need to create or recode new variables, then see #3.;
** 3) Create a new version of the original dataset with new variables.;
** The new variables are FEMALE, MALE and DRIVER.;
** I use a simple IF statement to create new variables. If SEX = 'F', then I assign a;
** value of 1 to the new variable FEMALE. Of course, if SEX='F' then the value for the;
** MALE variable must be 0 -- and vice versa for the creation of the MALE variable.;
** The new variable DRIVER is based on AGE. If AGE is LE 14 then the value for Driver is 0.;
** If they are 15 or older, it is assumed that they are of driving age and so, ;
** the new variable DRIVER is assigned a value of 1.;
** Also shown is how to make new variables from a user defined format, such as DRVF;
** which creates a 0 or 1 value for the new variable based on the student age.;
data work.makenewvars;
  set sashelp.class;
  if sex = 'F' then do;
    FEMALE = 1;
    MALE = 0;
  end;
  else if sex = 'M' then do;
    MALE = 1;
    FEMALE = 0;
  end;
  if age le 14 then Driver = 0;
  else if age ge 15 then Driver = 1;
  ** make AltDriver variable with format;
  ** using the FORMAT to do a look-up instead of an IF statement;
  AltDriver = input(put(age,drvf.),1.0);
run;
** Proc CONTENTS on new dataset;
proc contents data=work.makenewvars;
  title 'PROC CONTENTS Showing New Variables';
run;
** Perform a PROC PRINT on the new dataset with the new variables;
proc print data=work.makenewvars;
  title 'List the observations with new variables';
  var name sex female male age driver AltDriver height weight;
run;
** Use the new dataset and new variables with PROC FREQ.;
proc freq data=work.makenewvars nlevels;
  title 'Frequency and Percentages with New Variables';
  tables sex female male driver altdriver;
run;
As I wrote in another post, as of today I do not have SAS at hand, so everything I mention that resembles code is actually untested.
Back to the topic: could it be that the OP has a format, let's say RACEFMT., that assigns e.g. 'white' to 1, 'black' to 2, 'hispanic' to 3? In this case dichotomous variables could be created in a DATA step by statements like
dich_white = ( put(race_variable,racefmt.)='white' ) ;
dich_black = ( put(race_variable,racefmt.)='black' ) ;
dich_hisp = ( put(race_variable,racefmt.)='hispanic' ) ;
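In context, one of those assignments might sit in a DATA step like the sketch below. The RACEFMT format definition, the WORK.RECODED dataset and its four data rows are assumptions made up for illustration. The sketch also builds the same indicator directly from the stored code, because the comparison-based version returns 0 rather than missing when the race value is missing.

```sas
/* Hypothetical format: the labels must match the comparison strings exactly */
proc format;
  value racefmt 1 = 'white'
                2 = 'black'
                3 = 'hispanic';
run;

data work.recoded;
  input race_variable;
  /* A comparison evaluates to 1 when true and 0 when false */
  dich_white = ( put(race_variable, racefmt.) = 'white' );
  /* Same idea from the stored code, keeping missing values missing */
  if missing(race_variable) then white_raw = .;
  else white_raw = ( race_variable = 1 );
  datalines;
1
2
3
.
;
run;

/* Cross-check each new variable against the original codes */
proc freq data=work.recoded;
  tables race_variable*dich_white race_variable*white_raw / missing;
run;
```

Cross-tabulating the new variable against the original is a cheap way to confirm the recode did what you intended.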
@RandomHero2239: a PROC CONTENTS will tell you what (if any) format is assigned to your variable. To see that format's definition do a
proc format fmtlib ;
  select format_name ; /* replace format_name by the format name */
run ;
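Put together, that two-step lookup might look like the following sketch. The RACEFMT format and the one-row WORK.DEMO dataset are hypothetical and exist only so the example can run on its own; LIBRARY=WORK assumes the format lives in the WORK catalog, which may not be true for permanent formats at your site.

```sas
/* Hypothetical format and one-row dataset, made up for illustration */
proc format;
  value racefmt 1 = 'white'
                2 = 'black'
                3 = 'hispanic';
run;

data work.demo;
  race_variable = 1;
  format race_variable racefmt.;
run;

/* Step 1: the Format column of the variable list shows RACEFMT. */
proc contents data=work.demo;
run;

/* Step 2: print only the RACEFMT definition from the WORK format catalog */
proc format library=work fmtlib;
  select racefmt;
run;
```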