Hi:
Just curious, which book do you have available?
In a SAS dataset, you would not normally see white=1, black=2 and hispanic=3 stored as values. You would be more likely to see either the character values white, black and hispanic, or the numeric codes 1, 2 and 3.
If your data uses a FORMAT library or has permanent formats assigned, then you might see the type of equivalence that you describe if you printed out the contents of the format library. A user-defined format is the way that you can tell SAS how variable values should be displayed. So, instead of seeing 1, 2 and 3 when you display the data, you would see the format labels of White, Black or Hispanic.
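To make that concrete, here is a small sketch of how such a format could be defined and used. The dataset WORK.RACEDEMO, the variable names ID and RACE, and the format name RACEF are made up for illustration; your book's data will almost certainly use different names, and I am only guessing at the coding from your description.

[pre]
** Hypothetical format: tells SAS how to display the codes 1, 2 and 3;
proc format;
   value racef 1 = 'White'
               2 = 'Black'
               3 = 'Hispanic';
run;

** Hypothetical dataset with the numeric codes stored in RACE;
data work.racedemo;
   input id race;
   datalines;
101 1
102 2
103 3
;
run;

** Without a format, you see the stored values 1, 2 and 3;
proc print data=work.racedemo;
   title 'Stored values of RACE';
run;

** With the format, you see the labels, but the stored values do not change;
proc print data=work.racedemo;
   title 'Same data displayed with the RACEF format';
   format race racef.;
run;
[/pre]

The second PROC PRINT only changes how RACE is displayed for that step. The values in the dataset are still 1, 2 and 3.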
Here is a chunk o' code. It uses SASHELP.CLASS -- a sample dataset that should be available with every SAS installation. If you are using SAS Enterprise Guide, there are point-and-click methods for performing these tasks. If you are in an environment where you have to submit SAS code using SAS Display Manager or via a batch program, then these statements should work for you "out of the box".
Because I wanted to preserve the program code indenting and spacing, I used the "pre-formatting" tags for posting. This means that you will need to copy and paste the code from the forum window into Microsoft Word (or generally any word processor that respects carriage returns/line feeds) and then copy and paste again from the word processor into SAS before you submit the code.
I put some comments in the code (comments are the lines that start with * (asterisk) and end with ; (semi-colon)).
Section #1 shows the basic PROC PRINT, PROC CONTENTS, PROC FREQ and PROC UNIVARIATE outputs using SASHELP.CLASS. Section #2 shows how to create some user-defined formats so that a procedure counts observations in groups based on the formatted values instead of the stored values. (So, for example, Age 11-14 will be shown as either Non-Driver or 0, depending on which format you use. But neither format is permanently assigned to the data -- so you could use whichever format suited your purposes. Age will still be the actual age in the dataset.)
Section #3 shows how to make new variables if you really want to make new variables in a copy of the original dataset. I would never add new variables to SASHELP.CLASS -- because I believe it is better to preserve the integrity of the original data (and besides, I used to work for lawyers and always had to prove to them that I did not tamper with the original data and variables -- I only used it to make a new copy of the data with new variables per their instructions).
Since the code is so long, I will not post results here. You should be able to run the code and review the results step by step, output by output -- since every step has a unique title. By examining the documentation for each procedure step and examining the documentation for how DATA step programs work, you should be able to figure out most of what is going on in this program. I've also put some other beginner/tutorial references below.
These web sites/papers have some good comparisons of STATA, SAS and/or SPSS code:
http://www.ats.ucla.edu/stat/SAS/
http://www.aaegrad.uga.edu/stata_sas_guide.pdf
And these are useful papers:
http://www.nesug.org/proceedings/nesug07/ff/ff07.pdf
http://www.nesug.org/proceedings/nesug08/ff/ff06.pdf
http://www.nesug.org/Proceedings/nesug09/sa/sa07.pdf
http://www.nesug.org/proceedings/nesug08/ff/ff12.pdf
http://www.nesug.org/proceedings/nesug05/pm/pm6.pdf
http://www2.sas.com/proceedings/sugi31/246-31.pdf
cynthia
[pre]
** 1) SASHELP.CLASS is a sample dataset with 19 observations -- each observation represents;
** a student. Variables in the dataset are: NAME SEX AGE HEIGHT WEIGHT;
** PROC PRINT will display all the observations and all the variables in the dataset;
** the OBS column shows the current sorted or original order of the observations.;
proc print data=sashelp.class;
title 'List the Observations';
run;
** PROC CONTENTS will display Variable NAME and other variable and dataset-level information.;
proc contents data=sashelp.class;
title 'What is the Variable and Dataset Information?';
run;
** PROC FREQ will perform COUNTS and give you FREQUENCY, PERCENT, CUM FREQUENCY and CUM PERCENT;
** by default. The NLEVELS option shows how many levels (distinct values) each variable has.;
proc freq data=sashelp.class nlevels;
title 'Perform Frequency Counts on Character Variables and AGE';
tables name sex age;
run;
** PROC UNIVARIATE will reveal basic descriptive statistics for numeric variables. It also shows;
** percentiles and extreme observations (5 highest and 5 lowest values) for each numeric variable.;
proc univariate data=sashelp.class;
title 'Get Basic Statistics for Numeric Variables';
var age height weight;
run;
** 2) Make a User-defined format for age;
proc format fmtlib;
   value agef 11-14 = 'Non-Driver'
              15-16 = 'Student Driver'
              other = 'Unknown';
   value drvf 11-14 = '0'
              15-16 = '1'
              other = '.';
run;
** Use the new format with PROC FREQ and the original dataset;
proc freq data=sashelp.class nlevels;
title 'With a User-Defined Format for Age making category labels';
tables age;
format age agef.;
run;
proc freq data=sashelp.class nlevels;
title 'With a User-Defined Format for Age setting only 0 or 1';
tables age;
format age drvf.;
run;
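** Optional sketch (not part of the original sections): a format also groups values in;
** other procedures. Here PROC MEANS uses AGE as a CLASS variable, so the statistics;
** are reported once for each formatted value of AGE, not once for each actual age.;
proc means data=sashelp.class n mean min max;
title 'Height and Weight Statistics Grouped by the AGEF Format';
class age;
var height weight;
format age agef.;
run;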
** With a user-defined format, you may not need to create new variables. But, if you;
** do decide you need to create or recode new variables, then see #3.;
** 3) Create a new version of the original dataset with new variables.;
** The new variables are FEMALE, MALE and DRIVER.;
** I use a simple IF statement to create new variables. If SEX = 'F', then I assign a;
** value of 1 to the new variable FEMALE. Of course, if SEX='F' then the value for the;
** MALE variable must be 0 -- and vice versa for the creation of the MALE variable.;
** The new variable DRIVER is based on AGE. If AGE is LE 14 then the value for Driver is 0.;
** If they are 15 or older, it is assumed that they are of driving age and so, ;
** the new variable DRIVER is assigned a value of 1.;
** Also shown is how to make new variables from a user defined format, such as DRVF;
** which creates a 0 or 1 value for the new variable based on the student age.;
data work.makenewvars;
   set sashelp.class;
   if sex = 'F' then do;
      FEMALE = 1;
      MALE = 0;
   end;
   else if sex = 'M' then do;
      MALE = 1;
      FEMALE = 0;
   end;
   if age le 14 then Driver = 0;
   else if age ge 15 then Driver = 1;
   ** make AltDriver variable with format;
   ** using the FORMAT to do a look-up instead of an IF statement;
   AltDriver = input(put(age,drvf.),1.0);
run;
** Proc CONTENTS on new dataset;
proc contents data=work.makenewvars;
title 'PROC CONTENTS Showing New Variables';
run;
** Perform a PROC PRINT on the new dataset with the new variables;
proc print data=work.makenewvars;
title 'List the observations with new variables';
var name sex female male age driver AltDriver height weight;
run;
** Use the new dataset and new variables with PROC FREQ.;
proc freq data=work.makenewvars nlevels;
title 'Frequency and Percentages with New Variables';
tables sex female male driver altdriver;
run;
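** Optional check (a sketch, not one of the three sections above): cross-tabulate DRIVER;
** against ALTDRIVER to confirm that the IF logic and the format look-up agree.;
** The MISSING option keeps missing values in the table. One caution: SAS treats a;
** missing numeric value as lower than any number, so AGE LE 14 is true when AGE is;
** missing, and DRIVER would be 0 while ALTDRIVER would be missing. SASHELP.CLASS has;
** no missing ages, so every student should land on the diagonal of this table.;
proc freq data=work.makenewvars;
title 'Compare DRIVER (IF statement) with ALTDRIVER (format look-up)';
tables driver*altdriver / missing norow nocol nopercent;
run;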
[/pre]