Sorry if this is rant-ish
Anyway, I am trying to do a simple summary statistics on my data set. However, all of the variables are of type char when I need some of them to be numeric. So I am trying to change their type but for the life of me cannot figure out how. Furthermore, I am also overthinking it to some extent as the variable names in the table contain white spaces and the values are of the form: $65,000.00 -- with that said, to me at least I think I need to remove the dollar sign, the comma, and the decimal point and its following digits.
In the example to change types, all the data is being created in the proc and not pulled in from a table. I tried something like the following:
data work.import1;
set work.import1;
But I do not know how to reference the variables in work.import1 because of the whitespace.
To turn character variables that look like numbers into actual numbers, you could use the INPUT function.
There are examples at that link.
In order to reference variables that don't follow the naming convention rules you can use a SAS name constant.
For example, I want to create a new column called New which has the same value as my other one called A Column:
New= 'A Coulmn'n;
Put the variable name in quotes and put the letter n after the ending quote.
thank you! As soon as I took a step back I found this was a common question. Although in what I found the way they have it listed is
'Name lastName'n = Name_LastName does SAS make a fuss if its this way I'm used to C so I think of it as your example, VARIABLE = EXPRESSION compared to EXPRESSION = VARIABLE. Or does it not matter in SAS?
@zsmith93 wrote:
thank you! As soon as I took a step back I found this was a common question. Although in what I found the way they have it listed is
'Name lastName'n = Name_LastName does SAS make a fuss if its this way I'm used to C so I think of it as your example, VARIABLE = EXPRESSION compared to EXPRESSION = VARIABLE. Or does it not matter in SAS?
A normal assignment statement in SAS has the target on the left and the expression on the right.
new_var = 'Silly Name for a variable'n ;
If you are trying to use a RENAME statement (or RENAME= option) then the old name is on the left and the new name on the right.
rename 'Silly Name for a variable'n = new_var ;
If you created the dataset using PROC IMPORT then try running that step again, only set the option VALIDVARNAME to V7. Normally then PROC IMPORT will convert your goofy names with spaces and other invalid characters into something that is easier to use.
options validvarname=v7;
proc import ....
If you have a variable named 'total cost'n that contains text like '$3,210.85' you can convert that to a new variable by using an INPUT() function call with the COMMA informat.
data want;
set have ;
total_cost = input('total cost'n,comma32.);
run;
Note that you need to make a NEW variable. You cannot change the type of an existing variable. You can use the RENAME statement to change the names if you liked the old name and want to keep using it for the new numeric variable.
What is your original data source?
Usually, the easiest and best solution is to read it in properly instead of fixing it after the fact.
@zsmith93 wrote:
Sorry if this is rant-ish
Anyway, I am trying to do a simple summary statistics on my data set. However, all of the variables are of type char when I need some of them to be numeric. So I am trying to change their type but for the life of me cannot figure out how. Furthermore, I am also overthinking it to some extent as the variable names in the table contain white spaces and the values are of the form: $65,000.00 -- with that said, to me at least I think I need to remove the dollar sign, the comma, and the decimal point and its following digits.
In the example to change types, all the data is being created in the proc and not pulled in from a table. I tried something like the following:
data work.import1;
set work.import1;
But I do not know how to reference the variables in work.import1 because of the whitespace.
I'm not sure if it's permitted so I hope I'm not breaking any rules. The files are from the following,
https://www.kaggle.com/wsj/college-salaries
I'm using all three data sources so I'd have to fix all three.
Those are just CSV files. So no need to "IMPORT" them. Just write a simple data step to read them.
So for the first one I just copied the first line of text and used that to create the variable names and labels.
data payback ;
infile 'degress-that-pay-back.csv' dsd truncover firstobs=2;
length Major $40 Starting Median Pchange P10 P25 P75 P90 8;
informat _numeric_ comma.;
input major -- p90;
label
Major = 'Undergraduate Major'
Starting ='Starting Median Salary'
Median = 'Mid-Career Median Salary'
Pchange = 'Percent change from Starting to Mid-Career Salary'
P10 = 'Mid-Career 10th Percentile Salary'
P25 = 'Mid-Career 25th Percentile Salary'
P75 = 'Mid-Career 75th Percentile Salary'
P90 = 'Mid-Career 90th Percentile Salary'
;
run;
❤️ aaaaaaaaaaaah, that's beautiful.
Okay, I just want to run by some questions to make sure I comprehend it. Had I used IMPORT it would just auto-gen a table and I would be left to fix it, whereas this preemptively addresses the issues, correct?
For the length command, what does the '8' do at the end of the line and is its scope applied to all the variables or just to P90?
For informat is this making the data numeric and removing the commas, how did you or SAS remove the dollar signs from the CSV?
Finally rename verses label. From what I was reading label affects the output but I'm not seeing how rename wouldn't be doing the same thing.
Sorry, it's late and my question could probably be answered if I poked around some more.
Read the documentation.
For the length command, what does the '8' do at the end of the line and is its scope applied to all the variables or just to P90?
The syntax is a series of pairs consisting of a variables list and a length. The number is the number of bytes used to store the variable. The $ indicates character variables, otherwise they are numeric. SAS stores all numbers as 8 byte floating point numbers. Hence the length for numeric should be 8. If you use a smaller number then SAS will remove the lower order bytes of the floating point number when writing it to the disk. That can work if the values are always integers, but it is normally NOT worth the effort.
For informat is this making the data numeric and removing the commas, how did you or SAS remove the dollar signs from the CSV?
An INFORMAT is instructions on how to convert text into values. The COMMA informat is for reading strings with $ and commas as numbers. Basically it just strips those two characters and then reads it as SAS normally reads numbers. So it also works fine with the PCHANGE variable that doesn't have either $ or comma.
Finally rename verses label. From what I was reading label affects the output but I'm not seeing how rename wouldn't be doing the same thing.
A LABEL is metadata attached to a variable, basically a description. Most reporting and analysis procedures (FREQ, GLM, etc.) will display the label on the output. Your source file really looks more like a report than an actual data transfer document.
The variable NAME is the actual unique id for the variable. It is what you need to type into your program. Nobody wants to type a sentence (and have to add extra quotes around it) in order to create a program.
More on variable lists. The little program I posted used variable lists in these statements:
length Major $40 Starting Median Pchange P10 P25 P75 P90 8;
informat _numeric_ comma.;
input major -- p90;
So there are 4 variable lists in there.
The LENGTH statement has two. The first one is just the single variable Major. The second is a space delimited list of names.
The INFORMAT statement uses the special keyword _NUMERIC_ to indicate all numeric variables that are defined (so far) in the data step. You can also use _CHARACTER_ and _ALL_ as variable lists.
The IINPUT statement uses a positional variable list. Basically the double hyphen means that SAS should take all of the variables (by their order in the dataset) from the first name to the last name.
You can also use a single hyphen to get a range of names that end in a numeric suffix. Example: VAR1 - VAR23
And you can use a colon suffix to get all variables that start with a common prefix. Example: P:
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.