- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I am not sure if this question has been answered elsewhere (I've been trying to find it but haven't seen it anywhere) so bear with me. I have a multiple large datasets (some >1,000 observations and >30 variables) and I’ve been trying to convert my variables from character to numeric but it just isn’t working.
The first way that I tried was using a basic numeric operation. For example:
data y;
set x;
var1 = '.';
var1 = var*1;
run;
LOG:
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
9:13
NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
9:21
NOTE: There were 600 observations read from the data set x.
NOTE: The data set y has 600 observations and 33 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
When I look at the proc contents, var1 is character just like var. It looks like it gets converted twice (to numeric and then back again) on the log?
I tried using the INPUT function as well:
data y;
set x;
input var $1. subject_id;
var1 = input(var,8.);
datalines;
0 10101448
0 10101762
0 10102382
0 10103278
1 10103611
1 10104153
0 10104473
0 10104555
run;
LOG:
NOTE: There were 9 observations read from the data set x.
NOTE: The data set y has 8 observations and 33 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
It worked this way for some variables and not for others (it worked this time when I ran it), but then when I looked at my data I realized it had deleted the observations for some of my variables altogether (specifically for subject_id). On a related note, it did that when I tried to use RENAME at a different time as well.
I guess my questions are these: does anyone know another way to get my variables to be numeric? Do you know why it deleted some of my observations?
I'm absolutely certain that there's an error in my code somewhere but it's been escaping me.
Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In your first example VAR1 is defined as character because you used it as character in the first point where you referenced.For the same reason is defined as length $1 since in the first reference SAS can tell that you want VAR1 to be length 1 since you are setting to a constant of that length.
Eliminate this line.
var1 = '.';
Your other code it better, but will have a problem if the new variable name already exists in the old (X) dataset, since SAS will have already determined the type and length from its definition in that data set.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Errm, this worked fine for me (removed the set x as did not know where that came from), added infile datalines just to be sure):
data y;
infile datalines dlm=" ";
input var $1. subject_id ;
var1 =input(var,8.);
datalines;
0 10101448
0 10101762
0 10102382
0 10103278
1 10103611
1 10104153
0 10104473
0 10104555
;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A couple of items. First, note that variables cannot be switched from character to numeric. You can work around that by creating a new variable, such as:
data want;
set have;
newvar = input(var1, 8.);
drop var1;
rename newvar=var1;
run;
That will keep the numeric variable, using the old name (VAR1).
Regarding the missing values based on SUBJECT_ID, most likely there are some values that don't constitute a legitimate number. They might contain letters, embedded blanks ... something that just doesn't convert to a number.
Good luck.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data y;
set x;
input var $1. subject_id;
var1 = input(var,8.);
datalines;
0 10101448
0 10101762
0 10102382
0 10103278
1 10103611
1 10104153
0 10104473
0 10104555
run;
Based on the above syntax dataset y will have observations from the samller dataset which is coming from datalines. Values under common variables will be replaced with the last dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In your first example VAR1 is defined as character because you used it as character in the first point where you referenced.For the same reason is defined as length $1 since in the first reference SAS can tell that you want VAR1 to be length 1 since you are setting to a constant of that length.
Eliminate this line.
var1 = '.';
Your other code it better, but will have a problem if the new variable name already exists in the old (X) dataset, since SAS will have already determined the type and length from its definition in that data set.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks everyone! That did it for me. I appreciate your replies!