BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cridgew13
Calcite | Level 5

Hi all,

I am not sure if this question has been answered elsewhere (I've been trying to find it but haven't seen it anywhere) so bear with me. I have a multiple large datasets (some >1,000 observations and >30 variables) and I’ve been trying to convert my variables from character to numeric but it just isn’t working.

The first way that I tried was using a basic numeric operation. For example:

data y;

set x;

var1 = '.';

var1 = var*1;

run;


LOG:

NOTE: Character values have been converted to numeric values at the places given by:

      (Line):(Column).

      9:13

NOTE: Numeric values have been converted to character values at the places given by:

      (Line):(Column).

      9:21

NOTE: There were 600 observations read from the data set x.

NOTE: The data set y has 600 observations and 33 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.00 seconds

When I look at the proc contents, var1 is character just like var. It looks like it gets converted twice (to numeric and then back again) on the log?

I tried using the INPUT function as well:

data y;

set x;

input var $1. subject_id;

var1 = input(var,8.);

datalines;

0 10101448

0 10101762

0 10102382

0 10103278

1 10103611

1 10104153

0 10104473

0 10104555

run;


LOG:


NOTE: There were 9 observations read from the data set x.

NOTE: The data set y has 8 observations and 33 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

It worked this way for some variables and not for others (it worked this time when I ran it), but then when I looked at my data I realized it had deleted the observations for some of my variables altogether (specifically for subject_id). On a related note, it did that when I tried to use RENAME at a different time as well.

I guess my questions are these: does anyone know another way to get my variables to be numeric? Do you know why it deleted some of my observations?


I'm absolutely certain that there's an error in my code somewhere but it's been escaping me.


Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

In your first example VAR1 is defined as character because you used it as character in the first point where you referenced.For the same reason is defined as length $1 since in the first reference SAS can tell that you want VAR1 to be length 1 since you are setting to a constant of that length.

Eliminate this line.

var1 = '.';

Your other code it better, but will have a problem if the new variable name already exists in the old (X) dataset, since SAS will have already determined the type and length from its definition in that data set.

View solution in original post

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Errm, this worked fine for me (removed the set x as did not know where that came from), added infile datalines just to be sure):

data y;

infile datalines dlm=" ";

input var $1. subject_id ;

var1 =input(var,8.);

datalines;

0 10101448

0 10101762

0 10102382

0 10103278

1 10103611

1 10104153

0 10104473

0 10104555

;

run;

Astounding
PROC Star

A couple of items.  First, note that variables cannot be switched from character to numeric.  You can work around that by creating a new variable, such as:

data want;

set have;

newvar = input(var1, 8.);

drop var1;

rename newvar=var1;

run;

That will keep the numeric variable, using the old name (VAR1).

Regarding the missing values based on SUBJECT_ID, most likely there are some values that don't constitute a legitimate number.  They might contain letters, embedded blanks ... something that just doesn't convert to a number.

Good luck.

stat_sas
Ammonite | Level 13

data y;

set x;

input var $1. subject_id;

var1 = input(var,8.);

datalines;

0 10101448

0 10101762

0 10102382

0 10103278

1 10103611

1 10104153

0 10104473

0 10104555

run;

Based on the above syntax dataset y will have observations from the samller dataset which is coming from datalines. Values under common variables will be replaced with the last dataset.

Tom
Super User Tom
Super User

In your first example VAR1 is defined as character because you used it as character in the first point where you referenced.For the same reason is defined as length $1 since in the first reference SAS can tell that you want VAR1 to be length 1 since you are setting to a constant of that length.

Eliminate this line.

var1 = '.';

Your other code it better, but will have a problem if the new variable name already exists in the old (X) dataset, since SAS will have already determined the type and length from its definition in that data set.

cridgew13
Calcite | Level 5

Thanks everyone! That did it for me. I appreciate your replies!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 6583 views
  • 6 likes
  • 5 in conversation