BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sovrappensiero
Fluorite | Level 6

Hi!

 

I have a dataset with hundreds (on the order of 400-500) of indicator variables (values 1,2, or 3) that I imported from Excel (.xlsx) and they have imported as character values.  I need them to be numeric.  If relevant, the data were transposed after importing (initial data look like mixed data type - see below for an example).  They are not all clustered together, so I cannot use something like array cha(*) var1 -- var450 ...  

 

Is there any way to convert hundreds of "truly numeric" variables that are masquerading as character variables, and are scattered among legitimate character variables, in any way that avoids me having to list all of them or convert them one-by-one?  

 

Original data imported:

             City1      City2      City3     City4      City5

Var1     words      words    words     words      words

Var2       1              2           1             2             2

Var3       2              2           1             1             1

Var4     words      words    words     words      words

Var5     words      words    words     words      words

 

 

After transposing:

            Var1      Var2      Var3     Var4      Var5               

City1     words     1           2     words      words                            

City2     words     2           2     words      words                                   

City3     words     1           1     words      words                                   

City4     words     2           1     words      words              

City5     words     2           1     words      words              

 

 

I'd love some suggestions on how to fix this!  I know how to do this easily in R, perl, bash, and other programming languages, but at the moment I am constrained to using SAS.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I think that the following will do what you want:

 

data have;
  input (City Var1 x y z Var5) ($);               
  cards;
City1     words     1           2     words      words                            
City2     2     2           2     words      words                                   
City3     words     1           .     words      words                                   
City4     words     2           1     words      words              
City5     words     2           1     words      words              
;

proc format; 
   value $allnum 
         other=[anyalpha()];
run;

ods output onewayfreqs=work.owf;
proc freq data=have;
  format var1--var5 $allnum.;
  tables var1--var5/ missing;
run;
ods listing;

data owf;
  length var $32 fvalue $50 ;
  set owf;
  by table notsorted;
  if first.table and last.table;
  var=scan(table,-1);
  fvalue=vvaluex('F_'||var);
  if fvalue eq 0;
  keep var;
run;

proc sql noprint;
  select catt(var,'=_',var),
         catt(var,'=input(_',var,',8.);')
    into :rens separated by ' ',
         :vars separated by ' '
      from owf
  ;
quit;

data want (drop=_:);
  set have (rename=(&rens.));
  &vars.;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post

22 REPLIES 22
Reeza
Super User

Can you use this notation which would select all character variables between first and last variable?

 

array to_convert(*) first_var -character- last_var;

@sovrappensiero wrote:

Hi!

 

I have a dataset with hundreds (on the order of 400-500) of indicator variables (values 1,2, or 3) that I imported from Excel (.xlsx) and they have imported as character values.  I need them to be numeric.  If relevant, the data were transposed after importing (initial data look like mixed data type - see below for an example).  They are not all clustered together, so I cannot use something like array cha(*) var1 -- var450 ...  

 

Is there any way to convert hundreds of "truly numeric" variables that are masquerading as character variables, and are scattered among legitimate character variables, in any way that avoids me having to list all of them or convert them one-by-one?  

 

Original data imported:

             City1      City2      City3     City4      City5

Var1     words      words    words     words      words

Var2       1              2           1             2             2

Var3       2              2           1             1             1

Var4     words      words    words     words      words

Var5     words      words    words     words      words

 

 

After transposing:

            Var1      Var2      Var3     Var4      Var5               

City1     words     1           2     words      words                            

City2     words     2           2     words      words                                   

City3     words     1           1     words      words                                   

City4     words     2           1     words      words              

City5     words     2           1     words      words              

 

 

I'd love some suggestions on how to fix this!  I know how to do this easily in R, perl, bash, and other programming languages, but at the moment I am constrained to using SAS.


 

ballardw
Super User

@sovrappensiero wrote:

Hi!

 

I have a dataset with hundreds (on the order of 400-500) of indicator variables (values 1,2, or 3) that I imported from Excel (.xlsx) and they have imported as character values.  I need them to be numeric.  If relevant, the data were transposed after importing (initial data look like mixed data type - see below for an example).  They are not all clustered together, so I cannot use something like array cha(*) var1 -- var450 ...  


Easiest may be to go back to the IMPORT stage.

First save the Excel file to a CSV file. (Reason: there are options available for delimited files not available for Excel).

Import the CSV.

If using Proc Import code then add the option GUESSINGROWS=MAX. If using a wizard find the options and set the "number of rows to guess" to a very large number, ideally the number of rows in the source file. Also indicate which row of data contains the first actual data row. A common cause of problems when there is one or more blank header rows. With Excel you can't tell SAS to ignore them. With a CSV file you can indicate the first row of data with the wizard or the DATAROW statement in Proc Import code.

 

Be default the import procedure guesses as to variable type using a very few rows of data. If the full range of values are not encountered in the first rows then the procedure guesses wrong.

sovrappensiero
Fluorite | Level 6
Hi Reeza! I can’t do this because ALL the variables of of character type.
Reeza
Super User

Ok, so if you know how to do this in R, how would you identify which variables in R would need to be converted?

 

 

sovrappensiero
Fluorite | Level 6
Thank you I will try to import it this way!
Reeza
Super User

Unfortunately, given the data structure - in rows rather than columns I don't think simply converting the file to CSV would help. 

Except if you take the current data, export it to CSV and then read it back in as numeric. But you'd have to specify the character types anyways and if you're going to do that anyways, you may as well clean it up in SAS. 

 

I would be happy to be wrong though 🙂

 


@sovrappensiero wrote:
Thank you I will try to import it this way!

 

sovrappensiero
Fluorite | Level 6
Good question. Firstly, I think they would automatically import as numeric in R. If not, I could re-arrange the columns so that all the numeric ones are together(a real pain, but I searched for how to do this in SAS and I couldn’t find a way although by that time I was a little tired and hungry, LOL). Then I could use an apply function to convert all the variables from column 1 to whatever using the as.numeric() function. I ran into a problem with the array in SAS for two reasons:

1. When SAS cannot concert a legitimate character variable to numeric, instead of ignoring it and throwing a warning it converts it to missing. So I cannot use _ALL_ and apply a conversion function hoping that it would “smart convert” only things that look like numbers go numeric format.

2. Because my variables are scattered, there is no easy way to rename them. SAS requires you that, for example newvar = input(oldvar, best8.). In R you can change variables on the fly.
Reeza
Super User

@sovrappensiero wrote:
Good question. Firstly, I think they would automatically import as numeric in R. If not, I could re-arrange the columns so that all the numeric ones are together(a real pain, but I searched for how to do this in SAS and I couldn’t find a way although by that time I was a little tired and hungry, LOL). Then I could use an apply function to convert all the variables from column 1 to whatever using the as.numeric() function. I ran into a problem with the array in SAS for two reasons:



Assuming the first data set is what you had in Excel, I don't think it would in R if your data is in mixed types as shown. Even R forces types on columns not along rows. 

 


@sovrappensiero wrote:



1. When SAS cannot concert a legitimate character variable to numeric, instead of ignoring it and throwing a warning it converts it to missing. So I cannot use _ALL_ and apply a conversion function hoping that it would “smart convert” only things that look like numbers go numeric format.



You can test this using ANYDIGIT or ANYALPHA and then applying the conversion conditionally.

 


@sovrappensiero wrote:



2. Because my variables are scattered, there is no easy way to rename them. SAS requires you that, for example newvar = input(oldvar, best8.). In R you can change variables on the fly.

Unfortunately, this is definitely an issue that SAS needs to resolve. One way is to add a prefix to all automatically using the SASHELP.VCOLUMN table. Then you can keep track of which ones you need to convert and rename after the fact or delete all variables that are all missing. This is definitely not an easy, beginner level process though.

 

If you're doing this one time, I would probably copy and transpose the data in Excel and then re-import into SAS using GUESSINGROWS=MAX as suggested earlier. 

 

If this needs to be repeated, that's obviously not a good solution!

 

sovrappensiero
Fluorite | Level 6
I’m sorry for the typos! I was using my phone and typing faster than the phone can input letters. 😊
ballardw
Super User

@sovrappensiero wrote:

Hi!

 

I have a dataset with hundreds (on the order of 400-500) of indicator variables (values 1,2, or 3) that I imported from Excel (.xlsx) and they have imported as character values.  I need them to be numeric.  If relevant, the data were transposed after importing (initial data look like mixed data type - see below for an example).  They are not all clustered together, so I cannot use something like array cha(*) var1 -- var450 ...  

 

Is there any way to convert hundreds of "truly numeric" variables that are masquerading as character variables, and are scattered among legitimate character variables, in any way that avoids me having to list all of them or convert them one-by-one?  

 

Original data imported:

             City1      City2      City3     City4      City5

Var1     words      words    words     words      words

Var2       1              2           1             2             2

Var3       2              2           1             1             1

Var4     words      words    words     words      words

Var5     words      words    words     words      words

 

 

After transposing:

            Var1      Var2      Var3     Var4      Var5               

City1     words     1           2     words      words                            

City2     words     2           2     words      words                                   

City3     words     1           1     words      words                                   

City4     words     2           1     words      words              

City5     words     2           1     words      words              

 

 

I'd love some suggestions on how to fix this!  I know how to do this easily in R, perl, bash, and other programming languages, but at the moment I am constrained to using SAS.


Since the data structure wasn't included the previous comments I made won't work.

However, have you ever used the EXCEL Copy>Paste Special>Transpose feature? Highlight all the data, copy, go to a new sheet or work book and select the Paste Special and Transpose. Save the result as CSV and import.

One hopes that you do not have lots of files of this type.

s_lassen
Meteorite | Level 14

Something like this may work (using the test data supplied by @Ksharp😞

data test;
input (varr City1      City2      City3     City4      City5) ($);
cards;
Var1     words      words    words     words      words
Var2       1              2           1             .             2
Var3       2              2           1             1             1
Var4     words      words    words     words      words
Var5     words      words    words     words      words
;run;

First, transpose:

proc transpose data=test out=trans;
  var city:;
  id  varr;
run;

Then create a temporary program file to convert the numeric variables:

filename tempsas temp;
data _null_;
  set test;
  file tempsas;
  array test(*) City:;
  do _N_=1 to dim(test);
    if input(test(_N_),?? best32.)=. and test(_N_) not in('.',' ') then
      delete;
    end;
  put
   '_' varr '=input(' varr ',best32.);' /
   'drop ' varr ';' /
   'rename _' varr '=' varr ';'
   ;
run;

Then use that to convert the numeric variables:

data want;
  set trans;
  %include tempsas/source2;
run;
Tom
Super User Tom
Super User

One possibility is to just dump your current (converted) dataset to a delimited file and then let PROC IMPORT figure out which variables are numeric.

filename csv temp;
proc export data=have outfile=csv dbms=csv ; run;
proc import datafile=csv out=want dbms=csv; run;

Otherwise If your existing conversion process was a classic double transpose then you could try analyzing the middle (tall skinny) version of the data to find out the names of the variables whose set of values only include '1','2' and '3' and use that list in your char to num conversion step.

art297
Opal | Level 21

I think that the following will do what you want:

 

data have;
  input (City Var1 x y z Var5) ($);               
  cards;
City1     words     1           2     words      words                            
City2     2     2           2     words      words                                   
City3     words     1           .     words      words                                   
City4     words     2           1     words      words              
City5     words     2           1     words      words              
;

proc format; 
   value $allnum 
         other=[anyalpha()];
run;

ods output onewayfreqs=work.owf;
proc freq data=have;
  format var1--var5 $allnum.;
  tables var1--var5/ missing;
run;
ods listing;

data owf;
  length var $32 fvalue $50 ;
  set owf;
  by table notsorted;
  if first.table and last.table;
  var=scan(table,-1);
  fvalue=vvaluex('F_'||var);
  if fvalue eq 0;
  keep var;
run;

proc sql noprint;
  select catt(var,'=_',var),
         catt(var,'=input(_',var,',8.);')
    into :rens separated by ' ',
         :vars separated by ' '
      from owf
  ;
quit;

data want (drop=_:);
  set have (rename=(&rens.));
  &vars.;
run;

Art, CEO, AnalystFinder.com

 

sovrappensiero
Fluorite | Level 6

@art297, I am working through your solution right now.  It's so elegant and as I figure out what the SAS procedures are I keep thinking, "Brilliant!"  Thank you so much for this insight.  The one part that I do not understand is the SQL part...I can't figure out what it is doing.  I took an intro SQL class so I know that it's selecting a subset from work.owf and putting them into something...but I'm a little lost!

 

Any explanation you can provide would surely help!  Thanks again.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 3929 views
  • 5 likes
  • 8 in conversation