Hi all,
I have a huge data base including (3000 columns, 500000 rows). Some of the columns are Numeric and some Character. For some reason I want to convert all numeric to characters in my data and retain the order and name of the columns. I know some stuff about how to convert columns one by one but I am looking for a program to do this in one shot. Any help would be appreciated.
Try just running two proc transpose steps.
data have ;
length a b $10 c d 8 e f $10 ;
input a--f ;
label c='Middle var';
cards;
x y 1 2 z 4
run;
proc transpose data=have out=middle ;
var _all_;
run;
proc transpose data=middle out=want (drop=_name_);
var col1;
run;
proc print; run;
This is a untested code, you may try using arrays as below
data want;
set have;
array nu(*) _numeric_;
array ch(*) $ character1-characterxx;
do i = 1 to dim(nu);
if nu(i) ne . then ch(i)=put(n(i),best.);
end;
run;
Thanks,
Jag
MAXISAS wrote:
For some reason I want to convert all numeric to characters in my data and retain the order and name of the columns.
Some reason?
Try just running two proc transpose steps.
data have ;
length a b $10 c d 8 e f $10 ;
input a--f ;
label c='Middle var';
cards;
x y 1 2 z 4
run;
proc transpose data=have out=middle ;
var _all_;
run;
proc transpose data=middle out=want (drop=_name_);
var col1;
run;
proc print; run;
Tom,
OP said he have a huge table . It is not possible for proc transpose under that scenario .
I think SQL is a better choice.
data have ; length a b $10 c d 8 e f $10 ; input a--f ; label c='Middle var'; cards; x y 1 2 z 4 ; run; data _null_; set sashelp.vcolumn(keep=libname memname name type where=(libname='WORK' and memname='HAVE')) end=last; if _n_ eq 1 then call execute('proc sql; create table want as select '); if type='char' then call execute(name); else call execute('put('||strip(name)||',best32. -l) as '||strip(name)); if not last then call execute(','); else call execute(' from have;quit;'); run;
Xia Keshan
You solutions ignores the underlying format associated with each variable you may need to address that deficiency in your program.
Converting with format is where PROC TRANSPOSE has an advantage although the amount of data and variables may be too much as you suggest. I reckon the OP needs to decide what is important and perhaps give more details about the conversion from numeric to character. Maybe for a "some reason" conversion best 32 is adequate we may never know.
I agree with DN and think that the number of variables limitation can be overcome by inserting a counter. e.g.:
data have ;
length a b $10 c d 8 e f $10 ;
input a--f ;
label c='Middle var';
cards;
x y 1 2 z 4
a b 3.3754 2.842 y 77.5
run;
data need;
set have;
counter+1;
run;
proc transpose data=need out=middle (where=(_name_ ne 'counter')) ;
var _all_;
by counter;
run;
data middle;
set middle;
col1=left(col1);
run;
proc transpose data=middle out=want (drop=counter _name_);
var col1;
by counter;
run;
Thank you all for your generous helps.
Hello again,
I am going to explain a little more about the data and why I want this conversion. As I said the database have almost 3000 columns and I dont know which one is numeric and which one is character (unless I go through them one by one). I was just hoping the code can figure it out itself and transform it.
The whole reason that I am doing this is that I have 20 databases all have the same 2988 variables (the name of variables among all 20 files are the same) and I want to append them together. The important point is that since there are some mistakes in the content of variables (for example variable 13), the type of variables are not the same. For example, variable 13 has numeric type in database 4 and has character type in database 17. when SAS wants to append them together, they put missing data if there is any mismatch. I just want to append all these 20 files without having any missing data even if the data are not matched (cause I can fixed them later). This was the reason I want to convert all my files to character and then append them to avoid loosing any data when I SAS append them together.
Thanks all you guys in advance for your helps,
Bests
If they are actually in text files instead of a database then just read them in as text to begin with.
For example if they are all in CSV files then you could even possibly read them all in one data step.
data want ;
infile '*.csv' dsd truncover lrecl=32000 ;
length var1-var3000 $200 ;
input var1-var3000 ;
run;
Thanks Tom,
Actually I have a big file which is csv. and your answer worked. Is it possible to import first line of csv file as variable names in this code?
Thanks for your help
Bests
Usually I just pull the first line into the editor and convert it to a LENGTH statement. For example if it is comma delimited I change the commas to spaces. Then program looks like.
data want ;
infile 'myfile.csv' dsd truncover lrecl=32000 ;
length firstvar nextvar ..... lastvar $200 ;
input firstvar -- lastvar ;
run;
If that is too hard you can read the first line with a program and write the names to the log or another file.
data _null_;
infile 'myfile.csv' obs=1 dsd truncover ;
file 'myfile.names' ;
length name $32 ;
do until (name=' ' );
input name @;
put name;
end;
run;
I know you have moved on from the subject line (converting numerics ) to reading from .csv, but just before trying to add value on INPUT consider that SAS makes it easy to convert numbers to strings.
data strings ;
set your.data;
file 'a strings file.txt' dsd lrecl=1000000;
Put (_all_)(:);
run ;
But you want to read......
Suppose you have file big.csv
you are happy to read all cols as string, but want to see their col names.
Have a go with
Data bigstrings(compress=yes);
Infile "big.csv" dsd lrecl=1000000 dlm=',' truncover ;
input (col1-col3000 )( :$200. ) ;
run;
You will see the column headings in thr first row.
That could be followed with a step to prepare "renaming" from that row....
data _null_ ;
Filename gencode "%sysfunc(pathname(work))/gencode.sas" ;
File gencode ;
set bigstrings( obs= 1 );
put 'proc datasets nolist lib=work; '
/ ' modify bigstrings ; rename ' ;
Array col(*) ;
Do c= 1 to dim(col) ;
if not nvalid( col(c), 'V7' ) then continue ;
Put col(c)= ;
End ;
Put ' ; run; quit; ' ;
Stop;
Run;
* now apply the renaming;
%include gencode ;
Hi Arthur.T ,
It is easy for SQL.
data have ; length a b $10 c d 8 e f $10 ; input a--f ; format d z4.; label c='Middle var'; cards; x y 1 2 z 4 ; run; data _null_; set sashelp.vcolumn(keep=libname memname name type label format where=(libname='WORK' and memname='HAVE')) end=last; if _n_ eq 1 then call execute('proc sql; create table want as select '); if type='char' then call execute(strip(name)||' label="'||strip(label)||'"'); else call execute('strip(put('||strip(name)||','||coalescec(strip(format),'best32.')||')) as '||strip(name)||' label="'||strip(label)||'"'); if not last then call execute(','); else call execute(' from have;quit;'); run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.