Hello,
I have an excel file I like to read into SAS. The dataset contains many variables that are both numeric character values, so I like to read the variables in the dataset as character. I saw some post that can enable read only one variable as character. But the dataset contains many variables that it will be challenging to list them all, so I want to read all variables as character.
eg:
data xx;
input a b c d;
datalines;
2 <llq 4 6
<llq 4 5 3
4 6 <llq 4
5 6 3 1
;
run;
Any help will be appreciated:
Unfortunately Excel is a poor data medium (i.e. it is not well structured like a dataset or database) and I suppose you are using proc import - which is a guessing procedure. There is no simple switch to allow you read all in as character as far as I am aware. One easy solution is to add one row to you Excel file with "AAAAA" in each cell of each column (Select row, shift+right fill). This way proc import will always "guess" the data to be character.
Another alternative is to save to CSV file, then proc import that, in the log it will give you the code which it has used to import the file. You can then modify that to match the formats and lengths you want - this is the best method for getting data in as it is accurate, repeatable and fully under you (as the data owners) control.
A final method I can think of would be to write a VBA macro in Excel which writes out the Excel file to CSV, with all the data enclosed in double quotes.
But I suppose the real question here is why you want them all character, that data looks like lab results to me, therefore it would be far easier to work with in a normalised structure (E.g. CDISC LAB, or LB domain format).
There is nothing stopping you doing imputation or anything with a normalised data structure. Let me show with the data you provided;
data xx; input char_result $ @@; numeric_result=ifn(char_result="<llq",999,input(char_result,best.)); datalines; 2 <llq 4 6 <llq 4 5 3 4 6 <llq 4 5 6 3 1 ; run;
In this way I retain the original data, and the converted numeric variant. This is how CDISC models work by having both.
Its the same to read datalines of text file, which csv is:
data xx; infile "....csv"; input a $ b $ c $ d$ e$; array c{5} a b c d e; array n{5} a_n b_n c_n d_n e_n; do i=1 to 5; n{i}=ifn(c{i}="<llq",999,input(c{i},best.)); end; run;
But you see from that how much more effort it is to work with transposed data.
Standard answer: save to csv, import with data step. The only way to guarantee consistent repeatable results.
I think it depends *how* you import the Excel file. Some engines (such as PCFILES server) support the DBSASTYPE option which allows you to force that data be read from Excel as character.
I submitted a SASware ballot idea to add such an option for the XLSX engine:
Related thread:
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.