Help using Base SAS procedures

How to control the type of variable with a Proc Import?

Reply
Occasional Contributor
Posts: 8

How to control the type of variable with a Proc Import?

Hello everybody,

 

 

I am struggling with some imports of Excel files on SAS/UE. I would like to import two files that have a common variable and merge them. The problem is that one is being imported with the variable in character and the other one in numeric which makes the table merging impossible. I tried several techniques after reading posts online: "guessingrows" option, "mixed=YES" option, or even something like:

 

 

%let varlist = Code 8 Townname $15;
data myfile;
    length &varlist;
    set myfile;
run;

 

 The code I have been using is quite simple:

    proc import datafile="/folders/myshortcuts/myfile.xlsx"
        out=myfile
        dbms=xlsx
        replace;
    run;

I really don't know what to do at this point. Thanks a lot.

 

Best regards,

 

Stephanie

Super User
Super User
Posts: 9,599

Re: How to control the type of variable with a Proc Import?

Posted in reply to stephaniektaf

Excel is a bad format - no structure. 

Proc report is a guessing procedure, it guesses what your data is supposed to be.

Add the two together and you will always have issues.  

Answer, use a simple data transfer file format - CSV, XML - etc. and write a specific data import program to read the data in accurately as you know the data should be.

 

So first step, in Excel save as and save the file as a comma separated variable file.  This is a text file with data separated by commas.

Now, run your proc import on this csv file - in the log you will see a datastep showing you what the import procedure has guessed for the data.  You can copy that code out to your code window, and then modify it to read in the data correctly.  E.g. it will look something like:

data want;
  infile "...you_csv_file.csv" dlm=",";
  length abc 8
         def  $12;
  format abc 8
         def $12;
  informat abc 8
           def $12;
  input abc 
        def $;
run;

So this specifies the file to read in, the lengths of each variable, the format to display it as, the format to read the data, and then the read statement.  In this way you fully control every aspect of the import.

 

 

Occasional Contributor
Posts: 8

Re: How to control the type of variable with a Proc Import?

Hello @RW9,

 

Thank you for your reply. I have done what you asked me to do by saving the Excel file as a CSV file (they are automatically separated by a semicolon). I put the same code than you for each length, format, informat and input. As the variables are all characters, I added "$8." (with a full stop at the end because SAS doesn't recognize the format if written "$8"). But it gives me a table separated every 8 characters and not every time it meets a semicolon.

 

What do you think is the problem?

 

Thanks a lot,

 

Stephanie

Super User
Posts: 10,278

Re: How to control the type of variable with a Proc Import?

Posted in reply to stephaniektaf

When you already have an informat statement, then you need not use any informats in the input statement. Even the $ for character variables is not necessary if you assigned a length or an informat.

Take care to have a dlm=';' in the infile statement.

For further help, post your code and a few example lines if the infile, using the proper buttons ({i} and "little running man").

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 8

Re: How to control the type of variable with a Proc Import?

Posted in reply to KurtBremser

Hi @KurtBremser,

 

Thanks for your reply. It actually worked. However, the csv file is unable to take into account foreign language compared to the proc import. I changed the delimiter as ";" in the infile statement as mentionned.

 

data Observation;
  infile "/folders/myshortcuts/Observations/file.csv" dlm=";";
  length Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  format Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  informat Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  input Code Townname State_Code State_Name State_M_Name Township_Code Township_Name Township_M_Name;
run;

Thanks a lot,

 

Best regards,

Stephanie

Super User
Super User
Posts: 9,599

Re: How to control the type of variable with a Proc Import?

Posted in reply to stephaniektaf
Super User
Super User
Posts: 9,599

Re: How to control the type of variable with a Proc Import?

Posted in reply to stephaniektaf

What is posted was an example unrelated to your data. 

Please refer to the statement:

"Now, run your proc import on this csv file - in the log you will see a datastep showing you what the import procedure has guessed for the data.  You can copy that code out to your code window, and then modify it to read in the data correctly.  E.g. it will look something like:"

 

As I cannot see your data, nor program or anything else, I cannot write anything specifically for you, only suggest some general things.

Also note that a semicolon delimited file is not a CSV file.  You will need to change the delimiter to:

infile "...you_csv_file.txt" dlm=";";

And I would encourage you  not to name delimited files as CSV, hence I changed my example extension to be .txt.

Occasional Contributor
Posts: 8

Re: How to control the type of variable with a Proc Import?

Hi @RW9,

 

Yes that was an example. I wrote the code:

 

data Observation;
  infile "/folders/myshortcuts/Observations/file.csv" dlm=";";
  length Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  format Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  informat Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  input Code Townname State_Code State_Name State_M_Name Township_Code Township_Name Township_M_Name;
run;

And it worked. However, I did not understand the technique cited on running a proc import on the csv file. I think it is because it uses the delimiter "," and not ";". I have a csv file with semicolon as delimiter maybe because the laptop is french. The log gives me:

 

 86             data WORK._OBSERVATION2    ;
 87             %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
 88             infile '/folders/myshortcuts/Observations/file.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
 89                informat Code_Townname_State_Code_State $76. ;
 90                format Code_Townname_State_Code_State $76. ;
 91             input
 92                         Code_Townname_State_Code_State  $
 93             ;
 94             if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
 95             run;

Also, it is the wrong informat, format and input. That is maybe why I have a weird file with the proc import.

 

Thanks a lot,

 

Stephanie

Super User
Posts: 10,278

Re: How to control the type of variable with a Proc Import?

Posted in reply to stephaniektaf

From the variable names I guess that there's no really confidential information in that file, so it would really help a lot if you copy/pasted the first 10 lines or so into a separate file and attached that to your next post here, so we have something to play around with.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Super User
Posts: 8,120

Re: How to control the type of variable with a Proc Import?

Posted in reply to stephaniektaf

Looks like you told PROC IMPORT that the delimiter was a comma instead of a semi-colon.  You can change the options on the PROC IMPORT step.

proc import datafile="/folders/myshortcuts/Observations/file.csv"
  data=Observation replace
  dbms=csv 
; 
  delimiter=';';
run;
Super User
Super User
Posts: 8,120

Re: How to control the type of variable with a Proc Import?

You normally want to add the DSD option to the INFILE statement when reading delimited files.  This will treat adjacent delimiters as meaning there is an empty field.  Otherwise multiple adjacent delimiters are treated as one delimiter.

Occasional Contributor
Posts: 8

Re: How to control the type of variable with a Proc Import?

Ok I just added it. Thanks a lot.

 

data Observation;
  infile "/folders/myshortcuts/Observations/file.csv" dlm=";" dsd;
  length Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  format Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  informat Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  input Code Townname State_Code State_Name State_M_Name Township_Code Township_Name Township_M_Name;
run;

But it seemed to work already properly without the dsd.

 

Best regards,

 

Stephanie

Super User
Super User
Posts: 8,120

Re: How to control the type of variable with a Proc Import?

[ Edited ]
Posted in reply to stephaniektaf

@stephaniektaf wrote:

Ok I just added it. Thanks a lot.

 

data Observation;
  infile "/folders/myshortcuts/Observations/file.csv" dlm=";" dsd;
  length Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  format Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  informat Code $8. Townname $20. State_Code $20. State_Name $20. State_M_Name $20. Township_Code $20. Township_Name $20. Township_M_Name $20.;
  input Code Townname State_Code State_Name State_M_Name Township_Code Township_Name Township_M_Name;
run;

But it seemed to work already properly without the dsd.

 

Best regards,

 

Stephanie


DSD option will allow it to handle missing values properly.  You probably did not have any missing values.  

You should also add TRUNCOVER option to prevent SAS from going to the next row if any of the lines have too few values.

 

Why do you have periods after the numbers in the LENGTH statement?  Those are not formats, they are lengths.

Why are you attaching $xx formats and informats?  SAS already knows how to read and write character variables, so there is no need to attach special formats or informats for them.  The same is true for most numbers.  Although you might want to attach formats to floating point numbers to specify how many decimal places it should print by default.

Ask a Question
Discussion stats
  • 12 replies
  • 147 views
  • 0 likes
  • 4 in conversation