BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
azertyuiop
Quartz | Level 8

Hello

 

I have tested a proc import under SAs with a CSV file.

 

A numeric column contains several errors. For exemple a column of integer like "123456" , but there is unauthorized caracterse like "123-456" or "(123456)" .

 

proc import DATAfile="\\serveur\dossier\fichier.csv" DBMS=csv OUTFILE=work.import REPLACE; 
GETNAMES=no; 
guessingrows=32767; 
DELIMITER=";"; 
run;

I compare the situation with Access. With Acces when we import data and there is a error , a table is created with the meam key of the line , to inform about the anomaly.

 

With SAS it's possible to found a list with all line which are not accepted by system ? If it's possible with the log , because by moment it's not possible to see all line , only a part ...

 

Thanks for your help .

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Here is a brief example of sending data to a "baddata" set when encountering errors in reading the data:

data gooddata (keep=number) baddata (keep=str line);
   length str $ 50.;
   input number;
   if _error_ then do;
      str=_infile_;
      line=_n_;
      output baddata;
   end;
   else output gooddata;
datalines;
1
234
45-789
(123456)
1E27
;
run;

If SAS has any read error then it sets an internal variable named _error_ to 1(or true) so you can test if a read error occurred if you test immediately after an input statement.

 

SAS also maintains another variable, _infile_ that has the contents of an entire line of data being read. The special variable _n_ is the iteration of the data step and can be used in many cases as a "line" counter.

So 1) create a string variable long enough to hold the longest line of data 2) test if there was a read error, 3) if an error send to the bad data set, of not the good set. Note that OUTPUT with a data set name is used to when data goes to which data set, the KEEP (or a DROP data set option) indicates which variables are in which data set.

 

Note that I included 1E27 as a valid numeric in scientific notation just for fun.

 

If you source data contains things that would normally be invalid for a specified data value such as NULL or N/A for a numeric or 99999999 for a date but you don't want them to get flagged with this process then create custom informats that assign an appropriate value (likely MISSING)

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Not without some effort.  I believe it has been mentioned many times before, proc import is a guessing procedure.  It does a quick scan of the file and tried to guess what that data is.  It is not a good production environment process as each run can be different.  If you are receiving data, then there should be an import agreement, or at least a data definition with the data otherwise the data and hence the process is worthless.  Write a datastep program which imports the data per the spec, e.g. if the spec states that field is numeric and you get 123-456, then the spec is incorrect - and it is not your task to correct others mistakes.  

You could of course have a quality check if you so choose (of course at your expense and time).  Read the variables all as large character fields - character can read in anything.  Then run some code on this which checks the data, i.e. you could do:

if lengthn(compress(var,"+-","d")) > 0 then flag=1;

I.e. if the text contains anything not convertible to numeric then flag (note you can do this many ways!).

But why, draw up agreements or get specs in place up front, this clarifies, simplifies and documents the transfer of data.

Reeza
Super User

Check the code in the log against your Access type/column list. You can see where it's guessing wrong and correct it there.

 


@azertyuiop wrote:

Hello

 

I have tested a proc import under SAs with a CSV file.

 

A numeric column contains several errors. For exemple a column of integer like "123456" , but there is unauthorized caracterse like "123-456" or "(123456)" .

 

proc import DATAfile="\\serveur\dossier\fichier.csv" DBMS=csv OUTFILE=work.import REPLACE; 
GETNAMES=no; 
guessingrows=32767; 
DELIMITER=";"; 
run;

I compare the situation with Access. With Acces when we import data and there is a error , a table is created with the meam key of the line , to inform about the anomaly.

 

With SAS it's possible to found a list with all line which are not accepted by system ? If it's possible with the log , because by moment it's not possible to see all line , only a part ...

 

Thanks for your help .


 

ballardw
Super User

Here is a brief example of sending data to a "baddata" set when encountering errors in reading the data:

data gooddata (keep=number) baddata (keep=str line);
   length str $ 50.;
   input number;
   if _error_ then do;
      str=_infile_;
      line=_n_;
      output baddata;
   end;
   else output gooddata;
datalines;
1
234
45-789
(123456)
1E27
;
run;

If SAS has any read error then it sets an internal variable named _error_ to 1(or true) so you can test if a read error occurred if you test immediately after an input statement.

 

SAS also maintains another variable, _infile_ that has the contents of an entire line of data being read. The special variable _n_ is the iteration of the data step and can be used in many cases as a "line" counter.

So 1) create a string variable long enough to hold the longest line of data 2) test if there was a read error, 3) if an error send to the bad data set, of not the good set. Note that OUTPUT with a data set name is used to when data goes to which data set, the KEEP (or a DROP data set option) indicates which variables are in which data set.

 

Note that I included 1E27 as a valid numeric in scientific notation just for fun.

 

If you source data contains things that would normally be invalid for a specified data value such as NULL or N/A for a numeric or 99999999 for a date but you don't want them to get flagged with this process then create custom informats that assign an appropriate value (likely MISSING)

azertyuiop
Quartz | Level 8

Hello , Good morning ,

 

I didn't think that it's possible to build a system with 2 tables. I did think that it's necessary to find a table like the table "vcolumn" which own all information about all table under current SAS session.

 

I will adapt the system that you propose with a proc import.

 

Thanks for your solution.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 876 views
  • 0 likes
  • 4 in conversation