Desktop productivity for business analysts and programmers

list of "eject" line after proc import

Accepted Solution Solved
Reply
Contributor
Posts: 68
Accepted Solution

list of "eject" line after proc import

Hello

 

I have tested a proc import under SAs with a CSV file.

 

A numeric column contains several errors. For exemple a column of integer like "123456" , but there is unauthorized caracterse like "123-456" or "(123456)" .

 

proc import DATAfile="\\serveur\dossier\fichier.csv" DBMS=csv OUTFILE=work.import REPLACE; 
GETNAMES=no; 
guessingrows=32767; 
DELIMITER=";"; 
run;

I compare the situation with Access. With Acces when we import data and there is a error , a table is created with the meam key of the line , to inform about the anomaly.

 

With SAS it's possible to found a list with all line which are not accepted by system ? If it's possible with the log , because by moment it's not possible to see all line , only a part ...

 

Thanks for your help .


Accepted Solutions
Solution
‎04-04-2018 04:23 AM
Super User
Posts: 13,926

Re: list of "eject" line after proc import

Posted in reply to azertyuiop

Here is a brief example of sending data to a "baddata" set when encountering errors in reading the data:

data gooddata (keep=number) baddata (keep=str line);
   length str $ 50.;
   input number;
   if _error_ then do;
      str=_infile_;
      line=_n_;
      output baddata;
   end;
   else output gooddata;
datalines;
1
234
45-789
(123456)
1E27
;
run;

If SAS has any read error then it sets an internal variable named _error_ to 1(or true) so you can test if a read error occurred if you test immediately after an input statement.

 

SAS also maintains another variable, _infile_ that has the contents of an entire line of data being read. The special variable _n_ is the iteration of the data step and can be used in many cases as a "line" counter.

So 1) create a string variable long enough to hold the longest line of data 2) test if there was a read error, 3) if an error send to the bad data set, of not the good set. Note that OUTPUT with a data set name is used to when data goes to which data set, the KEEP (or a DROP data set option) indicates which variables are in which data set.

 

Note that I included 1E27 as a valid numeric in scientific notation just for fun.

 

If you source data contains things that would normally be invalid for a specified data value such as NULL or N/A for a numeric or 99999999 for a date but you don't want them to get flagged with this process then create custom informats that assign an appropriate value (likely MISSING)

View solution in original post


All Replies
Super User
Super User
Posts: 9,829

Re: list of "eject" line after proc import

Posted in reply to azertyuiop

Not without some effort.  I believe it has been mentioned many times before, proc import is a guessing procedure.  It does a quick scan of the file and tried to guess what that data is.  It is not a good production environment process as each run can be different.  If you are receiving data, then there should be an import agreement, or at least a data definition with the data otherwise the data and hence the process is worthless.  Write a datastep program which imports the data per the spec, e.g. if the spec states that field is numeric and you get 123-456, then the spec is incorrect - and it is not your task to correct others mistakes.  

You could of course have a quality check if you so choose (of course at your expense and time).  Read the variables all as large character fields - character can read in anything.  Then run some code on this which checks the data, i.e. you could do:

if lengthn(compress(var,"+-","d")) > 0 then flag=1;

I.e. if the text contains anything not convertible to numeric then flag (note you can do this many ways!).

But why, draw up agreements or get specs in place up front, this clarifies, simplifies and documents the transfer of data.

Super User
Posts: 23,998

Re: list of "eject" line after proc import

Posted in reply to azertyuiop

Check the code in the log against your Access type/column list. You can see where it's guessing wrong and correct it there.

 


@azertyuiop wrote:

Hello

 

I have tested a proc import under SAs with a CSV file.

 

A numeric column contains several errors. For exemple a column of integer like "123456" , but there is unauthorized caracterse like "123-456" or "(123456)" .

 

proc import DATAfile="\\serveur\dossier\fichier.csv" DBMS=csv OUTFILE=work.import REPLACE; 
GETNAMES=no; 
guessingrows=32767; 
DELIMITER=";"; 
run;

I compare the situation with Access. With Acces when we import data and there is a error , a table is created with the meam key of the line , to inform about the anomaly.

 

With SAS it's possible to found a list with all line which are not accepted by system ? If it's possible with the log , because by moment it's not possible to see all line , only a part ...

 

Thanks for your help .


 

Solution
‎04-04-2018 04:23 AM
Super User
Posts: 13,926

Re: list of "eject" line after proc import

Posted in reply to azertyuiop

Here is a brief example of sending data to a "baddata" set when encountering errors in reading the data:

data gooddata (keep=number) baddata (keep=str line);
   length str $ 50.;
   input number;
   if _error_ then do;
      str=_infile_;
      line=_n_;
      output baddata;
   end;
   else output gooddata;
datalines;
1
234
45-789
(123456)
1E27
;
run;

If SAS has any read error then it sets an internal variable named _error_ to 1(or true) so you can test if a read error occurred if you test immediately after an input statement.

 

SAS also maintains another variable, _infile_ that has the contents of an entire line of data being read. The special variable _n_ is the iteration of the data step and can be used in many cases as a "line" counter.

So 1) create a string variable long enough to hold the longest line of data 2) test if there was a read error, 3) if an error send to the bad data set, of not the good set. Note that OUTPUT with a data set name is used to when data goes to which data set, the KEEP (or a DROP data set option) indicates which variables are in which data set.

 

Note that I included 1E27 as a valid numeric in scientific notation just for fun.

 

If you source data contains things that would normally be invalid for a specified data value such as NULL or N/A for a numeric or 99999999 for a date but you don't want them to get flagged with this process then create custom informats that assign an appropriate value (likely MISSING)

Contributor
Posts: 68

Re: list of "eject" line after proc import

Hello , Good morning ,

 

I didn't think that it's possible to build a system with 2 tables. I did think that it's necessary to find a table like the table "vcolumn" which own all information about all table under current SAS session.

 

I will adapt the system that you propose with a proc import.

 

Thanks for your solution.

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 179 views
  • 0 likes
  • 4 in conversation