DATA Step, Macro, Functions and more

Importing a .csv file with a blank column

Reply
Frequent Contributor
Posts: 91

Importing a .csv file with a blank column

[ Edited ]

Here's what I'm trying to do:

 

 

DATA FOO;
  INFILE '/path/to/csv/file' DELIMITER = ',' MISSOVER DSD LRECL=32767 FIRSTOBS=2;
  INFORMAT ID 10.;
  INFORMAT BAR :$64.;
  INPUT ID BAR $;
RUN;

PROC SQL;
  CREATE TABLE TEST AS 
  SELECT *
  FROM FOO
  WHERE BAR IS NULL;
QUIT;

Still getting rows with blank BAR columns.

 

 

Checking to see what's in those columns I do this:

 

 

PROC SQL;
  CREATE TABLE TEST2 AS 
  SELECT
    ID,
    PUT(BAR, $HEX32.)
  FROM FOO
WHERE BAR IS NULL; QUIT;

And it outputs ASCII space characters.

 

 

OK, what if the original file has some spaces in it.  Can I just trim it?  Let's see:

 

DATA FOO2(KEEP=ID BAR);
  INFILE '/path/to/csv/file' DELIMITER = ',' MISSOVER DSD LRECL=32767 FIRSTOBS=2;
  INFORMAT ID 10.;
  INFORMAT _BAR :$64.;
  INPUT ID _BAR $;
  BAR = TRIM(_BAR);
RUN;

PROC SQL;
  CREATE TABLE TEST3 AS 
  SELECT
    ID,
    PUT(BAR, $HEX32.)
  FROM FOO2
WHERE BAR IS NULL; QUIT;

Nope, apparently not.  Same deal.  ASCII space characters.  TRIM does nothing.  How do I kindly tell SAS to input this file and give it appropriate values? 

 

 

Frequent Contributor
Posts: 91

Re: Importing a .csv file with a blank column

[ Edited ]
Posted in reply to tomcmacdonald

What if we were to use PRXCHANGE instead of TRIM? 

 

DATA FOO3(KEEP=ID BAR);
  INFILE '/path/to/csv/file' DELIMITER = ',' MISSOVER DSD LRECL=32767 FIRSTOBS=2;
  INFORMAT ID 10.;
  INFORMAT _BAR :$64.;
  INPUT ID _BAR $;
  BAR = PRXCHANGE('s/\s+//', -1, _BAR)
RUN;

PROC SQL;
  CREATE TABLE TEST4 AS 
  SELECT
    ID,
    PUT(BAR, $HEX32.)
  FROM FOO3
WHERE BAR IS NULL; QUIT;

Again, it's a no go.

Frequent Contributor
Posts: 91

Re: Importing a .csv file with a blank column

[ Edited ]
Posted in reply to tomcmacdonald

OK here's my solution

 

DATA FOO4(KEEP=ID BAR);
  INFILE '/path/to/csv/file' DELIMITER = ',' MISSOVER DSD LRECL=32767 FIRSTOBS=2;
  INFORMAT ID 10.;
  INFORMAT BAR :$64.;
  INPUT ID BAR $;
  IF NOT PRXMATCH('/^\s+$/', BAR);
RUN;
Super User
Posts: 23,237

Re: Importing a .csv file with a blank column

Posted in reply to tomcmacdonald

What happens with 

 

  WHERE not missing(bar);

 

Frequent Contributor
Posts: 91

Re: Importing a .csv file with a blank column

Same results.
Super User
Posts: 23,237

Re: Importing a .csv file with a blank column

Posted in reply to tomcmacdonald

You can also use the COMPRESS and all spaces option.

Super User
Posts: 13,293

Re: Importing a .csv file with a blank column

Posted in reply to tomcmacdonald

@tomcmacdonald wrote:

Here's what I'm trying to do:

 

 

Still getting rows with blank BAR columns.

OK, what if the original file has some spaces in it.  Can I just trim it?  Let's see:

 

Nope, apparently not.  Same deal.  ASCII space characters.  TRIM does nothing.  How do I kindly tell SAS to input this file and give it appropriate values? 

 

 


Can you post a few lines of your input data file. Or create data step of your set FOO: Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Frequent Contributor
Posts: 91

Re: Importing a .csv file with a blank column

Here's what the CSV file looks like:

 

id,bar
1, 
2, 
3,Joe
4,Steve
5,Henry
Frequent Contributor
Posts: 91

Re: Importing a .csv file with a blank column

Posted in reply to tomcmacdonald

I feel as though I'm missing something fundamental wrt SAS and whitespace.  Lots of operation apparently don't work.  CATT, TRIM, PRXCHANGE with whitespace regular expressions all yield whitespace and seem to do nothing.  It's a bit jarring coming from a Python, R environment where this isn't even a thought. 

Super User
Posts: 23,237

Re: Importing a .csv file with a blank column

Posted in reply to tomcmacdonald

SAS doesn't have a NULL value.

Frequent Contributor
Posts: 91

Re: Importing a .csv file with a blank column

What does NULL mean in the context of PROC SQL? 

Super User
Posts: 23,237

Re: Importing a .csv file with a blank column

[ Edited ]
Posted in reply to tomcmacdonald

SAS doesn't have a concept of null, regardless of SQL or Data Step.

 

In many RDBMS there is a difference between Null, which means nothing, and a space, whereas SAS does not recognize the difference. Both would return a missing.

 

EDIT: see the response from SAS employee here:

https://communities.sas.com/t5/SAS-Procedures/Missing-Null/td-p/33383

 

 

Respected Advisor
Posts: 4,668

Re: Importing a .csv file with a blank column

Posted in reply to tomcmacdonald

@tomcmacdonald

Below code returns the expected result. What are you doing differently in your code?

data foo3(keep=id bar);
  infile datalines delimiter = ',' missover dsd /*lrecl=32767*/ firstobs=2;
  INFORMAT ID 10.;
  INFORMAT BAR $64.;
  INPUT ID BAR ;
  datalines;
id,bar
1, 
2, 
3,Joe
4,Steve
5,Henry
;
run;

proc sql feedback;
  create table test4 as 
  select
    id,
    bar
  from foo3  
  where bar is not null;
quit;

Please note that using the column modifier in the INFORMAT statement isn't valid syntax. You only use this modifier on the INPUT statement to distinguish formats from same named informats. You also shouldn't use the $ in your input statement as this is going to overwrite your informat definition for BAR.

 

SAS doesn't have a concept of NULL but you still can use the NULL keyword in the SAS SQL flavor. SAS will then treat such a NULL the same as a missing. In SAS SQL the syntax bar is not NULL is equivalent to bar ne ' '

Super User
Super User
Posts: 7,932

Re: Importing a .csv file with a blank column

You only use this modifier on the INPUT statement to distinguish formats from same named informats. You also shouldn't use the $ in your input statement as this is going to overwrite your informat definition for BAR.

The colon modifier on an INPUT statement has nothing to do with formats. It is instructions to INPUT to ignore the width of the informat and instead just read the data using list mode, even if there is an informat specification in the INPUT statement.

 

The $ in the INPUT statement is doing no harm, but it is also not doing any good. The meaning of the bare $ in the INPUT statement is that SAS should default the referenced variable to be character.  But in that data step the both variables had already had their type determined as a side effect of appearing in an earlier INFORMAT statement.

 

Respected Advisor
Posts: 4,668

Re: Importing a .csv file with a blank column

@Tom

Thanks and true, I've got the explanation for the colon modifier used in the INPUT statement wrong.

Nevertheless in the code the OP posted the colon modifier has been used in the INFORMAT statement which to my astonishment didn't even generate a warning in the Log. BUT: It makes a difference to the data type created (numeric) AND using the $ in the INPUT statement in combination does make a difference as well (as now it becomes character).

Using the colon for INFORMAT BAR :$64.; actually behaves as if this statement didn't get executed at all. Play around with it and you'll see.

data foo3(keep=id bar);
  infile datalines delimiter = ',' missover dsd /*lrecl=32767*/ firstobs=2;
  INFORMAT ID 10.;
  INFORMAT BAR :$64.;
  INPUT ID BAR $;
  datalines;
id,bar
1, 
2, 
3,Joe
4,Steve
5,Henry
;
run;
Ask a Question
Discussion stats
  • 16 replies
  • 176 views
  • 0 likes
  • 5 in conversation