BookmarkSubscribeRSS Feed
tomcmacdonald
Quartz | Level 8

Here's what I'm trying to do:

 

 

DATA FOO;
  INFILE '/path/to/csv/file' DELIMITER = ',' MISSOVER DSD LRECL=32767 FIRSTOBS=2;
  INFORMAT ID 10.;
  INFORMAT BAR :$64.;
  INPUT ID BAR $;
RUN;

PROC SQL;
  CREATE TABLE TEST AS 
  SELECT *
  FROM FOO
  WHERE BAR IS NULL;
QUIT;

Still getting rows with blank BAR columns.

 

 

Checking to see what's in those columns I do this:

 

 

PROC SQL;
  CREATE TABLE TEST2 AS 
  SELECT
    ID,
    PUT(BAR, $HEX32.)
  FROM FOO
WHERE BAR IS NULL; QUIT;

And it outputs ASCII space characters.

 

 

OK, what if the original file has some spaces in it.  Can I just trim it?  Let's see:

 

DATA FOO2(KEEP=ID BAR);
  INFILE '/path/to/csv/file' DELIMITER = ',' MISSOVER DSD LRECL=32767 FIRSTOBS=2;
  INFORMAT ID 10.;
  INFORMAT _BAR :$64.;
  INPUT ID _BAR $;
  BAR = TRIM(_BAR);
RUN;

PROC SQL;
  CREATE TABLE TEST3 AS 
  SELECT
    ID,
    PUT(BAR, $HEX32.)
  FROM FOO2
WHERE BAR IS NULL; QUIT;

Nope, apparently not.  Same deal.  ASCII space characters.  TRIM does nothing.  How do I kindly tell SAS to input this file and give it appropriate values? 

 

 

16 REPLIES 16
tomcmacdonald
Quartz | Level 8

What if we were to use PRXCHANGE instead of TRIM? 

 

DATA FOO3(KEEP=ID BAR);
  INFILE '/path/to/csv/file' DELIMITER = ',' MISSOVER DSD LRECL=32767 FIRSTOBS=2;
  INFORMAT ID 10.;
  INFORMAT _BAR :$64.;
  INPUT ID _BAR $;
  BAR = PRXCHANGE('s/\s+//', -1, _BAR)
RUN;

PROC SQL;
  CREATE TABLE TEST4 AS 
  SELECT
    ID,
    PUT(BAR, $HEX32.)
  FROM FOO3
WHERE BAR IS NULL; QUIT;

Again, it's a no go.

tomcmacdonald
Quartz | Level 8

OK here's my solution

 

DATA FOO4(KEEP=ID BAR);
  INFILE '/path/to/csv/file' DELIMITER = ',' MISSOVER DSD LRECL=32767 FIRSTOBS=2;
  INFORMAT ID 10.;
  INFORMAT BAR :$64.;
  INPUT ID BAR $;
  IF NOT PRXMATCH('/^\s+$/', BAR);
RUN;
Reeza
Super User

What happens with 

 

  WHERE not missing(bar);

 

tomcmacdonald
Quartz | Level 8
Same results.
Reeza
Super User

You can also use the COMPRESS and all spaces option.

ballardw
Super User

@tomcmacdonald wrote:

Here's what I'm trying to do:

 

 

Still getting rows with blank BAR columns.

OK, what if the original file has some spaces in it.  Can I just trim it?  Let's see:

 

Nope, apparently not.  Same deal.  ASCII space characters.  TRIM does nothing.  How do I kindly tell SAS to input this file and give it appropriate values? 

 

 


Can you post a few lines of your input data file. Or create data step of your set FOO: Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

tomcmacdonald
Quartz | Level 8

Here's what the CSV file looks like:

 

id,bar
1, 
2, 
3,Joe
4,Steve
5,Henry
tomcmacdonald
Quartz | Level 8

I feel as though I'm missing something fundamental wrt SAS and whitespace.  Lots of operation apparently don't work.  CATT, TRIM, PRXCHANGE with whitespace regular expressions all yield whitespace and seem to do nothing.  It's a bit jarring coming from a Python, R environment where this isn't even a thought. 

Reeza
Super User

SAS doesn't have a NULL value.

tomcmacdonald
Quartz | Level 8

What does NULL mean in the context of PROC SQL? 

Reeza
Super User

SAS doesn't have a concept of null, regardless of SQL or Data Step.

 

In many RDBMS there is a difference between Null, which means nothing, and a space, whereas SAS does not recognize the difference. Both would return a missing.

 

EDIT: see the response from SAS employee here:

https://communities.sas.com/t5/SAS-Procedures/Missing-Null/td-p/33383

 

 

Patrick
Opal | Level 21

@tomcmacdonald

Below code returns the expected result. What are you doing differently in your code?

data foo3(keep=id bar);
  infile datalines delimiter = ',' missover dsd /*lrecl=32767*/ firstobs=2;
  INFORMAT ID 10.;
  INFORMAT BAR $64.;
  INPUT ID BAR ;
  datalines;
id,bar
1, 
2, 
3,Joe
4,Steve
5,Henry
;
run;

proc sql feedback;
  create table test4 as 
  select
    id,
    bar
  from foo3  
  where bar is not null;
quit;

Please note that using the column modifier in the INFORMAT statement isn't valid syntax. You only use this modifier on the INPUT statement to distinguish formats from same named informats. You also shouldn't use the $ in your input statement as this is going to overwrite your informat definition for BAR.

 

SAS doesn't have a concept of NULL but you still can use the NULL keyword in the SAS SQL flavor. SAS will then treat such a NULL the same as a missing. In SAS SQL the syntax bar is not NULL is equivalent to bar ne ' '

Tom
Super User Tom
Super User

You only use this modifier on the INPUT statement to distinguish formats from same named informats. You also shouldn't use the $ in your input statement as this is going to overwrite your informat definition for BAR.

The colon modifier on an INPUT statement has nothing to do with formats. It is instructions to INPUT to ignore the width of the informat and instead just read the data using list mode, even if there is an informat specification in the INPUT statement.

 

The $ in the INPUT statement is doing no harm, but it is also not doing any good. The meaning of the bare $ in the INPUT statement is that SAS should default the referenced variable to be character.  But in that data step the both variables had already had their type determined as a side effect of appearing in an earlier INFORMAT statement.

 

Patrick
Opal | Level 21

@Tom

Thanks and true, I've got the explanation for the colon modifier used in the INPUT statement wrong.

Nevertheless in the code the OP posted the colon modifier has been used in the INFORMAT statement which to my astonishment didn't even generate a warning in the Log. BUT: It makes a difference to the data type created (numeric) AND using the $ in the INPUT statement in combination does make a difference as well (as now it becomes character).

Using the colon for INFORMAT BAR :$64.; actually behaves as if this statement didn't get executed at all. Play around with it and you'll see.

data foo3(keep=id bar);
  infile datalines delimiter = ',' missover dsd /*lrecl=32767*/ firstobs=2;
  INFORMAT ID 10.;
  INFORMAT BAR :$64.;
  INPUT ID BAR $;
  datalines;
id,bar
1, 
2, 
3,Joe
4,Steve
5,Henry
;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 2242 views
  • 0 likes
  • 5 in conversation