BookmarkSubscribeRSS Feed
FK1
Lapis Lazuli | Level 10 FK1
Lapis Lazuli | Level 10

Hi everybody,

 

is there anybody out there who knows how to cope with variable values which are "_NULL_":

 

data PROBLEM_SET;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
format VALID_FROM VALID_TO  Date9.;
input     @1  ID  
        @3    VAR1  
        @8  VAR2  
        @10 VALID_FROM 
        @17 VALID_TO
		@26 VAR3
;
datalines;
AB xxxx 1 _NULL_ 17897   test1
AB xxxx 1 17898  18262   test2
AB xxxx 1 18263  18627   _NULL_
AB xxxx 1 18628  _NULL_  test3
AB xxxx 1 _NULL_ 19272   _NULL_

;
run;



data HAVE;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
format VALID_FROM VALID_TO  Date9.;
input     @1  ID  
        @3    VAR1  
        @8  VAR2  
        @10 VALID_FROM 
        @17 VALID_TO
		@26 VAR3 
;
datalines;
AB xxxx 1 .      17897   test1
AB xxxx 1 17898  18262   test2
AB xxxx 1 18263  18627   _NULL_
AB xxxx 1 18628  .       test3
AB xxxx 1 .      19272   _NULL_

;
RUN;

data WANT;
set HAVE;
if VAR3 eq "_NULL_" then VAR3 = "";
RUN;

As you can see, therre are numeric and alphanumeric _NULL_-values in the PROBLEM DATASET, which I want to be adequately replace (meaning: alphanumeric values transform to " " and numeric values to ".").

 

When using PROC IMPORT I always get errors b/c SAS detects for examle a string, namely the word "_NULL_" in a numeric variable....

 

Any help would be very much appreciated.

 

FK1

5 REPLIES 5
Kurt_Bremser
Super User

Read into character, and do a conditional conversion:

data PROBLEM_SET;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
format VALID_FROM VALID_TO  Date9.;
input     @1  ID  
        @3    VAR1  
        @8  VAR2  
        @10 _VALID_FROM $
        @17 _VALID_TO $
		@26 VAR3
;
if _VALID_FROM ne '_NULL_' then VALID_FROM = input(_VALID_FROM,best.);
if _VALID_TO ne '_NULL_' then VALID_TO = input(_VALID_TO,best.);
drop _VALID_FROM _VALID_TO;
datalines;
AB xxxx 1 _NULL_ 17897   test1
AB xxxx 1 17898  18262   test2
AB xxxx 1 18263  18627   _NULL_
AB xxxx 1 18628  _NULL_  test3
AB xxxx 1 _NULL_ 19272   _NULL_

;
run;
ballardw
Super User

@FK1 wrote:

Hi everybody,

 

is there anybody out there who knows how to cope with variable values which are "_NULL_":

 

As you can see, therre are numeric and alphanumeric _NULL_-values in the PROBLEM DATASET, which I want to be adequately replace (meaning: alphanumeric values transform to " " and numeric values to ".").

 

When using PROC IMPORT I always get errors b/c SAS detects for examle a string, namely the word "_NULL_" in a numeric variable....

 

Any help would be very much appreciated.

 

FK1


Actually the issue could well be that you are relying on a guessing procedure to handle a systemic issue. After the first time you should have know that you have this issue with _null_ and addressed that.

If the files you are importing are of the same layout then a data step to read the code would allow you use custom informats and clean that garbage up at read time.

proc format lib=work ;
invalue probnum
'_NULL_'=.
;
invalue $probchar
'_NULL_'=' '
;
run;
data PROBLEM_SET;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
informat  VALID_FROM VALID_TO  probnum. var3 probchar.;
format VALID_FROM VALID_TO  Date9.;
input    @1  ID  
        @3    VAR1  
        @8  VAR2  
        @10 VALID_FROM 
        @17 VALID_TO
		  @26 VAR3
;
datalines;
AB xxxx 1 _NULL_ 17897   test1
AB xxxx 1 17898  18262   test2
AB xxxx 1 18263  18627   _NULL_
AB xxxx 1 18628  _NULL_  test3
AB xxxx 1 _NULL_ 19272   _NULL_

;
run;

You don't mention the type of file you are importing. If it is a delimited file you can use proc import to generate the skeleton code and then modify it as needed. Copy from the log and paste into the editor, change and save. Change infile statement and data set name for other files.

 

Tom
Super User Tom
Super User

One way to deal with this is to make your own formats that understand how to treat the string _NULL_.

Something like this would work for your example data.

proc format ;
invalue null '_NULL_','_null_' = . other=_same_;
invalue $null '_NULL_','_null_' = ' ' other=_same_ ;
run;

data PROBLEM_SET;
length ID $2 VAR1 $4 VAR2 8 VALID_FROM VALID_TO 8 VAR3 $10;
format VALID_FROM VALID_TO  Date9.;
input (ID -- VAR3) (:null.) ;
datalines;
AB xxxx 1 _NULL_ 17897   test1
AB xxxx 1 17898  18262   test2
AB xxxx 1 18263  18627   _NULL_
AB xxxx 1 18628  _NULL_  test3
AB xxxx 1 _NULL_ 19272   _NULL_
;
proc print; run;

 

PS There is no need to attach $xx. formats to character variables. SAS already knows how to print character variables (and numbers too).  Use the LENGTH or ATTRIB function if you want to define your variables instead of forcing SAS to define them as a side effect of using having a FORMAT statement be the first place your variable is mentioned.

 

FK1
Lapis Lazuli | Level 10 FK1
Lapis Lazuli | Level 10

Hello @Tom,

thanks a lot, for this elegant solution. Also, thanks for the elaboration on the LENGTH / FORMAT statement.
I do have a follow-up question, however:
Am I assuming correct, that this line of your code

 

input (ID -- VAR4) (:null.) ;

assigns the created invlaue checks ("null" and "$null")?

 

How does SAS know when to use the invalue-check without the $ sign for numbers variables and when to use the one with the $ sign for the character variables?

 

 

Tom
Super User Tom
Super User

It is a "feature" of SAS.  If you use the wrong type of format/informat SAS just fixes it for you.  Mainly in PUT/INPUT statements I think.

2646  data _null_;
2647    set sashelp.class (obs=2);
2648    put name best. ;
                 -----
                 484
NOTE 484-185: Format $BEST was not found or could not be loaded.

2649  run;

Alfred
Alice
NOTE: There were 2 observations read from the data set SASHELP.CLASS.

By naming the formats the same, except for the $ prefix, I was able to take advantage of it.

In your real program you would probably want to use the right format for each specific variable.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 991 views
  • 1 like
  • 4 in conversation