DATA Step, Macro, Functions and more

Replace _NULL_ values

Reply
Contributor FK1
Contributor
Posts: 37

Replace _NULL_ values

Hi everybody,

 

is there anybody out there who knows how to cope with variable values which are "_NULL_":

 

data PROBLEM_SET;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
format VALID_FROM VALID_TO  Date9.;
input     @1  ID  
        @3    VAR1  
        @8  VAR2  
        @10 VALID_FROM 
        @17 VALID_TO
		@26 VAR3
;
datalines;
AB xxxx 1 _NULL_ 17897   test1
AB xxxx 1 17898  18262   test2
AB xxxx 1 18263  18627   _NULL_
AB xxxx 1 18628  _NULL_  test3
AB xxxx 1 _NULL_ 19272   _NULL_

;
run;



data HAVE;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
format VALID_FROM VALID_TO  Date9.;
input     @1  ID  
        @3    VAR1  
        @8  VAR2  
        @10 VALID_FROM 
        @17 VALID_TO
		@26 VAR3 
;
datalines;
AB xxxx 1 .      17897   test1
AB xxxx 1 17898  18262   test2
AB xxxx 1 18263  18627   _NULL_
AB xxxx 1 18628  .       test3
AB xxxx 1 .      19272   _NULL_

;
RUN;

data WANT;
set HAVE;
if VAR3 eq "_NULL_" then VAR3 = "";
RUN;

As you can see, therre are numeric and alphanumeric _NULL_-values in the PROBLEM DATASET, which I want to be adequately replace (meaning: alphanumeric values transform to " " and numeric values to ".").

 

When using PROC IMPORT I always get errors b/c SAS detects for examle a string, namely the word "_NULL_" in a numeric variable....

 

Any help would be very much appreciated.

 

FK1

Super User
Posts: 9,612

Re: Replace _NULL_ values

Read into character, and do a conditional conversion:

data PROBLEM_SET;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
format VALID_FROM VALID_TO  Date9.;
input     @1  ID  
        @3    VAR1  
        @8  VAR2  
        @10 _VALID_FROM $
        @17 _VALID_TO $
		@26 VAR3
;
if _VALID_FROM ne '_NULL_' then VALID_FROM = input(_VALID_FROM,best.);
if _VALID_TO ne '_NULL_' then VALID_TO = input(_VALID_TO,best.);
drop _VALID_FROM _VALID_TO;
datalines;
AB xxxx 1 _NULL_ 17897   test1
AB xxxx 1 17898  18262   test2
AB xxxx 1 18263  18627   _NULL_
AB xxxx 1 18628  _NULL_  test3
AB xxxx 1 _NULL_ 19272   _NULL_

;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 13,084

Re: Replace _NULL_ values


FK1 wrote:

Hi everybody,

 

is there anybody out there who knows how to cope with variable values which are "_NULL_":

 

As you can see, therre are numeric and alphanumeric _NULL_-values in the PROBLEM DATASET, which I want to be adequately replace (meaning: alphanumeric values transform to " " and numeric values to ".").

 

When using PROC IMPORT I always get errors b/c SAS detects for examle a string, namely the word "_NULL_" in a numeric variable....

 

Any help would be very much appreciated.

 

FK1


Actually the issue could well be that you are relying on a guessing procedure to handle a systemic issue. After the first time you should have know that you have this issue with _null_ and addressed that.

If the files you are importing are of the same layout then a data step to read the code would allow you use custom informats and clean that garbage up at read time.

proc format lib=work ;
invalue probnum
'_NULL_'=.
;
invalue $probchar
'_NULL_'=' '
;
run;
data PROBLEM_SET;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
informat  VALID_FROM VALID_TO  probnum. var3 probchar.;
format VALID_FROM VALID_TO  Date9.;
input    @1  ID  
        @3    VAR1  
        @8  VAR2  
        @10 VALID_FROM 
        @17 VALID_TO
		  @26 VAR3
;
datalines;
AB xxxx 1 _NULL_ 17897   test1
AB xxxx 1 17898  18262   test2
AB xxxx 1 18263  18627   _NULL_
AB xxxx 1 18628  _NULL_  test3
AB xxxx 1 _NULL_ 19272   _NULL_

;
run;

You don't mention the type of file you are importing. If it is a delimited file you can use proc import to generate the skeleton code and then modify it as needed. Copy from the log and paste into the editor, change and save. Change infile statement and data set name for other files.

 

Super User
Super User
Posts: 7,860

Re: Replace _NULL_ values

One way to deal with this is to make your own formats that understand how to treat the string _NULL_.

Something like this would work for your example data.

proc format ;
invalue null '_NULL_','_null_' = . other=_same_;
invalue $null '_NULL_','_null_' = ' ' other=_same_ ;
run;

data PROBLEM_SET;
length ID $2 VAR1 $4 VAR2 8 VALID_FROM VALID_TO 8 VAR3 $10;
format VALID_FROM VALID_TO  Date9.;
input (ID -- VAR3) (:null.) ;
datalines;
AB xxxx 1 _NULL_ 17897   test1
AB xxxx 1 17898  18262   test2
AB xxxx 1 18263  18627   _NULL_
AB xxxx 1 18628  _NULL_  test3
AB xxxx 1 _NULL_ 19272   _NULL_
;
proc print; run;

 

PS There is no need to attach $xx. formats to character variables. SAS already knows how to print character variables (and numbers too).  Use the LENGTH or ATTRIB function if you want to define your variables instead of forcing SAS to define them as a side effect of using having a FORMAT statement be the first place your variable is mentioned.

 

Contributor FK1
Contributor
Posts: 37

Re: Replace _NULL_ values

[ Edited ]

Hello @Tom,

thanks a lot, for this elegant solution. Also, thanks for the elaboration on the LENGTH / FORMAT statement.
I do have a follow-up question, however:
Am I assuming correct, that this line of your code

 

input (ID -- VAR4) (:null.) ;

assigns the created invlaue checks ("null" and "$null")?

 

How does SAS know when to use the invalue-check without the $ sign for numbers variables and when to use the one with the $ sign for the character variables?

 

 

Super User
Super User
Posts: 7,860

Re: Replace _NULL_ values

It is a "feature" of SAS.  If you use the wrong type of format/informat SAS just fixes it for you.  Mainly in PUT/INPUT statements I think.

2646  data _null_;
2647    set sashelp.class (obs=2);
2648    put name best. ;
                 -----
                 484
NOTE 484-185: Format $BEST was not found or could not be loaded.

2649  run;

Alfred
Alice
NOTE: There were 2 observations read from the data set SASHELP.CLASS.

By naming the formats the same, except for the $ prefix, I was able to take advantage of it.

In your real program you would probably want to use the right format for each specific variable.

Ask a Question
Discussion stats
  • 5 replies
  • 101 views
  • 1 like
  • 4 in conversation