Hi everybody,
is there anybody out there who knows how to cope with variable values which are "_NULL_":
data PROBLEM_SET;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
format VALID_FROM VALID_TO Date9.;
input @1 ID
@3 VAR1
@8 VAR2
@10 VALID_FROM
@17 VALID_TO
@26 VAR3
;
datalines;
AB xxxx 1 _NULL_ 17897 test1
AB xxxx 1 17898 18262 test2
AB xxxx 1 18263 18627 _NULL_
AB xxxx 1 18628 _NULL_ test3
AB xxxx 1 _NULL_ 19272 _NULL_
;
run;
data HAVE;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
format VALID_FROM VALID_TO Date9.;
input @1 ID
@3 VAR1
@8 VAR2
@10 VALID_FROM
@17 VALID_TO
@26 VAR3
;
datalines;
AB xxxx 1 . 17897 test1
AB xxxx 1 17898 18262 test2
AB xxxx 1 18263 18627 _NULL_
AB xxxx 1 18628 . test3
AB xxxx 1 . 19272 _NULL_
;
RUN;
data WANT;
set HAVE;
if VAR3 eq "_NULL_" then VAR3 = "";
RUN;
As you can see, therre are numeric and alphanumeric _NULL_-values in the PROBLEM DATASET, which I want to be adequately replace (meaning: alphanumeric values transform to " " and numeric values to ".").
When using PROC IMPORT I always get errors b/c SAS detects for examle a string, namely the word "_NULL_" in a numeric variable....
Any help would be very much appreciated.
FK1
Read into character, and do a conditional conversion:
data PROBLEM_SET;
format ID $2. VAR1 $4. VAR2 8. VAR3 $10.;
format VALID_FROM VALID_TO Date9.;
input @1 ID
@3 VAR1
@8 VAR2
@10 _VALID_FROM $
@17 _VALID_TO $
@26 VAR3
;
if _VALID_FROM ne '_NULL_' then VALID_FROM = input(_VALID_FROM,best.);
if _VALID_TO ne '_NULL_' then VALID_TO = input(_VALID_TO,best.);
drop _VALID_FROM _VALID_TO;
datalines;
AB xxxx 1 _NULL_ 17897 test1
AB xxxx 1 17898 18262 test2
AB xxxx 1 18263 18627 _NULL_
AB xxxx 1 18628 _NULL_ test3
AB xxxx 1 _NULL_ 19272 _NULL_
;
run;
@FK1 wrote:
Hi everybody,
is there anybody out there who knows how to cope with variable values which are "_NULL_":
As you can see, therre are numeric and alphanumeric _NULL_-values in the PROBLEM DATASET, which I want to be adequately replace (meaning: alphanumeric values transform to " " and numeric values to ".").
When using PROC IMPORT I always get errors b/c SAS detects for examle a string, namely the word "_NULL_" in a numeric variable....
Any help would be very much appreciated.
FK1
Actually the issue could well be that you are relying on a guessing procedure to handle a systemic issue. After the first time you should have know that you have this issue with _null_ and addressed that.
If the files you are importing are of the same layout then a data step to read the code would allow you use custom informats and clean that garbage up at read time.
proc format lib=work ; invalue probnum '_NULL_'=. ; invalue $probchar '_NULL_'=' ' ; run; data PROBLEM_SET; format ID $2. VAR1 $4. VAR2 8. VAR3 $10.; informat VALID_FROM VALID_TO probnum. var3 probchar.; format VALID_FROM VALID_TO Date9.; input @1 ID @3 VAR1 @8 VAR2 @10 VALID_FROM @17 VALID_TO @26 VAR3 ; datalines; AB xxxx 1 _NULL_ 17897 test1 AB xxxx 1 17898 18262 test2 AB xxxx 1 18263 18627 _NULL_ AB xxxx 1 18628 _NULL_ test3 AB xxxx 1 _NULL_ 19272 _NULL_ ; run;
You don't mention the type of file you are importing. If it is a delimited file you can use proc import to generate the skeleton code and then modify it as needed. Copy from the log and paste into the editor, change and save. Change infile statement and data set name for other files.
One way to deal with this is to make your own formats that understand how to treat the string _NULL_.
Something like this would work for your example data.
proc format ;
invalue null '_NULL_','_null_' = . other=_same_;
invalue $null '_NULL_','_null_' = ' ' other=_same_ ;
run;
data PROBLEM_SET;
length ID $2 VAR1 $4 VAR2 8 VALID_FROM VALID_TO 8 VAR3 $10;
format VALID_FROM VALID_TO Date9.;
input (ID -- VAR3) (:null.) ;
datalines;
AB xxxx 1 _NULL_ 17897 test1
AB xxxx 1 17898 18262 test2
AB xxxx 1 18263 18627 _NULL_
AB xxxx 1 18628 _NULL_ test3
AB xxxx 1 _NULL_ 19272 _NULL_
;
proc print; run;
PS There is no need to attach $xx. formats to character variables. SAS already knows how to print character variables (and numbers too). Use the LENGTH or ATTRIB function if you want to define your variables instead of forcing SAS to define them as a side effect of using having a FORMAT statement be the first place your variable is mentioned.
Hello @Tom,
thanks a lot, for this elegant solution. Also, thanks for the elaboration on the LENGTH / FORMAT statement.
I do have a follow-up question, however:
Am I assuming correct, that this line of your code
input (ID -- VAR4) (:null.) ;
assigns the created invlaue checks ("null" and "$null")?
How does SAS know when to use the invalue-check without the $ sign for numbers variables and when to use the one with the $ sign for the character variables?
It is a "feature" of SAS. If you use the wrong type of format/informat SAS just fixes it for you. Mainly in PUT/INPUT statements I think.
2646 data _null_; 2647 set sashelp.class (obs=2); 2648 put name best. ; ----- 484 NOTE 484-185: Format $BEST was not found or could not be loaded. 2649 run; Alfred Alice NOTE: There were 2 observations read from the data set SASHELP.CLASS.
By naming the formats the same, except for the $ prefix, I was able to take advantage of it.
In your real program you would probably want to use the right format for each specific variable.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.