Solved: 'a format that is native to another host' and 'Missing values'

France · Posted 09-04-2018 07:25 PM

Dear all,

I am running the code below and get the following log:

141
142  DATA Step1.Publicationsnew1 ;
143     SET Pat_ori.Publicationsnew ;
NOTE: Data file PAT_ORI.PUBLICATIONSNEW.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be used,
      which might require additional CPU resources and might reduce performance.
144     IF YEAR(publn_date) NE 9999 ;
145  RUN;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      4 at 144:7
NOTE: There were 100491575 observations read from the data set PAT_ORI.PUBLICATIONSNEW.
NOTE: The data set STEP1.PUBLICATIONSNEW1 has 97135175 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           14:43.50
      cpu time            3:40.86

Should I rebuild the data set or do something else? Could you please give me some suggestions?

Thanks in advance.

Reeza · Posted 09-04-2018 08:15 PM

You can ignore the CEDA message; it's informational. If it bothers you, create a new data set in your process: you'll see the note once, when the original file is read, and then never again. Regarding the missing values, find line 144 in your log (IF YEAR(publn_date) NE 9999), which is generating the note. You probably have some publn_date values that are missing, so when you apply the YEAR function to them the result is missing too, and SAS writes that note (it is a note, not an error). One option is to add a condition so the calculation is only done when the date is not missing:

if not missing(publn_date) then year = year(publn_date);
if missing(year) or year ne 9999;

You'll have to decide how to handle the missing case; I'm not sure whether you want to keep those observations or delete them. Hopefully that gives you the idea.
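
Put together as a full step, the idea above might look like the sketch below. It assumes publn_date is a numeric SAS date and that you decide to drop the observations with a missing date; pub_year is a name made up for illustration and is not kept in the output.

```sas
/* Sketch only: assumes publn_date is a numeric SAS date variable. */
/* pub_year is a hypothetical helper variable, dropped on output.  */
data Step1.Publicationsnew1 (drop=pub_year);
    set Pat_ori.Publicationsnew;

    /* Call YEAR() only when the date is present, so the          */
    /* "Missing values were generated" note is not written.       */
    if not missing(publn_date) then pub_year = year(publn_date);

    /* Assumed choice: discard observations with a missing date. */
    if missing(pub_year) then delete;

    /* Keep everything except the year 9999. */
    if pub_year ne 9999;
run;
```

To keep the observations with a missing date instead, remove the DELETE line: a missing pub_year is not equal to 9999, so those observations pass the last IF.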

Kurt_Bremser · Posted 09-05-2018 03:18 AM

Regarding CEDA: as it states, the worst that could happen is some performance penalty in the first step where you read it.
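
If you want to see why CEDA kicks in, PROC CONTENTS reports the data representation and encoding of the file. A plain DATA step copy is normally written in the format of the current session (unless the target library sets other options), so later steps that read the copy should not show the note. A rough sketch; the library name native and its path are made up for illustration:

```sas
/* Show the "Data Representation" and "Encoding" attributes of the file. */
proc contents data=Pat_ori.Publicationsnew;
run;

/* Hypothetical target library; replace the path with your own. */
libname native '/path/to/native/library';

/* One-time copy: CEDA is used for this read, but the output */
/* is written in the current session's format.               */
data native.Publicationsnew;
    set Pat_ori.Publicationsnew;
run;
```

If the encodings differ, character values are transcoded during the copy, so it is worth checking the copied data for truncated strings.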

Regarding the missings: since the 4 observations will be included in your new dataset, filter them out and inspect them. 4 missings out of ~100 million might also indicate other bad data in those observations.
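
A minimal way to pull those observations out for inspection, assuming the missing YEAR() results come from missing publn_date values; work.missing_dates is just an illustrative name:

```sas
/* Pull out the observations whose publn_date is missing. */
data work.missing_dates;
    set Step1.Publicationsnew1;
    where missing(publn_date);
run;

/* Only a handful of rows are expected, so print them in full. */
proc print data=work.missing_dates;
run;
```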
