France
Quartz | Level 8

 

Dear all,

 

I am running the code below and get the following log:

 

141
142  DATA Step1.Publicationsnew1 ;
143     SET Pat_ori.Publicationsnew ;
NOTE: Data file PAT_ORI.PUBLICATIONSNEW.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be used,
      which might require additional CPU resources and might reduce performance.
144     IF YEAR(publn_date) NE 9999 ;
145  RUN;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      4 at 144:7
NOTE: There were 100491575 observations read from the data set PAT_ORI.PUBLICATIONSNEW.
NOTE: The data set STEP1.PUBLICATIONSNEW1 has 97135175 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           14:43.50
      cpu time            3:40.86


Should I rebuild the data set or do something else? Could you please give me some suggestions?

 

Thanks in advance.

Accepted Solutions
Reeza
Super User
You can ignore the CEDA note; it is informational. If it bothers you, create a new data set in your own process: you'll see the note once, when the original file is read, and not again when you work from the copy. Regarding the missing values, find line 144 in your log (IF YEAR(publn_date) NE 9999), which is generating the note. You probably have some publn_date values that are missing, so when you apply the YEAR function to them it returns a missing value and SAS writes that note. It is not an error. One option is to add another clause so the calculation is only done when the date is not missing:

if not missing(publn_date) then year = year(publn_date);
if missing(year) or year ne 9999;

You'll have to decide how to handle the missing case, since I'm not sure whether you want to keep those observations or delete them. As written, the second statement keeps them. Hopefully that gives you the idea.
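
If you decide to drop the observations with a missing date, a minimal sketch of that variant could look like the following. It assumes publn_date is a numeric SAS date and that discarding those rows is acceptable for your analysis, which only you can judge. Because DELETE stops processing the current observation, YEAR is never called on a missing value and the note should not appear.

```sas
data step1.publicationsnew1;
    set pat_ori.publicationsnew;

    /* Assumption: rows without a publication date are not needed. */
    /* DELETE returns to the top of the step, so the YEAR call     */
    /* below is never reached for a missing publn_date.            */
    if missing(publn_date) then delete;

    /* Keep only rows whose publication year is not 9999. */
    if year(publn_date) ne 9999;
run;
```

Unlike the two-statement version above, this does not add a new variable named year to the output data set.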


Kurt_Bremser
Super User

Regarding CEDA: as it states, the worst that could happen is some performance penalty in the first step where you read it.
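
If you want to see why CEDA is being used, one way (a sketch, not a required step) is to compare the session encoding with the attributes stored in the data set. PROC CONTENTS reports the data set's Encoding and Data Representation; the library and member names are the ones from the log above.

```sas
/* Show the encoding of the current SAS session. */
proc options option=encoding;
run;

/* Show the data set attributes, including its Encoding */
/* and Data Representation.                             */
proc contents data=pat_ori.publicationsnew;
run;
```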

Regarding the missings: since the 4 observations will be included in your new dataset, filter them out and inspect them. 4 missings out of ~100 million might also indicate other bad data in those observations.
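
A minimal sketch of that inspection, assuming the output data set from the log (STEP1.PUBLICATIONSNEW1) and a hypothetical work data set name, missing_dates:

```sas
/* Pull out only the observations with a missing publication date. */
/* work.missing_dates is a made-up name used for illustration.     */
data work.missing_dates;
    set step1.publicationsnew1;
    where missing(publn_date);
run;

/* Print them so the other variables can be checked for bad data. */
proc print data=work.missing_dates;
run;
```

The WHERE statement filters while reading, so only the matching rows are passed into the step.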
