BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Junyong
Pyrite | Level 9

This MWE produces too many invalid data notes due to some occasional nulls as follows.

data irx;
	infile 'https://query1.finance.yahoo.com/v7/finance/download/^irx
?period1=-9999999999&period2=9999999999' url firstobs=2 dsd truncover;
	input date yymmdd10. +1 open hi lo close adj vol;
	if adj>. then output;
run;

and the file is

Date,Open,High,Low,Close,Adj Close,Volume
1960-01-04,4.520000,4.520000,4.520000,4.520000,4.520000,0
1960-01-05,4.550000,4.550000,4.550000,4.550000,4.550000,0
1960-01-06,4.680000,4.680000,4.680000,4.680000,4.680000,0
1960-01-07,4.630000,4.630000,4.630000,4.630000,4.630000,0
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-12,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-13,4.560000,4.560000,4.560000,4.560000,4.560000,0
1960-01-14,4.510000,4.510000,4.510000,4.510000,4.510000,0
1960-01-15,4.490000,4.490000,4.490000,4.490000,4.490000,0
1960-01-17,null,null,null,null,null,null
1960-01-18,4.370000,4.370000,4.370000,4.370000,4.370000,0
1960-01-19,4.310000,4.310000,4.310000,4.310000,4.310000,0
1960-01-20,4.300000,4.300000,4.300000,4.300000,4.300000,0
1960-01-21,4.270000,4.270000,4.270000,4.270000,4.270000,0
1960-01-22,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-24,null,null,null,null,null,null
1960-01-25,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-26,4.040000,4.040000,4.040000,4.040000,4.040000,0
1960-01-27,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-28,3.920000,3.920000,3.920000,3.920000,3.920000,0
1960-01-29,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-31,null,null,null,null,null,null

Can I skip these unnecessary observations, instead of reading all and delete them?

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Ah I see what you mean. How about-

data have;
 input @;
 if index(_infile_,'null')=0;
 input var   :mmddyy10.;
  cards;
 null
 06/21/1982
 07/21/1982
 null
 ;

The above doesn't read records from buffer to PDV. Would that suffice?

 

Testing your sample-


data test;
infile cards dsd firstobs=2;
input @;
if index(_infile_,'null')=0;
input date yymmdd10. +1 open hi lo close adj vol;
cards;
Date,Open,High,Low,Close,Adj Close,Volume
1960-01-04,4.520000,4.520000,4.520000,4.520000,4.520000,0
1960-01-05,4.550000,4.550000,4.550000,4.550000,4.550000,0
1960-01-06,4.680000,4.680000,4.680000,4.680000,4.680000,0
1960-01-07,4.630000,4.630000,4.630000,4.630000,4.630000,0
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-12,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-13,4.560000,4.560000,4.560000,4.560000,4.560000,0
1960-01-14,4.510000,4.510000,4.510000,4.510000,4.510000,0
1960-01-15,4.490000,4.490000,4.490000,4.490000,4.490000,0
1960-01-17,null,null,null,null,null,null
1960-01-18,4.370000,4.370000,4.370000,4.370000,4.370000,0
1960-01-19,4.310000,4.310000,4.310000,4.310000,4.310000,0
1960-01-20,4.300000,4.300000,4.300000,4.300000,4.300000,0
1960-01-21,4.270000,4.270000,4.270000,4.270000,4.270000,0
1960-01-22,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-24,null,null,null,null,null,null
1960-01-25,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-26,4.040000,4.040000,4.040000,4.040000,4.040000,0
1960-01-27,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-28,3.920000,3.920000,3.920000,3.920000,3.920000,0
1960-01-29,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-31,null,null,null,null,null,null
;

View solution in original post

11 REPLIES 11
novinosrin
Tourmaline | Level 20
if _error_=0 and adj>.; /*please try*/
Junyong
Pyrite | Level 9

This code also reads all the observations first and then removes the missing observations. Can I not read and completely skip the lines containing the nulls? For example, I tried the following but failed.

data irx;
	infile 'https://query1.finance.yahoo.com/v7/finance/download/^irx
?period1=-9999999999&period2=9999999999' url firstobs=2 dsd truncover;
	if find(_infile_,"null")=0 then input date yymmdd10. +1 open hi lo close adj vol;
run;
novinosrin
Tourmaline | Level 20

Ah I see what you mean. How about-

data have;
 input @;
 if index(_infile_,'null')=0;
 input var   :mmddyy10.;
  cards;
 null
 06/21/1982
 07/21/1982
 null
 ;

The above doesn't read records from buffer to PDV. Would that suffice?

 

Testing your sample-


data test;
infile cards dsd firstobs=2;
input @;
if index(_infile_,'null')=0;
input date yymmdd10. +1 open hi lo close adj vol;
cards;
Date,Open,High,Low,Close,Adj Close,Volume
1960-01-04,4.520000,4.520000,4.520000,4.520000,4.520000,0
1960-01-05,4.550000,4.550000,4.550000,4.550000,4.550000,0
1960-01-06,4.680000,4.680000,4.680000,4.680000,4.680000,0
1960-01-07,4.630000,4.630000,4.630000,4.630000,4.630000,0
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-12,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-13,4.560000,4.560000,4.560000,4.560000,4.560000,0
1960-01-14,4.510000,4.510000,4.510000,4.510000,4.510000,0
1960-01-15,4.490000,4.490000,4.490000,4.490000,4.490000,0
1960-01-17,null,null,null,null,null,null
1960-01-18,4.370000,4.370000,4.370000,4.370000,4.370000,0
1960-01-19,4.310000,4.310000,4.310000,4.310000,4.310000,0
1960-01-20,4.300000,4.300000,4.300000,4.300000,4.300000,0
1960-01-21,4.270000,4.270000,4.270000,4.270000,4.270000,0
1960-01-22,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-24,null,null,null,null,null,null
1960-01-25,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-26,4.040000,4.040000,4.040000,4.040000,4.040000,0
1960-01-27,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-28,3.920000,3.920000,3.920000,3.920000,3.920000,0
1960-01-29,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-31,null,null,null,null,null,null
;
novinosrin
Tourmaline | Level 20

Try the "subsetting if" rather than "If Then"

 

979  data test;
980  infile cards dsd firstobs=2;
981  input @;
982  if index(_infile_,'null')=0;
983  input date yymmdd10. +1 open hi lo close adj vol;
984  cards;

NOTE: The data set WORK.TEST has 20 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
Junyong
Pyrite | Level 9

Subsetting first was the exact answer. Thanks.

CurtisMackWSIPP
Lapis Lazuli | Level 10

Keep in mind that that solution will drop the record if any of the values are null, not just the "adj" variable.

CurtisMackWSIPP
Lapis Lazuli | Level 10

I like using custom informat to handle these conditions.

 

proc format;
  invalue myfmt "null" = .
                default = 10.5;
run;

data want;
  infile datalines firstobs=2 dsd truncover;
  informat open hi lo close adj vol myfmt.;
  input date yymmdd10. +1 open hi lo close adj vol;
	if adj>. then output;
datalines;
Date,Open,High,Low,Close,Adj Close,Volume
1960-01-04,4.520000,4.520000,4.520000,4.520000,4.520000,0
1960-01-05,4.550000,4.550000,4.550000,4.550000,4.550000,0
1960-01-06,4.680000,4.680000,4.680000,4.680000,4.680000,0
1960-01-07,4.630000,4.630000,4.630000,4.630000,4.630000,0
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-12,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-13,4.560000,4.560000,4.560000,4.560000,4.560000,0
1960-01-14,4.510000,4.510000,4.510000,4.510000,4.510000,0
1960-01-15,4.490000,4.490000,4.490000,4.490000,4.490000,0
1960-01-17,null,null,null,null,null,null
1960-01-18,4.370000,4.370000,4.370000,4.370000,4.370000,0
1960-01-19,4.310000,4.310000,4.310000,4.310000,4.310000,0
1960-01-20,4.300000,4.300000,4.300000,4.300000,4.300000,0
1960-01-21,4.270000,4.270000,4.270000,4.270000,4.270000,0
1960-01-22,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-24,null,null,null,null,null,null
1960-01-25,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-26,4.040000,4.040000,4.040000,4.040000,4.040000,0
1960-01-27,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-28,3.920000,3.920000,3.920000,3.920000,3.920000,0
1960-01-29,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-31,null,null,null,null,null,null
;
run;
Junyong
Pyrite | Level 9

It seems this will be useful for R CSVs, which put NAs for missings. Thanks.

Tom
Super User Tom
Super User
proc format;
  invalue myfmt "null" = .
                default = 10.5;
run;

Did you really mean to set the value to 10.5 when the value is the string 'default'?

If not then what did you think that second line meant?

 

 

 

 

CurtisMackWSIPP
Lapis Lazuli | Level 10

Try it.  You are thinking of the "OTHER" statement.  "DEFAULT" says use this existing format of 10.5. 

Tom
Super User Tom
Super User

@CurtisMackWSIPP wrote:

Try it.  You are thinking of the "OTHER" statement.  "DEFAULT" says use this existing format of 10.5. 


There is no reason to use a different informat than what SAS normally uses to read numbers.  But if you did want to specify the use of an existing informat you need to enclose it square brackets.  But you wouldn't want to use an informat like 10.5 because that imposes an implied decimal point before the last 5 digits for this problem.  So any integer values would be divided by 10**5.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1258 views
  • 1 like
  • 4 in conversation