This MWE produces too many invalid data note
s due to some occasional null
s as follows.
data irx;
infile 'https://query1.finance.yahoo.com/v7/finance/download/^irx
?period1=-9999999999&period2=9999999999' url firstobs=2 dsd truncover;
input date yymmdd10. +1 open hi lo close adj vol;
if adj>. then output;
run;
and the file is
Date,Open,High,Low,Close,Adj Close,Volume
1960-01-04,4.520000,4.520000,4.520000,4.520000,4.520000,0
1960-01-05,4.550000,4.550000,4.550000,4.550000,4.550000,0
1960-01-06,4.680000,4.680000,4.680000,4.680000,4.680000,0
1960-01-07,4.630000,4.630000,4.630000,4.630000,4.630000,0
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-12,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-13,4.560000,4.560000,4.560000,4.560000,4.560000,0
1960-01-14,4.510000,4.510000,4.510000,4.510000,4.510000,0
1960-01-15,4.490000,4.490000,4.490000,4.490000,4.490000,0
1960-01-17,null,null,null,null,null,null
1960-01-18,4.370000,4.370000,4.370000,4.370000,4.370000,0
1960-01-19,4.310000,4.310000,4.310000,4.310000,4.310000,0
1960-01-20,4.300000,4.300000,4.300000,4.300000,4.300000,0
1960-01-21,4.270000,4.270000,4.270000,4.270000,4.270000,0
1960-01-22,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-24,null,null,null,null,null,null
1960-01-25,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-26,4.040000,4.040000,4.040000,4.040000,4.040000,0
1960-01-27,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-28,3.920000,3.920000,3.920000,3.920000,3.920000,0
1960-01-29,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-31,null,null,null,null,null,null
Can I skip these unnecessary observations, instead of reading all and delete them?
Ah I see what you mean. How about-
data have;
input @;
if index(_infile_,'null')=0;
input var :mmddyy10.;
cards;
null
06/21/1982
07/21/1982
null
;
The above doesn't read records from buffer to PDV. Would that suffice?
Testing your sample-
data test;
infile cards dsd firstobs=2;
input @;
if index(_infile_,'null')=0;
input date yymmdd10. +1 open hi lo close adj vol;
cards;
Date,Open,High,Low,Close,Adj Close,Volume
1960-01-04,4.520000,4.520000,4.520000,4.520000,4.520000,0
1960-01-05,4.550000,4.550000,4.550000,4.550000,4.550000,0
1960-01-06,4.680000,4.680000,4.680000,4.680000,4.680000,0
1960-01-07,4.630000,4.630000,4.630000,4.630000,4.630000,0
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-12,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-13,4.560000,4.560000,4.560000,4.560000,4.560000,0
1960-01-14,4.510000,4.510000,4.510000,4.510000,4.510000,0
1960-01-15,4.490000,4.490000,4.490000,4.490000,4.490000,0
1960-01-17,null,null,null,null,null,null
1960-01-18,4.370000,4.370000,4.370000,4.370000,4.370000,0
1960-01-19,4.310000,4.310000,4.310000,4.310000,4.310000,0
1960-01-20,4.300000,4.300000,4.300000,4.300000,4.300000,0
1960-01-21,4.270000,4.270000,4.270000,4.270000,4.270000,0
1960-01-22,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-24,null,null,null,null,null,null
1960-01-25,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-26,4.040000,4.040000,4.040000,4.040000,4.040000,0
1960-01-27,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-28,3.920000,3.920000,3.920000,3.920000,3.920000,0
1960-01-29,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-31,null,null,null,null,null,null
;
if _error_=0 and adj>.; /*please try*/
This code also reads all the observations first and then removes the missing observations. Can I not read and completely skip the lines containing the null
s? For example, I tried the following but failed.
data irx;
infile 'https://query1.finance.yahoo.com/v7/finance/download/^irx
?period1=-9999999999&period2=9999999999' url firstobs=2 dsd truncover;
if find(_infile_,"null")=0 then input date yymmdd10. +1 open hi lo close adj vol;
run;
Ah I see what you mean. How about-
data have;
input @;
if index(_infile_,'null')=0;
input var :mmddyy10.;
cards;
null
06/21/1982
07/21/1982
null
;
The above doesn't read records from buffer to PDV. Would that suffice?
Testing your sample-
data test;
infile cards dsd firstobs=2;
input @;
if index(_infile_,'null')=0;
input date yymmdd10. +1 open hi lo close adj vol;
cards;
Date,Open,High,Low,Close,Adj Close,Volume
1960-01-04,4.520000,4.520000,4.520000,4.520000,4.520000,0
1960-01-05,4.550000,4.550000,4.550000,4.550000,4.550000,0
1960-01-06,4.680000,4.680000,4.680000,4.680000,4.680000,0
1960-01-07,4.630000,4.630000,4.630000,4.630000,4.630000,0
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-12,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-13,4.560000,4.560000,4.560000,4.560000,4.560000,0
1960-01-14,4.510000,4.510000,4.510000,4.510000,4.510000,0
1960-01-15,4.490000,4.490000,4.490000,4.490000,4.490000,0
1960-01-17,null,null,null,null,null,null
1960-01-18,4.370000,4.370000,4.370000,4.370000,4.370000,0
1960-01-19,4.310000,4.310000,4.310000,4.310000,4.310000,0
1960-01-20,4.300000,4.300000,4.300000,4.300000,4.300000,0
1960-01-21,4.270000,4.270000,4.270000,4.270000,4.270000,0
1960-01-22,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-24,null,null,null,null,null,null
1960-01-25,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-26,4.040000,4.040000,4.040000,4.040000,4.040000,0
1960-01-27,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-28,3.920000,3.920000,3.920000,3.920000,3.920000,0
1960-01-29,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-31,null,null,null,null,null,null
;
Try the "subsetting if" rather than "If Then"
979 data test; 980 infile cards dsd firstobs=2; 981 input @; 982 if index(_infile_,'null')=0; 983 input date yymmdd10. +1 open hi lo close adj vol; 984 cards; NOTE: The data set WORK.TEST has 20 observations and 7 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
Subsetting first was the exact answer. Thanks.
Keep in mind that that solution will drop the record if any of the values are null, not just the "adj" variable.
I like using custom informat to handle these conditions.
proc format;
invalue myfmt "null" = .
default = 10.5;
run;
data want;
infile datalines firstobs=2 dsd truncover;
informat open hi lo close adj vol myfmt.;
input date yymmdd10. +1 open hi lo close adj vol;
if adj>. then output;
datalines;
Date,Open,High,Low,Close,Adj Close,Volume
1960-01-04,4.520000,4.520000,4.520000,4.520000,4.520000,0
1960-01-05,4.550000,4.550000,4.550000,4.550000,4.550000,0
1960-01-06,4.680000,4.680000,4.680000,4.680000,4.680000,0
1960-01-07,4.630000,4.630000,4.630000,4.630000,4.630000,0
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-12,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-13,4.560000,4.560000,4.560000,4.560000,4.560000,0
1960-01-14,4.510000,4.510000,4.510000,4.510000,4.510000,0
1960-01-15,4.490000,4.490000,4.490000,4.490000,4.490000,0
1960-01-17,null,null,null,null,null,null
1960-01-18,4.370000,4.370000,4.370000,4.370000,4.370000,0
1960-01-19,4.310000,4.310000,4.310000,4.310000,4.310000,0
1960-01-20,4.300000,4.300000,4.300000,4.300000,4.300000,0
1960-01-21,4.270000,4.270000,4.270000,4.270000,4.270000,0
1960-01-22,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-24,null,null,null,null,null,null
1960-01-25,4.120000,4.120000,4.120000,4.120000,4.120000,0
1960-01-26,4.040000,4.040000,4.040000,4.040000,4.040000,0
1960-01-27,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-28,3.920000,3.920000,3.920000,3.920000,3.920000,0
1960-01-29,3.990000,3.990000,3.990000,3.990000,3.990000,0
1960-01-31,null,null,null,null,null,null
;
run;
It seems this will be useful for R CSVs, which put NAs for missings. Thanks.
proc format; invalue myfmt "null" = . default = 10.5; run;
Did you really mean to set the value to 10.5 when the value is the string 'default'?
If not then what did you think that second line meant?
Try it. You are thinking of the "OTHER" statement. "DEFAULT" says use this existing format of 10.5.
@CurtisMackWSIPP wrote:
Try it. You are thinking of the "OTHER" statement. "DEFAULT" says use this existing format of 10.5.
There is no reason to use a different informat than what SAS normally uses to read numbers. But if you did want to specify the use of an existing informat you need to enclose it square brackets. But you wouldn't want to use an informat like 10.5 because that imposes an implied decimal point before the last 5 digits for this problem. So any integer values would be divided by 10**5.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.