This question continues from my previous Skipping Invalid Lines post. I have multiple datasets like the following.
Date,Open,High,Low,Close,Adj Close,Volume
1960-01-04,4.520000,4.520000,4.520000,4.520000,4.520000,0
1960-01-05,4.550000,4.550000,4.550000,4.550000,4.550000,0
1960-01-06,4.680000,4.680000,4.680000,4.680000,4.680000,0
1960-01-07,4.630000,4.630000,4.630000,4.630000,4.630000,0
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-12,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-13,4.560000,4.560000,4.560000,4.560000,4.560000,0
1960-01-14,4.510000,4.510000,4.510000,4.510000,4.510000,0
1960-01-15,4.490000,4.490000,4.490000,4.490000,4.490000,0
1960-01-17,null,null,null,null,null,null
1960-01-18,4.370000,4.370000,4.370000,4.370000,4.370000,0
1960-01-19,4.310000,4.310000,4.310000,4.310000,4.310000,0
Date,Open,High,Low,Close,Adj Close,Volume
1927-12-30,17.660000,17.660000,17.660000,17.660000,17.660000,0
1928-01-03,17.760000,17.760000,17.760000,17.760000,17.760000,0
1928-01-04,17.719999,17.719999,17.719999,17.719999,17.719999,0
1928-01-05,17.549999,17.549999,17.549999,17.549999,17.549999,0
1928-01-06,17.660000,17.660000,17.660000,17.660000,17.660000,0
1928-01-09,17.500000,17.500000,17.500000,17.500000,17.500000,0
1928-01-10,17.370001,17.370001,17.370001,17.370001,17.370001,0
1928-01-11,17.350000,17.350000,17.350000,17.350000,17.350000,0
1928-01-12,17.469999,17.469999,17.469999,17.469999,17.469999,0
1928-01-13,17.580000,17.580000,17.580000,17.580000,17.580000,0
1928-01-16,17.290001,17.290001,17.290001,17.290001,17.290001,0
1928-01-17,17.299999,17.299999,17.299999,17.299999,17.299999,0
1928-01-18,17.260000,17.260000,17.260000,17.260000,17.260000,0
1928-01-19,17.379999,17.379999,17.379999,17.379999,17.379999,0
1928-01-20,17.480000,17.480000,17.480000,17.480000,17.480000,0
1928-01-23,17.639999,17.639999,17.639999,17.639999,17.639999,0
1928-01-24,17.709999,17.709999,17.709999,17.709999,17.709999,0
1928-01-25,17.520000,17.520000,17.520000,17.520000,17.520000,0
1928-01-26,17.629999,17.629999,17.629999,17.629999,17.629999,0
1928-01-27,17.690001,17.690001,17.690001,17.690001,17.690001,0
1928-01-30,17.490000,17.490000,17.490000,17.490000,17.490000,0
1928-01-31,17.570000,17.570000,17.570000,17.570000,17.570000,0
1928-02-01,17.530001,17.530001,17.530001,17.530001,17.530001,0
I want to skip the ugly nulls in the first dataset, and this can be done by importing only the non-null lines via _infile_, as follows.
data irx;
infile 'https://query1.finance.yahoo.com/v7/finance/download/^irx?period1=-9999999999&period2=9999999999' url firstobs=2 dsd truncover;
input @;
if index(_infile_,"null")=0;
input date yymmdd10. +1 open hi lo close adj vol;
run;
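(Not SAS, but for anyone comparing notes: the subsetting logic above can be sketched in Python. The sample rows and the read_non_null helper below are my own illustrations, not part of the original step.)

```python
import csv
import io

# A small sample of the downloaded file, including one "null" row (assumed data).
raw = """Date,Open,High,Low,Close,Adj Close,Volume
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
"""

def read_non_null(text):
    """Skip the header (like firstobs=2) and any row containing a 'null'
    field, mirroring if index(_infile_,"null")=0 in the SAS step."""
    reader = csv.reader(io.StringIO(text))
    next(reader)                  # discard the header row
    for row in reader:
        if "null" not in row:     # drop rows where any field is "null"
            yield row

rows = list(read_non_null(raw))
print(len(rows))  # 2: the null row on 1960-01-10 is skipped
```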
But now I'm importing multiple files together using the filevar option, as follows.
data all;
input id $ @@;
url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||id||
"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
'&period2='||dhms("31dec2100"d,23,59,59)-315619200);
infile dummy url filevar=url firstobs=2 dsd truncover end=last;
do until(last);
input @;
if index(_infile_,"null")=0;
input date yymmdd10. +1 open hi lo close adj vol;
output;
end;
cards;
^irx ^gspc
;
and this code reads only the first five lines of the first dataset, stopping when it encounters the null on the sixth line. It seems that input @;
and if index(_infile_,"null")=0;
somehow affect the variable last
, but I'm not sure. How should I work around this problem?
I was right about the input not being closed. This works as well.
data all;
input id $ @@;
url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||id||
"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
'&period2='||dhms("31dec2100"d,23,59,59)-315619200);
infile dummy url filevar=url firstobs=2 dsd truncover end=last;
do until(last);
input @;
if index(_infile_,"null")=0 then do;
input date yymmdd10. +1 open hi lo close adj vol;
output;
end;
else input;
end;
cards;
^irx ^gspc
;
I suspect the held-open INPUT statement is triggering that last condition.
The custom INFORMAT solution I mentioned last time works great.
proc format;
invalue myfmt "null" = .
default = 10.5;
run;
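(For illustration only: the informat's effect can be approximated in Python. The myfmt helper name and behavior below are my own analogy, not the SAS informat itself.)

```python
def myfmt(token):
    """Rough Python analogue of the custom SAS informat:
    'null' becomes a missing value (None here, '.' in SAS);
    anything else is read as a plain number."""
    return None if token == "null" else float(token)

print(myfmt("null"))      # None, i.e. SAS missing (.)
print(myfmt("4.520000"))  # 4.52
```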
data all;
input id $ @@;
url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||id||
"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
'&period2='||dhms("31dec2100"d,23,59,59)-315619200);
informat open hi lo close adj vol myfmt.;
infile dummy url filevar=url firstobs=2 dsd truncover end=last;
do until(last);
input date yymmdd10. +1 open hi lo close adj vol;
if adj>. then output;
end;
cards;
^irx ^gspc
;
Are you sure you don't want something more like:
data irx;
infile 'https://query1.finance.yahoo.com/v7/finance/download/^irx?period1=-9999999999&period2=9999999999' url firstobs=2 dsd truncover;
input @;
if index(_infile_,"null")=0 then do;
input date yymmdd10. +1 open hi lo close adj vol;
output;
end;
else input;
run;
One of the things about using INPUT @; and holding the input pointer on a line is that at some point you need to release the pointer, or at least you should.
Not a fan of obscure magic numbers like the one that appears here:
dhms("1jan1901"d,0,0,0)-315619200
I would suggest using something with the INTNX function instead of -315619200 so that someone reading the code later has a better chance of understanding what that offset actually means.
Funny, we posted the same exact code at the same time.
@CurtisMackWSIPP wrote:
Funny, we posted the same exact code at the same time.
Great minds and all that. 😀
1. I didn't know I could put the INPUT statement to use with _INFILE_ in that way. I have been using SAS quite a long time, but DATA steps, rather than the other PROCs, are always the most challenging. Thanks.
2. The number was based on Calculating the UTC offset in your SAS session, which shows 315,619,200 = 3653*24*60*60. I wanted future readers to be able to easily adjust the span using just the plain dates, but it seems I need to think of a more readable alternative.
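(A quick sanity check of that number, done outside SAS in Python: 315,619,200 seconds is exactly the gap between the SAS epoch, 1960-01-01, and the Unix epoch, 1970-01-01, which is why subtracting it converts a SAS datetime to the Unix time the Yahoo URL expects.)

```python
from datetime import date

# SAS datetimes count seconds from 1960-01-01; Unix times from 1970-01-01.
# The decade in between contains three leap years (1960, 1964, 1968).
gap_days = (date(1970, 1, 1) - date(1960, 1, 1)).days
offset = gap_days * 24 * 60 * 60
print(gap_days, offset)  # 3653 315619200
```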
@Junyong wrote:
1. I didn't know I could put the INPUT statement to use with _INFILE_ in that way. I have been using SAS quite a long time, but DATA steps, rather than the other PROCs, are always the most challenging. Thanks.
2. The number was based on Calculating the UTC offset in your SAS session, which shows 315,619,200 = 3653*24*60*60. I wanted future readers to be able to easily adjust the span using just the plain dates, but it seems I need to think of a more readable alternative.
So is that supposed to be a 10-year offset?
Please tell me exactly what the resulting datetime values are supposed to be.
Your offset for the first date yields 31Dec1890:00:00:00 and the second 30Dec2090:00:00:00, so it misses the last day of 2090.
I will guess that you are just setting a range that is supposed to get "everything" and have provided boundaries outside the range of values actually available, so for this purpose you don't care that you are missing 31DEC2090. But that general approach will get you in trouble with date/clock times for other purposes. Not to mention that when the next person has to use your code and change it, there isn't any documentation of what that offset represents or why.
Yes, the bottleneck was the unclosed INPUT. I think this IF-ELSE approach will be more stable than the IF subsetting alone. Thanks again for the guidance.
Glad it worked, and thanks for the accepted solution. But I think the INFORMAT solution is more stable: as I said before, the accepted approach will delete the row if any value is "null", not just the one value your original request was trying to handle.