BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Junyong
Pyrite | Level 9

This question continues from my previous Skipping Invalid Lines post. I have multiple data as follows.

Date,Open,High,Low,Close,Adj Close,Volume
1960-01-04,4.520000,4.520000,4.520000,4.520000,4.520000,0
1960-01-05,4.550000,4.550000,4.550000,4.550000,4.550000,0
1960-01-06,4.680000,4.680000,4.680000,4.680000,4.680000,0
1960-01-07,4.630000,4.630000,4.630000,4.630000,4.630000,0
1960-01-08,4.590000,4.590000,4.590000,4.590000,4.590000,0
1960-01-10,null,null,null,null,null,null
1960-01-11,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-12,4.540000,4.540000,4.540000,4.540000,4.540000,0
1960-01-13,4.560000,4.560000,4.560000,4.560000,4.560000,0
1960-01-14,4.510000,4.510000,4.510000,4.510000,4.510000,0
1960-01-15,4.490000,4.490000,4.490000,4.490000,4.490000,0
1960-01-17,null,null,null,null,null,null
1960-01-18,4.370000,4.370000,4.370000,4.370000,4.370000,0
1960-01-19,4.310000,4.310000,4.310000,4.310000,4.310000,0
Date,Open,High,Low,Close,Adj Close,Volume
1927-12-30,17.660000,17.660000,17.660000,17.660000,17.660000,0
1928-01-03,17.760000,17.760000,17.760000,17.760000,17.760000,0
1928-01-04,17.719999,17.719999,17.719999,17.719999,17.719999,0
1928-01-05,17.549999,17.549999,17.549999,17.549999,17.549999,0
1928-01-06,17.660000,17.660000,17.660000,17.660000,17.660000,0
1928-01-09,17.500000,17.500000,17.500000,17.500000,17.500000,0
1928-01-10,17.370001,17.370001,17.370001,17.370001,17.370001,0
1928-01-11,17.350000,17.350000,17.350000,17.350000,17.350000,0
1928-01-12,17.469999,17.469999,17.469999,17.469999,17.469999,0
1928-01-13,17.580000,17.580000,17.580000,17.580000,17.580000,0
1928-01-16,17.290001,17.290001,17.290001,17.290001,17.290001,0
1928-01-17,17.299999,17.299999,17.299999,17.299999,17.299999,0
1928-01-18,17.260000,17.260000,17.260000,17.260000,17.260000,0
1928-01-19,17.379999,17.379999,17.379999,17.379999,17.379999,0
1928-01-20,17.480000,17.480000,17.480000,17.480000,17.480000,0
1928-01-23,17.639999,17.639999,17.639999,17.639999,17.639999,0
1928-01-24,17.709999,17.709999,17.709999,17.709999,17.709999,0
1928-01-25,17.520000,17.520000,17.520000,17.520000,17.520000,0
1928-01-26,17.629999,17.629999,17.629999,17.629999,17.629999,0
1928-01-27,17.690001,17.690001,17.690001,17.690001,17.690001,0
1928-01-30,17.490000,17.490000,17.490000,17.490000,17.490000,0
1928-01-31,17.570000,17.570000,17.570000,17.570000,17.570000,0
1928-02-01,17.530001,17.530001,17.530001,17.530001,17.530001,0

I want to not read and skip the ugly nulls in the first data, and this can be done by only importing non-null lines from _infile_ as follows.

data irx;
	infile 'https://query1.finance.yahoo.com/v7/finance/download/^irx
?period1=-9999999999&period2=9999999999' url firstobs=2 dsd truncover;
	input @;
	if index(_infile_,"null")=0;
	input date yymmdd10. +1 open hi lo close adj vol;
run;

But I'm now importing multiple files together using filevar option as follows.

data all;
	input id $ @@;
	url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||id||
		"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
		'&period2='||dhms("31dec2100"d,23,59,59)-315619200);
	infile dummy url filevar=url firstobs=2 dsd truncover end=last;
	do until(last);
		input @;
		if index(_infile_,"null")=0;
		input date yymmdd10. +1 open hi lo close adj vol;
		output;
	end;
cards;
^irx ^gspc
;

and this code reads only the first five lines from the first data—before encountering the null at the sixth line. It seems input @; and if index(_infile_,"null")=0; somehow affects variable last, but I'm not sure. How should I detour this problem?

1 ACCEPTED SOLUTION

Accepted Solutions
CurtisMackWSIPP
Lapis Lazuli | Level 10

I was right about the input not being closed.  This works as well.

 


data all;
	input id $ @@;
	url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||id||
		"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
		'&period2='||dhms("31dec2100"d,23,59,59)-315619200);
	infile dummy url filevar=url firstobs=2 dsd truncover end=last;
	do until(last);
		input @;
		if index(_infile_,"null")=0 then do;
		  input date yymmdd10. +1 open hi lo close adj vol;
		  output;
    end;
    else input;
	end;
cards;
^irx ^gspc
;

View solution in original post

9 REPLIES 9
CurtisMackWSIPP
Lapis Lazuli | Level 10

I suspect the open input statement is triggering that last condition.

The custom INFORMAT solution I mentioned last time works great. 

proc format;
  invalue myfmt "null" = .
                default = 10.5;
run;

data all;
	input id $ @@;
	url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||id||
		"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
		'&period2='||dhms("31dec2100"d,23,59,59)-315619200);
  informat open hi lo close adj vol myfmt.;
	infile dummy url filevar=url firstobs=2 dsd truncover end=last;
	do until(last);
		input date yymmdd10. +1 open hi lo close adj vol;
		if adj>. then output;
	end;
cards;
^irx ^gspc
;
ballardw
Super User

Are you sure you don't want something more like:

data irx;
	infile 'https://query1.finance.yahoo.com/v7/finance/download/^irx
?period1=-9999999999&period2=9999999999' url firstobs=2 dsd truncover;
	input @;
	if index(_infile_,"null")=0 then do;
	    input date yymmdd10. +1 open hi lo close adj vol;
            output;
        end;
       else input;
run;

One of things about using Input @; and holding the input pointer on a  line is that at some point you need to release the pointer, or at least should.

 

Not a fan of obscure magic numbers as appear here:

dhms("1jan1901"d,0,0,0)-315619200

I would suggest using something with the INTNX function instead of -315619200 so someone reading the code later has better chance of understanding what that offset might actually mean.

CurtisMackWSIPP
Lapis Lazuli | Level 10

Funny, we posted the same exact code at the same time.

ballardw
Super User

@CurtisMackWSIPP wrote:

Funny, we posted the same exact code at the same time.


Great minds and all that. 😀

Junyong
Pyrite | Level 9

1. I put the INPUT statement to use _INFILE_ in that way—I have been using SAS quite a long time, but DATA steps rather than other PROCs are always most challenging. Thanks.

2. The number was based on Calculating the UTC offset in your SAS session, which shows 315,619,200=3653*24*60*60. I wanted future readers to easily adjust the span based on just the simple dates, but it seems I need to think of more readable alternatives.

ballardw
Super User

@Junyong wrote:

1. I put the INPUT statement to use _INFILE_ in that way—I have been using SAS quite a long time, but DATA steps rather than other PROCs are always most challenging. Thanks.

2. The number was based on Calculating the UTC offset in your SAS session, which shows 315,619,200=3653*24*60*60. I wanted future readers to easily adjust the span based on just the simple dates, but it seems I need to think of more readable alternatives.


So is that supposed to be a 10 year offset?

Please tell me exactly what the resulting datetime values are supposed to be for that result.

 

Your offset for the first data yields 31Dec1890:00:00:00 and the second 30Dec2090:00:00:00, so misses the last day of 2090.

 

I will guess that you are just setting a range that is supposed to get "everything" and have provided boundaries actually outside the range of values available so for this you don't care that your are missing the 31DEC2090. But that general approach will get you in trouble with date/clock times for other purposes. Not to mention that when the next guy has to use your code an change it there isn't any documentation as to what that offset represents or why.

CurtisMackWSIPP
Lapis Lazuli | Level 10

I was right about the input not being closed.  This works as well.

 


data all;
	input id $ @@;
	url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||id||
		"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
		'&period2='||dhms("31dec2100"d,23,59,59)-315619200);
	infile dummy url filevar=url firstobs=2 dsd truncover end=last;
	do until(last);
		input @;
		if index(_infile_,"null")=0 then do;
		  input date yymmdd10. +1 open hi lo close adj vol;
		  output;
    end;
    else input;
	end;
cards;
^irx ^gspc
;
Junyong
Pyrite | Level 9

Yes, the bottleneck was the unclosed INPUT. I think this IF-ELSE approach will be more stable than only the IF subsetting. Thanks again for your teaching.

CurtisMackWSIPP
Lapis Lazuli | Level 10

Glad it worked and thanks for the accepted solution.  But I think the INFORMAT solution is more stable.  As I said before.  The accepted approach will delete the row if any value is "null", not just one value as your original request was trying to do.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 840 views
  • 0 likes
  • 3 in conversation