- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The following code downloads a time-series CSV from FRED and reads it.
filename PCEND "%sysfunc(getoption(work))\PCEND.csv";
proc http method="get" out=PCEND url="https://fred.stlouisfed.org/graph/fredgraph.csv?id=PCEND";
run;
data PCEND;
infile PCEND;
input TIME yymmdd10. +1 PCEND;
TIME=put(TIME,yymmddn8.)+0;
run;
The downloaded CSV has the following contents inside.
DATE,PCEND 1959-01-01,125.1 1959-02-01,125.4 1959-03-01,127.7 1959-04-01,125.6 1959-05-01,128.0 1959-06-01,127.9 1959-07-01,127.3 1959-08-01,127.9 1959-09-01,129.4 1959-10-01,129.8 1959-11-01,129.0 1959-12-01,129.7
Unfortunately, the resulting SAS data set displays missing values in the first row and starts from February in the second row. I tried this code to read another variable (PCES) and there was no problem. What did I wrong here? Though FIRSTOBS=2 corrects this issue, I want to avoid FIRSTOBS as some files start from the third row. Thanks for your help.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Changing the variable name makes the first line shorter.
You have not included the TRUNCOVER (or the older less useful MISSOVER) option on the INFILE statement, so when you read past the end of a line SAS will move to the next line to find a value for the variable. SAS will tell you when it does this in the notes at the end of the data step. You can also compare the number of lines read with the number of observations created.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you don't use FIRSTOBS=2, then SAS tries to read the first row which contains the string DATE,PCEND and it can't interpret this with informat yymmdd10.
How will you know which files have data starting in row 3 and which files have data starting in row 2? You could read the entire line as a text string, and if it finds an actual date in columns 1-10, convert that to a numeric SAS date using informat yymmdd10. But it would be a superior solution, in my opinion, to standardize the format of the files so they all have data beginning in row 2.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I understand that SAS cannot read the strings. However, shouldn't this be the same for other data too? The following code just alters PCEND by PCES.
filename PCES "%sysfunc(getoption(work))\PCES.csv";
proc http method="get" out=PCES url="https://fred.stlouisfed.org/graph/fredgraph.csv?id=PCES";
run;
data PCES;
infile PCES;
input TIME yymmdd10. +1 PCES;
TIME=put(TIME,yymmddn8.)+0;
run;
This CSV file has the same structure.
DATE,PCES 1959-01-01,138.7 1959-02-01,140.0 1959-03-01,140.5 1959-04-01,141.5 1959-05-01,142.8 1959-06-01,144.3 1959-07-01,145.1 1959-08-01,146.1 1959-09-01,147.4 1959-10-01,148.3 1959-11-01,149.2 1959-12-01,150.4
So if the previous problem is the strings in the first line, then the first line here shouldn't be recognized. Unlike the previous case, however, SAS correctly reads all the file from January even without the FIRSTOBS adjustment. What would be the difference here then?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Changing the variable name makes the first line shorter.
You have not included the TRUNCOVER (or the older less useful MISSOVER) option on the INFILE statement, so when you read past the end of a line SAS will move to the next line to find a value for the variable. SAS will tell you when it does this in the notes at the end of the data step. You can also compare the number of lines read with the number of observations created.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Junyong A simple question to consider.
Did it skip or the informat yymmdd10 used to read non-standard data and convert into number failed with log notes invalid data? This is obvious as the informat while appropriate for reading non-standard char dates would not be appropriate for standard char data that's in record 1.
The same rules apply when you are trying to read a standard char value as a numeric variable PCEND.
So in a nutshell the first record and the remaining records are different types and are not compatible for the instructions supplied in the INPUT statement.
I hope this helps
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, but I attached some additional findings right above—when I just change the variable from PCEND to PCES, SAS just reads all the data in a correct way. Both data have an identical structure starting from the second. Then why are the results different here?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Also this statement doesn't make a lot of sense:
TIME=put(TIME,yymmddn8.)+0;
It looks like you are trying to add 0 to a date character string then assign it to time variable. I bet you are getting notes in your SAS log about conversion from character to numeric. What do you actually want to do here?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
No note—this line changes the numeric value -365 to 19590101.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why do you not want to use SAS dates? You are converting a SAS date (-365 is a SAS date without a SAS date format applied to it) to just a plain number that represents a date.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I know I can use
format TIME yymmddn8.;
but I am using this unusual way due to a different reason (senior coworker) for this time only.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you want to convert a date value into a number in YY,YYM,MDD format you should be clearer in your code to avoid confusion. Instead of adding zero to a character string, use the INPUT() function to convert the string into a number.
time=input(put(time.yymmddn8.),8.);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There is nothing wrong with your INFILE statement. You just need to make your use of INPUT smarter.
Sounds like you just want to ignore lines that don't have a value date in the first field.
data PCEND;
infile PCEND dsd truncover ;
input TIME : ?? yymmdd10. @ ;
if missing(time) then delete ;
input PCEND ;
format TIME yymmdd10.;
run;
PS Why would you name a variable that contains date values time? That is just going to confuse somebody.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If the INFILE is the troublemaker, then shouldn't it be the same for other cases? But this code produces no problem when it just reads PCES rather than PECND as follows.
filename PCES "%sysfunc(getoption(work))\PCES.csv";
proc http method="get" out=PCES url="https://fred.stlouisfed.org/graph/fredgraph.csv?id=PCES";
run;
data PCES;
infile PCES;
input TIME yymmdd10. +1 PCES;
TIME=put(TIME,yymmddn8.)+0;
run;
Both PCES and PCEND are structurally identical as follows.
DATE,PCES 1959-01-01,138.7 1959-02-01,140.0 1959-03-01,140.5 1959-04-01,141.5 1959-05-01,142.8 1959-06-01,144.3 1959-07-01,145.1 1959-08-01,146.1 1959-09-01,147.4 1959-10-01,148.3 1959-11-01,149.2 1959-12-01,150.4
The left and right are PCEND and PCES, respectively. SAS doesn't read the January in PCEND, but does read in PCES.
Why are the results different?
By the way, some FRED data are uneven—UMCSENT, for example, is less frequent than monthly before 1978, but becomes monthly in 1978. Also, date information in FRED such as 1950-01-01 actually means 1950-01-31 sometimes. So I rather used TIME here.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Junyong wrote:
If the INFILE is the troublemaker, then shouldn't it be the same for other cases? But this code produces no problem when it just reads PCES rather than PECND as follows.
filename PCES "%sysfunc(getoption(work))\PCES.csv"; proc http method="get" out=PCES url="https://fred.stlouisfed.org/graph/fredgraph.csv?id=PCES"; run; data PCES; infile PCES; input TIME yymmdd10. +1 PCES; TIME=put(TIME,yymmddn8.)+0; run;
By the way, some FRED data are uneven—UMCSENT, for example, is less frequent than monthly before 1978, but becomes monthly in 1978. Also, date information in FRED such as 1950-01-01 actually means 1950-01-31 sometimes. So I rather used TIME here.
I too strongly suggest you don't use the variable TIME to contain a data value, even when that date sometimes uses the 1st of a month, or the end of a month to identify a monthly report. I suggest you use DATE, but add the following statement
DATE = intnx('month',DATE,0,'END');
which will adjust any date to the last day of the month.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In other cases, I use
format TIME yymmddn8.;
to format the observations or
TIME=intnx("month",TIME,0,"e");
to transform 1900-01-01 to 1900-01-31. I am using this unusual way due to a different reason only for this time. Thanks.