SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Junyong
Pyrite | Level 9

The following code downloads a time-series CSV from FRED and reads it.

filename PCEND "%sysfunc(getoption(work))\PCEND.csv";
proc http method="get" out=PCEND url="https://fred.stlouisfed.org/graph/fredgraph.csv?id=PCEND";
run;
data PCEND;
infile PCEND;
input TIME yymmdd10. +1 PCEND;
TIME=put(TIME,yymmddn8.)+0;
run;

The downloaded CSV has the following contents inside.

DATE,PCEND
1959-01-01,125.1
1959-02-01,125.4
1959-03-01,127.7
1959-04-01,125.6
1959-05-01,128.0
1959-06-01,127.9
1959-07-01,127.3
1959-08-01,127.9
1959-09-01,129.4
1959-10-01,129.8
1959-11-01,129.0
1959-12-01,129.7

Unfortunately, the resulting SAS data set displays missing values in the first row and starts from February in the second row. I tried this code to read another variable (PCES) and there was no problem. What did I wrong here? Though FIRSTOBS=2 corrects this issue, I want to avoid FIRSTOBS as some files start from the third row. Thanks for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Changing the variable name makes the first line shorter.

You have not included the TRUNCOVER (or the older less useful MISSOVER) option on the INFILE statement, so when you read past the end of a line SAS will move to the next line to find a value for the variable.  SAS will tell you when it does this in the notes at the end of the data step.  You can also compare the number of lines read with the number of observations created.

View solution in original post

15 REPLIES 15
PaigeMiller
Diamond | Level 26

If you don't use FIRSTOBS=2, then SAS tries to read the first row which contains the string DATE,PCEND and it can't interpret this with informat yymmdd10.

 

How will you know which files have data starting in row 3 and which files have data starting in row 2? You could read the entire line as a text string, and if it finds an actual date in columns 1-10, convert that to a numeric SAS date using informat yymmdd10. But it would be a superior solution, in my opinion, to standardize the format of the files so they all have data beginning in row 2.

--
Paige Miller
Junyong
Pyrite | Level 9

I understand that SAS cannot read the strings. However, shouldn't this be the same for other data too? The following code just alters PCEND by PCES.

filename PCES "%sysfunc(getoption(work))\PCES.csv";
proc http method="get" out=PCES url="https://fred.stlouisfed.org/graph/fredgraph.csv?id=PCES";
run;
data PCES;
infile PCES;
input TIME yymmdd10. +1 PCES;
TIME=put(TIME,yymmddn8.)+0;
run;

This CSV file has the same structure.

DATE,PCES
1959-01-01,138.7
1959-02-01,140.0
1959-03-01,140.5
1959-04-01,141.5
1959-05-01,142.8
1959-06-01,144.3
1959-07-01,145.1
1959-08-01,146.1
1959-09-01,147.4
1959-10-01,148.3
1959-11-01,149.2
1959-12-01,150.4

So if the previous problem is the strings in the first line, then the first line here shouldn't be recognized. Unlike the previous case, however, SAS correctly reads all the file from January even without the FIRSTOBS adjustment. What would be the difference here then?

Tom
Super User Tom
Super User

Changing the variable name makes the first line shorter.

You have not included the TRUNCOVER (or the older less useful MISSOVER) option on the INFILE statement, so when you read past the end of a line SAS will move to the next line to find a value for the variable.  SAS will tell you when it does this in the notes at the end of the data step.  You can also compare the number of lines read with the number of observations created.

novinosrin
Tourmaline | Level 20

Hi @Junyong  A simple question to consider.

 

Did it skip or the informat yymmdd10 used to read non-standard data and convert into number failed with log notes invalid data? This is obvious as the informat while appropriate for reading non-standard char dates would not be appropriate for standard char data that's in record 1. 

 

The same rules apply when you are trying to read a standard char value as a numeric variable PCEND.

 

So in a nutshell the first record and the remaining records are different types and are not compatible for the instructions supplied in the INPUT statement.

 

I hope this helps 

Junyong
Pyrite | Level 9

Thanks, but I attached some additional findings right above—when I just change the variable from PCEND to PCES, SAS just reads all the data in a correct way. Both data have an identical structure starting from the second. Then why are the results different here?

SASKiwi
PROC Star

Also this statement doesn't make a lot of sense:

TIME=put(TIME,yymmddn8.)+0;

It looks like you are trying to add 0 to a date character string then assign it to time variable. I bet you are getting notes in your SAS log about conversion from character to numeric. What do you actually want to do here?

Junyong
Pyrite | Level 9

No note—this line changes the numeric value -365 to 19590101.

SASKiwi
PROC Star

Why do you not want to use SAS dates? You are converting a SAS date (-365 is a SAS date without a SAS date format applied to it) to just a plain number that represents a date.

Junyong
Pyrite | Level 9

I know I can use

format TIME yymmddn8.;

but I am using this unusual way due to a different reason (senior coworker) for this time only.

Tom
Super User Tom
Super User

If you want to convert a date value into a number in YY,YYM,MDD format you should be clearer in your code to avoid confusion.  Instead of adding zero to a character string, use the INPUT() function to convert the string into a number.

time=input(put(time.yymmddn8.),8.);
Tom
Super User Tom
Super User

There is nothing wrong with your INFILE statement. You just need to make your use of INPUT smarter.

Sounds like you just want to ignore lines that don't have a value date in the first field.

data PCEND;
  infile PCEND dsd truncover ;
  input TIME : ?? yymmdd10. @ ;
  if missing(time) then delete ;
  input PCEND ;
  format TIME yymmdd10.;
run;

PS Why would you name a variable that contains date values time?  That is just going to confuse somebody.

Junyong
Pyrite | Level 9

If the INFILE is the troublemaker, then shouldn't it be the same for other cases? But this code produces no problem when it just reads PCES rather than PECND as follows.

filename PCES "%sysfunc(getoption(work))\PCES.csv";
proc http method="get" out=PCES url="https://fred.stlouisfed.org/graph/fredgraph.csv?id=PCES";
run;
data PCES;
infile PCES;
input TIME yymmdd10. +1 PCES;
TIME=put(TIME,yymmddn8.)+0;
run;

Both PCES and PCEND are structurally identical as follows.

DATE,PCES
1959-01-01,138.7
1959-02-01,140.0
1959-03-01,140.5
1959-04-01,141.5
1959-05-01,142.8
1959-06-01,144.3
1959-07-01,145.1
1959-08-01,146.1
1959-09-01,147.4
1959-10-01,148.3
1959-11-01,149.2
1959-12-01,150.4

The left and right are PCEND and PCES, respectively. SAS doesn't read the January in PCEND, but does read in PCES.

1.png2.png

Why are the results different?

By the way, some FRED data are uneven—UMCSENT, for example, is less frequent than monthly before 1978, but becomes monthly in 1978. Also, date information in FRED such as 1950-01-01 actually means 1950-01-31 sometimes. So I rather used TIME here.

mkeintz
PROC Star

@Junyong wrote:

If the INFILE is the troublemaker, then shouldn't it be the same for other cases? But this code produces no problem when it just reads PCES rather than PECND as follows.

filename PCES "%sysfunc(getoption(work))\PCES.csv";
proc http method="get" out=PCES url="https://fred.stlouisfed.org/graph/fredgraph.csv?id=PCES";
run;
data PCES;
infile PCES;
input TIME yymmdd10. +1 PCES;
TIME=put(TIME,yymmddn8.)+0;
run;

By the way, some FRED data are uneven—UMCSENT, for example, is less frequent than monthly before 1978, but becomes monthly in 1978. Also, date information in FRED such as 1950-01-01 actually means 1950-01-31 sometimes. So I rather used TIME here.


 

I too strongly suggest you don't use the variable TIME to contain a data value, even when that date sometimes uses the  1st of a month, or the end of a month to identify  a monthly report.  I suggest you use DATE, but add the following statement

 

DATE = intnx('month',DATE,0,'END');

which will adjust any date to the last day of the month.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Junyong
Pyrite | Level 9

In other cases, I use

format TIME yymmddn8.;

to format the observations or

TIME=intnx("month",TIME,0,"e");

to transform 1900-01-01 to 1900-01-31. I am using this unusual way due to a different reason only for this time. Thanks.

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 3024 views
  • 0 likes
  • 7 in conversation