I have a data set, the first 10 lines of which (header included) look like the following.

year,R_F,R_MKT,R_ME,R_IA,R_ROE,R_EG
1967,4.1474,24.4192,40.5479,-11.4478,20.6095,-3.2998
1968,5.2942,8.8747,24.9021,14.7436,-2.4844,11.4650
1969,6.5912,-17.4274,-11.7458,0.4645,15.4144,12.9056
1970,6.3829,-6.3099,-7.9029,22.7755,-0.4696,17.1111
1971,4.3172,11.8817,5.3575,0.9003,11.4332,6.1606
1972,3.8912,13.4494,-9.0317,5.1487,5.6877,15.0977
1973,7.0586,-25.8075,-17.1956,7.7738,0.9487,17.2879
1974,8.0781,-36.0193,4.5604,18.5330,11.4993,20.1174
1975,5.8210,31.5368,16.9765,7.4978,-6.2017,11.5153

So, each value has at most four digits after the decimal point. There is no explicit bound, but in practice the values will lie between -999 and 999. I ran the following code.

data want;
  infile "http://global-q.org/uploads/1/2/2/6/122679606/q5_factors_annual_2019a.csv" url firstobs=2 dsd;
  length year 3 R_F R_MKT R_ME R_IA R_ROE R_EG 6;
  input year R_F R_MKT R_ME R_IA R_ROE R_EG;
run;

I used 6 for the variable length based on this document, but some values seem to be read in oddly, as follows. I wonder (1) whether odd values such as 5.2941999999 (read in place of 5.2942) are acceptable, and (2) whether the default length of 8, rather than 6, must be used for these values with four decimal places.
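
To narrow the question down, here is a minimal sketch of how the effect could be checked in isolation. It assumes that the truncation to 6 bytes happens only when an observation is written to the data set, so the values are compared after reading them back. The data set name stored and the variables x6, x8, diff, and r6 are made up for this illustration and are not part of my real code.

```sas
/* Hypothetical check: store the same literal in 6 and in 8 bytes. */
data stored;
  length x6 6 x8 8;
  x6 = 5.2942;
  x8 = 5.2942;
run;

/* Assumption: the 6-byte value is truncated when written to the data set,
   so the comparison is done on the values read back from it. */
data _null_;
  set stored;
  diff = x6 - x8;            /* storage error introduced by length 6  */
  r6   = round(x6, 0.0001);  /* round back to four decimal places     */
  put x6= best32. x8= best32. diff= e12.;
  put r6= best32.;
run;
```

If diff is tiny and r6 prints as 5.2942, the odd digits would be a storage effect of the shorter length rather than a problem with reading the file, which is what I would like to have confirmed.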