I think this problem vests with the different SAS platforms in use. When I copy and paste your code above and my variables adopt some pretty "funky" values (see attachment).
Any readers of this thread if you could please assist with the original question posted, I am looking to create a code to obtain the variable below, with using integer dates, not sas dates.
(June T0) CF= (EARN+DEP)/ME (INPUTS ON THE RIGHT HAND SIDE ARE FOR DEC T-1)
Just use the code I've provided together with a .csv and it will work!
Please read my comments about SAS dates and forget about this integer stuff. It IS a bad idea and the wrong approach. It's simply not how one needs to deal with dates (and if people would have done it this way then there would never have been a Y2K issue - that's why the Mac OS didn't have to care ever about Y2K).
I just read this sorry. OK so SAS dates are the way to go, I still can't narrow down the problem in my tester data.
What I don't understand is why you should get different results running "my" code with the identical "tester.csv" data. I've even run this code using PC SAS 9.2 and I'm still getting the results I've posted. So you MUST do something different.
What I can see in the SAS table snapshot you've posted is that it looks as if there aren't the right data in the columns. This could happen if you wouldn't use 'truncover' or 'dsd' in the infile statement or if your source data would contain commas and text data is not quoted. But as I believe we're using the same code and the exactly same "texter.csv" this can't really be the explanation. Are we using identical code and source data?
By the way: What OS are you using? Mine is Win7. In case you're using some UNIX/Linux environment then try running "my code" without the following infile option "TERMSTR=CRLF".
My TESTER data was saved to desktop so I altered "%let csvFile=C:\temp\tester.csv;" to "%let csvFile=desktop\tester.csv" and couldn't get your results. So I created a temp folder in C drive and saved the TESTER data there, to prevent me from editing anything and ET VOILA it started working! So strange, I'm perplexed why location of the db would of caused conflicting results
Another example of the mysterious ways of IT! Happy that we've got now identical results so I assume you're now alright to run this code against your full data.
Just when we thought the misery was over. The code doesn't work on my full data set.
My original data holds about 1.3mill observations, so I originally had this saved as a DBF file. To make the data compatible with the code, I reopened the large DBF file and saved it as a CSV in temp, C drive and renamed it TESTER.
When I ran the code, the date values became blank, and the desired want database had 0 observations. See snap shot.
I have attached a small sample of my real data in its DBF origin. Hopefully it is a simple tweak to get this working.
Under Windows you can get the correct path to an object stored on desktop by right clicking on the object, then choose "properties" and in the pop-up window copy the path under "Location". That's the path you need to use.
Alternatively you can query default Windows environment variables. For example to get the correct path to "tester.csv" stored on your desktop it could be:
%let csvFile=%sysget(homedrive)%sysget(homepath)\desktop\tester.csv;
Hm interesting, thanks for that. If you refer to the original question, the sample of the DBF "real data" is saved there.
Oh gosh! I had a look at your .dbf file and it appears that the date is stored as a DBF datetime value (so also some number but may be not counted as days from 1January1960). Do you have SAS/Access to PC Files licensed? This would give you the interface to read DBF files directly (I'm having access to this module but it doesn't make sense to give you some code which you can't run). http://support.sas.com/documentation/cdl/en/acpcref/63184/HTML/default/viewer.htm#a003087278.htm
Alternatively: We would need to find out how a .dbf value gets stored when accessing such a file via Excel and then saving it as .csv.
I'm sure that's somewhere documented on Internet. Once we know what the value represents (eg. seconds since date xxx) it shouldn't be too hard to convert this value into a SAS date value - and then the SQL as provided would work.
Sorry - it's getting too late for me now and I have to work tomorrow. I'll leave you with this research. Please post what you find out. I'll have a look into this post tomorrow when I've got time (I'm living in Sydney and it's now 11PM on Sunday here).
Cheers
Patrick
O.K. - the last one for tonight. The issue was that the date string got stored with some decimals Something the informat couldn't handle. Not sure why this didn't already happen in your tester data.
So what I've done now is:
1. download your dbf
2. open dbf with Excel and store it as .csv
3. generate SAS code using EG import wizard
4. apply some modifications to get a SAS date value stored in variable "date"
Here the code inclusive the SQL. Everything now running from dataset "real" and no more "tester".
Man! That was now hard considering that the SQL answering your original question didn't change.
%let csvFile=C:\temp\real.csv;
DATA WORK.real;
LENGTH
dateff 8
mktrf 8
smb 8
hml 8
rf 8
umd 8
MONTHDATE 8
DATE 8
HSICMG 8
PERMNO 8
SHROUT 8
PRC 8
RET $ 9
RETX $ 9
vwretd 8
vwretx 8
ewretd 8
ewretx 8
year 8
fyear 8
gvkey 8
lpermno 8
datadate 8
indfmt $ 4
consol $ 1
popsrc $ 1
datafmt $ 3
CURCD $ 3
ACT 8
AT 8
BKVLPS 8
CH 8
DEPC $ 1
DLC 8
DLTT 8
DVT 8
EBIT 8
EBITDA 8
EPSPI 8
REVT 8
TXT 8
XDP 8
XINT 8
COSTAT $ 1
earn 8
dep 8
me 8
MONTH 8 ;
FORMAT
dateff BEST13.
mktrf BEST7.
smb BEST7.
hml BEST7.
rf BEST6.
umd BEST7.
MONTHDATE BEST11.
DATE BEST13.
HSICMG BEST7.
PERMNO BEST10.
SHROUT BEST11.
PRC BEST8.
RET $CHAR9.
RETX $CHAR9.
vwretd BEST7.
vwretx BEST7.
ewretd BEST7.
ewretx BEST7.
year BEST9.
fyear BEST9.
gvkey BEST9.
lpermno BEST10.
datadate BEST13.
indfmt $CHAR4.
consol $CHAR1.
popsrc $CHAR1.
datafmt $CHAR3.
CURCD $CHAR3.
ACT BEST9.
AT BEST10.
BKVLPS BEST7.
CH BEST8.
DEPC $CHAR1.
DLC BEST9.
DLTT BEST10.
DVT BEST9.
EBIT BEST9.
EBITDA BEST9.
EPSPI BEST8.
REVT BEST10.
TXT BEST9.
XDP BEST1.
XINT BEST9.
COSTAT $CHAR1.
earn BEST12.
dep BEST9.
me BEST13.
MONTH BEST7. ;
INFORMAT
dateff BEST13.
mktrf BEST7.
smb BEST7.
hml BEST7.
rf BEST6.
umd BEST7.
MONTHDATE BEST11.
DATE BEST13.
HSICMG BEST7.
PERMNO BEST10.
SHROUT BEST11.
PRC BEST8.
RET $CHAR9.
RETX $CHAR9.
vwretd BEST7.
vwretx BEST7.
ewretd BEST7.
ewretx BEST7.
year BEST9.
fyear BEST9.
gvkey BEST9.
lpermno BEST10.
datadate BEST13.
indfmt $CHAR4.
consol $CHAR1.
popsrc $CHAR1.
datafmt $CHAR3.
CURCD $CHAR3.
ACT BEST9.
AT BEST10.
BKVLPS BEST7.
CH BEST8.
DEPC $CHAR1.
DLC BEST9.
DLTT BEST10.
DVT BEST9.
EBIT BEST9.
EBITDA BEST9.
EPSPI BEST8.
REVT BEST10.
TXT BEST9.
XDP BEST1.
XINT BEST9.
COSTAT $CHAR1.
earn BEST12.
dep BEST9.
me BEST13.
MONTH BEST7. ;
INFILE "&csvFile"
LRECL=1000
DLM=','
truncover
DSD
firstobs=2;
INPUT
dateff : ?? COMMA13.
mktrf : ?? COMMA7.
smb : ?? COMMA7.
hml : ?? COMMA7.
rf : ?? COMMA6.
umd : ?? COMMA7.
MONTHDATE : ?? COMMA11.
DATE : ?? COMMA13.
HSICMG : ?? COMMA7.
PERMNO : ?? COMMA10.
SHROUT : ?? COMMA11.
PRC : ?? COMMA8.
RET : $CHAR9.
RETX : $CHAR9.
vwretd : ?? COMMA7.
vwretx : ?? COMMA7.
ewretd : ?? COMMA7.
ewretx : ?? COMMA7.
year : ?? COMMA9.
fyear : ?? COMMA9.
gvkey : ?? COMMA9.
lpermno : ?? COMMA10.
datadate : ?? COMMA13.
indfmt : $CHAR4.
consol : $CHAR1.
popsrc : $CHAR1.
datafmt : $CHAR3.
CURCD : $CHAR3.
ACT : ?? COMMA9.
AT : ?? COMMA10.
BKVLPS : ?? COMMA7.
CH : ?? COMMA8.
DEPC : $CHAR1.
DLC : ?? COMMA9.
DLTT : ?? COMMA10.
DVT : ?? COMMA9.
EBIT : ?? COMMA9.
EBITDA : ?? COMMA9.
EPSPI : ?? COMMA8.
REVT : ?? COMMA10.
TXT : ?? COMMA9.
XDP : ?? BEST1.
XINT : ?? COMMA9.
COSTAT : $CHAR1.
earn : ?? COMMA12.
dep : ?? COMMA9.
me : ?? COMMA13.
MONTH : ?? COMMA7. ;
/* convert to SAS date value */
format date yymmdd10.;
date=input(put(int(date),8.),yymmdd8.);
RUN;
proc sql;
create table want as
select
l.permno
,l.date
,r.EARN
,r.DEP
,r.ME
,(r.EARN+r.DEP)/r.ME as CF1
,sum(r.EARN,r.DEP)/r.ME as CF2
from real l,real r
where l.permno=r.permno and intnx('month',l.date,0,'e')=intnx('month',r.date,6,'e') and month(l.date)=6
order by l.permno, l.date
;
quit;
Patrick you are something else! :smileycool: Code works! Thanks for the readings, fascinating stuff! From this day forth I am going to approach the fickle creature SAS with a great deal of caution. Sorry for keeping you up late!
hi ... no comments about the SQL part that answers your original question (pretty neat), but though that EG-generated SAS code is convenient, I think that it's a bit over the top
(I guess that the code-generation has to be generic enough to handle anything)
I think that it's odd that EG provides a LENGTH statement that declares that all the numeric variables have length 8 when that's the default without a LENGTH statement and that it provides lengths for the character variables when they would have those lengths anyways via the INFORMATS used in the INPUT statement
EG decided to make RET and RETX character variables even though only 10 of 1600+ rows have the character 'C' as the values for those variables ... all the rest are numeric
not sure if I like all those "??" in the INPUT statement since they'll suppress error messages when you read numeric data ... that's not really the safest thing to do, yes/no?
all those COMMA informats will suppress error messages when the INPUT statement encounters problematic numbers (e.g. numbers with embedded spaces)
last, if you run PROC CONTENTS you would see the INFORMATS from the INFORMAT statement written ut by EG next to each variable
however, that's deceiving in that they are not the actual informats used to read the data for all the numeric vacriables
anyways, if you open the DBF file and save as a CSV file, then reopen the CVS file in EXCEL and just save it again, the decimal places (those .0000 after the dates) will disappear
a quick look at the data in Excel (using REAL.CSV after it was re-opened as saved in Execl to drop the decimal places in the dates) and I think you can get by with this (you could skip the INFORMAT and add use ": ??" instead of ": RET." to supress the 10 errors you'd get reading RET and RETX) ...
* what to do with the "C" when encountered as values for RET and RETX ... store as special missing value .C;
proc format;
invalue ret 'C' = .C ;
run;
data real;
infile "z:\real.csv" lrecl=1000 truncover dsd firstobs=2;
input
dateff : yymmdd.
mktrf smb hml rf umd
monthdate : yymmn.
date : yymmdd.
hsicmg permno shrout prc
ret : ret.
retx : ret.
vwretd vwretx ewretd ewretx year fyear gvkey lpermno
datadate : yymmdd.
indfmt : $4.
consol : $1.
popsrc : $1.
datafmt : $3.
curcd : $3.
act at bkvlps ch depc dlc dltt dvt ebit ebitda epspi revt txt xdp xint
costat : $1.
earn dep me month
;
format dateff date monthdate datadate yymmddd10.;
run;
Message was edited by: Mike Zdeb ... added comment about INFORMATS
Use the MISSING statement in the data step to tell SAS to which single charcter values should be read as special missing.
MISSING C;
hi ... neat
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.