BookmarkSubscribeRSS Feed
Harrymac
Calcite | Level 5

I think this problem vests with the different SAS platforms in use. When I copy and paste your code above and my variables adopt some pretty "funky" values (see attachment). problematic.JPG

Any readers of this thread if you could please assist with the original question posted, I am looking to create a code to obtain the variable below, with using integer dates, not sas dates.

(June T0) CF= (EARN+DEP)/ME (INPUTS ON THE RIGHT HAND SIDE ARE FOR DEC T-1)

Patrick
Opal | Level 21

Just use the code I've provided together with a .csv and it will work!

Please read my comments about SAS dates and forget about this integer stuff. It IS a bad idea and the wrong approach. It's simply not how one needs to deal with dates (and if people would have done it this way then there would never have been a Y2K issue - that's why the Mac OS didn't have to care ever about Y2K).

Harrymac
Calcite | Level 5

I just read this sorry. OK so SAS dates are the way to go, I still can't narrow down the problem in my tester data.

Patrick
Opal | Level 21

What I don't understand is why you should get different results running "my" code with the identical "tester.csv" data. I've even run this code using PC SAS 9.2 and I'm still getting the results I've posted. So you MUST do something different.

What I can see in the SAS table snapshot you've posted is that it looks as if there aren't the right data in the columns. This could happen if you wouldn't use 'truncover' or 'dsd' in the infile statement or if your source data would contain commas and text data is not quoted. But as I believe we're using the same code and the exactly same "texter.csv" this can't really be the explanation. Are we using identical code and source data?

By the way: What OS are you using? Mine is Win7. In case you're using some UNIX/Linux environment then try running "my code" without the following infile option "TERMSTR=CRLF".

Harrymac
Calcite | Level 5

My TESTER data was saved to desktop so I altered "%let csvFile=C:\temp\tester.csv;" to "%let csvFile=desktop\tester.csv"  and couldn't get your results. So I created a temp folder in C drive and saved the TESTER data there, to prevent me from editing anything and ET VOILA it started working! So strange, I'm perplexed why location of the db would of caused conflicting results

Patrick
Opal | Level 21

Smiley Happy Another example of the mysterious ways of IT! Happy that we've got now identical results so I assume you're now alright to run this code against your full data.

Harrymac
Calcite | Level 5

Just when we thought the misery was over. The code doesn't work on my full data set.

My original data holds about 1.3mill observations, so I originally had this saved as a DBF file. To make the data compatible with the code, I reopened the large DBF file and saved it as a CSV in temp, C drive and renamed it TESTER.

When I ran the code, the date values became blank, and the desired want database had 0 observations. See snap shot.problem_realdata.JPG

I have attached a small sample of my real data in its DBF origin. Hopefully it is a simple tweak to get this working.

Patrick
Opal | Level 21

Under Windows you can get the correct path to an object stored on desktop by right clicking on the object, then choose "properties" and in the pop-up window copy the path under "Location". That's the path you need to use.

Alternatively you can query default Windows environment variables. For example to get the correct path to "tester.csv" stored on your desktop it could be:

%let csvFile=%sysget(homedrive)%sysget(homepath)\desktop\tester.csv;

Harrymac
Calcite | Level 5

Hm interesting, thanks for that. If you refer to the original question, the sample of the DBF "real data" is saved there.

Patrick
Opal | Level 21

Oh gosh! I had a look at your .dbf file and it appears that the date is stored as a DBF datetime value (so also some number but may be not counted as days from 1January1960). Do you have SAS/Access to PC Files licensed? This would give you the interface to read DBF files directly (I'm having access to this module but it doesn't make sense to give you some code which you can't run). http://support.sas.com/documentation/cdl/en/acpcref/63184/HTML/default/viewer.htm#a003087278.htm

Alternatively: We would need to find out how a .dbf value gets stored when accessing such a file via Excel and then saving it as .csv.

I'm sure that's somewhere documented on Internet. Once we know what the value represents (eg. seconds since date xxx) it shouldn't be too hard to convert this value into a SAS date value - and then the SQL as provided would work.

Sorry - it's getting too late for me now and I have to work tomorrow. I'll leave you with this research. Please post what you find out. I'll have a look into this post tomorrow when I've got time (I'm living in Sydney and it's now 11PM on Sunday here).

Cheers

Patrick

Patrick
Opal | Level 21

O.K. - the last one for tonight. The issue was that the date string got stored with some decimals Something the informat couldn't handle. Not sure why this didn't already happen in your tester data.

So what I've done now is:

1. download your dbf

2. open dbf with Excel and store it as .csv

3. generate SAS code using EG import wizard

4. apply some modifications to get a SAS date value stored in variable "date"

Here the code inclusive the SQL. Everything now running from dataset "real" and no more "tester".

Man! That was now hard considering that the SQL answering your original question didn't change.

%let csvFile=C:\temp\real.csv;
DATA WORK.real;
    LENGTH
        dateff             8
        mktrf              8
        smb                8
        hml                8
        rf                 8
        umd                8
        MONTHDATE          8
        DATE               8
        HSICMG             8
        PERMNO             8
        SHROUT             8
        PRC                8
        RET              $ 9
        RETX             $ 9
        vwretd             8
        vwretx             8
        ewretd             8
        ewretx             8
        year               8
        fyear              8
        gvkey              8
        lpermno            8
        datadate           8
        indfmt           $ 4
        consol           $ 1
        popsrc           $ 1
        datafmt          $ 3
        CURCD            $ 3
        ACT                8
        AT                 8
        BKVLPS             8
        CH                 8
        DEPC             $ 1
        DLC                8
        DLTT               8
        DVT                8
        EBIT               8
        EBITDA             8
        EPSPI              8
        REVT               8
        TXT                8
        XDP                8
        XINT               8
        COSTAT           $ 1
        earn               8
        dep                8
        me                 8
        MONTH              8 ;
    FORMAT
        dateff           BEST13.
        mktrf            BEST7.
        smb              BEST7.
        hml              BEST7.
        rf               BEST6.
        umd              BEST7.
        MONTHDATE        BEST11.
        DATE             BEST13.
        HSICMG           BEST7.
        PERMNO           BEST10.
        SHROUT           BEST11.
        PRC              BEST8.
        RET              $CHAR9.
        RETX             $CHAR9.
        vwretd           BEST7.
        vwretx           BEST7.
        ewretd           BEST7.
        ewretx           BEST7.
        year             BEST9.
        fyear            BEST9.
        gvkey            BEST9.
        lpermno          BEST10.
        datadate         BEST13.
        indfmt           $CHAR4.
        consol           $CHAR1.
        popsrc           $CHAR1.
        datafmt          $CHAR3.
        CURCD            $CHAR3.
        ACT              BEST9.
        AT               BEST10.
        BKVLPS           BEST7.
        CH               BEST8.
        DEPC             $CHAR1.
        DLC              BEST9.
        DLTT             BEST10.
        DVT              BEST9.
        EBIT             BEST9.
        EBITDA           BEST9.
        EPSPI            BEST8.
        REVT             BEST10.
        TXT              BEST9.
        XDP              BEST1.
        XINT             BEST9.
        COSTAT           $CHAR1.
        earn             BEST12.
        dep              BEST9.
        me               BEST13.
        MONTH            BEST7. ;
    INFORMAT
        dateff           BEST13.
        mktrf            BEST7.
        smb              BEST7.
        hml              BEST7.
        rf               BEST6.
        umd              BEST7.
        MONTHDATE        BEST11.
        DATE             BEST13.
        HSICMG           BEST7.
        PERMNO           BEST10.
        SHROUT           BEST11.
        PRC              BEST8.
        RET              $CHAR9.
        RETX             $CHAR9.
        vwretd           BEST7.
        vwretx           BEST7.
        ewretd           BEST7.
        ewretx           BEST7.
        year             BEST9.
        fyear            BEST9.
        gvkey            BEST9.
        lpermno          BEST10.
        datadate         BEST13.
        indfmt           $CHAR4.
        consol           $CHAR1.
        popsrc           $CHAR1.
        datafmt          $CHAR3.
        CURCD            $CHAR3.
        ACT              BEST9.
        AT               BEST10.
        BKVLPS           BEST7.
        CH               BEST8.
        DEPC             $CHAR1.
        DLC              BEST9.
        DLTT             BEST10.
        DVT              BEST9.
        EBIT             BEST9.
        EBITDA           BEST9.
        EPSPI            BEST8.
        REVT             BEST10.
        TXT              BEST9.
        XDP              BEST1.
        XINT             BEST9.
        COSTAT           $CHAR1.
        earn             BEST12.
        dep              BEST9.
        me               BEST13.
        MONTH            BEST7. ;
    INFILE "&csvFile"
        LRECL=1000
        DLM=','
        truncover
        DSD
        firstobs=2;
    INPUT
        dateff           : ?? COMMA13.
        mktrf            : ?? COMMA7.
        smb              : ?? COMMA7.
        hml              : ?? COMMA7.
        rf               : ?? COMMA6.
        umd              : ?? COMMA7.
        MONTHDATE        : ?? COMMA11.
        DATE             : ?? COMMA13.
        HSICMG           : ?? COMMA7.
        PERMNO           : ?? COMMA10.
        SHROUT           : ?? COMMA11.
        PRC              : ?? COMMA8.
        RET              : $CHAR9.
        RETX             : $CHAR9.
        vwretd           : ?? COMMA7.
        vwretx           : ?? COMMA7.
        ewretd           : ?? COMMA7.
        ewretx           : ?? COMMA7.
        year             : ?? COMMA9.
        fyear            : ?? COMMA9.
        gvkey            : ?? COMMA9.
        lpermno          : ?? COMMA10.
        datadate         : ?? COMMA13.
        indfmt           : $CHAR4.
        consol           : $CHAR1.
        popsrc           : $CHAR1.
        datafmt          : $CHAR3.
        CURCD            : $CHAR3.
        ACT              : ?? COMMA9.
        AT               : ?? COMMA10.
        BKVLPS           : ?? COMMA7.
        CH               : ?? COMMA8.
        DEPC             : $CHAR1.
        DLC              : ?? COMMA9.
        DLTT             : ?? COMMA10.
        DVT              : ?? COMMA9.
        EBIT             : ?? COMMA9.
        EBITDA           : ?? COMMA9.
        EPSPI            : ?? COMMA8.
        REVT             : ?? COMMA10.
        TXT              : ?? COMMA9.
        XDP              : ?? BEST1.
        XINT             : ?? COMMA9.
        COSTAT           : $CHAR1.
        earn             : ?? COMMA12.
        dep              : ?? COMMA9.
        me               : ?? COMMA13.
        MONTH            : ?? COMMA7. ;
  /* convert to SAS date value */
  format date yymmdd10.;
  date=input(put(int(date),8.),yymmdd8.);
RUN;


proc sql;
create table want as
select
l.permno
,l.date
,r.EARN
,r.DEP
,r.ME
,(r.EARN+r.DEP)/r.ME as CF1
,sum(r.EARN,r.DEP)/r.ME as CF2
from real l,real r
where l.permno=r.permno and intnx('month',l.date,0,'e')=intnx('month',r.date,6,'e') and month(l.date)=6
order by l.permno, l.date
;
quit;

Harrymac
Calcite | Level 5

Patrick you are something else! :smileycool: Code works!  Thanks for the readings, fascinating stuff!  From this day forth I am going to approach the fickle creature SAS with a great deal of caution. Sorry for keeping you up late!

MikeZdeb
Rhodochrosite | Level 12

hi ... no comments about the SQL part that answers your original question (pretty neat), but though that EG-generated SAS code is convenient, I think that it's a bit over the top

(I guess that the code-generation has to be generic enough to handle anything)

I think that it's odd that EG provides a LENGTH statement that declares that all the numeric variables have length 8 when that's the default without a LENGTH statement and that it provides lengths for the character variables when they would have those lengths anyways via the INFORMATS used in the INPUT statement

EG decided to make RET and RETX character variables even though only 10 of 1600+ rows have the character 'C' as the values for those variables ... all the rest are numeric

not sure if I like all those "??" in the INPUT statement since they'll suppress error messages when you read numeric data ... that's not really the safest thing to do, yes/no?

all those COMMA informats will suppress error messages when the INPUT statement encounters problematic numbers (e.g. numbers with embedded spaces)

last, if you run PROC CONTENTS you would see the INFORMATS from the INFORMAT statement written ut by EG next to each variable

however, that's deceiving in that they are not the actual informats used to read the data for all the numeric vacriables

anyways, if you open the DBF file and save as a CSV file, then reopen the CVS file in EXCEL and just save it again, the decimal places (those .0000 after the dates) will disappear

a quick look at the data in Excel (using REAL.CSV after it was re-opened as saved in Execl to drop the decimal places in the dates) and I think you can get by with this (you could skip the INFORMAT and add use ": ??" instead of ": RET." to supress the 10 errors you'd get reading RET and RETX) ...


* what to do with the "C" when encountered as values for RET and RETX ... store as special missing value .C;

proc format;

invalue ret 'C' = .C ;

run;

data real;

infile "z:\real.csv" lrecl=1000 truncover dsd firstobs=2;

input

dateff    : yymmdd.

mktrf smb hml rf umd

monthdate : yymmn.

date      : yymmdd.

hsicmg permno shrout prc

ret       : ret.

retx      : ret.

vwretd vwretx ewretd ewretx year fyear gvkey lpermno 

datadate  : yymmdd.

indfmt    : $4.

consol    : $1.

popsrc    : $1.

datafmt   : $3.

curcd     : $3.

act at bkvlps ch depc dlc dltt dvt ebit ebitda epspi revt txt xdp xint

costat    : $1.

earn dep me month    

;

format dateff date monthdate datadate yymmddd10.;

run;

Message was edited by: Mike Zdeb ... added comment about INFORMATS

data_null__
Jade | Level 19

Use the MISSING statement in the data step to tell SAS to which single charcter values should be read as special missing.

MISSING C;

MikeZdeb
Rhodochrosite | Level 12

hi ... neat

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 30 replies
  • 2896 views
  • 0 likes
  • 4 in conversation