BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
niloo
Obsidian | Level 7

Hi,

I am very new to sas and I am trying to calculate the return of some companies using the ret=dif(RI)/lag (RI).

I have imported my csv file using the following commands:

proc import out=ARI

datafile= "c:\evantus\A-RI-main.csv" dbms = csv replace;

getnames=yes;

datarow= 2;

GUESSINGROWS=20000;

format caldt date9.;

run;

now I am trying to use the following statements to calculate the returns:

data retA;

set ari;

by dscd caldt;

ret=dif(RI)/lag(RI);

if first. dscd then ret=.;

keep comnam dscd caldt ri;

run;

but I see an error in my log:  ERROR: BY variables are not properly sorted on data set WORK.ARI.

comnam=NIPPON TELG. & TEL. - TOT RETURN IND dscd=740847 caldt=02/09/2000 RI=148.74 FIRST.dscd=0

LAST.dscd=0 FIRST.caldt=1 LAST.caldt=1 ret=. _ERROR_=1 _N_=44

Can you help me fix this problem?

Thanks,

Niloo

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

If you use BY in a data step you first need to sort your data by those variables.

PS. Regarding your title - there is no RET statement and it helps to title your questions clearly.

proc sort data=reta; by dscd caldt;

run;

data retA;

set ari;

by dscd caldt;

ret=dif(RI)/lag(RI);

if first. dscd then ret=.;

keep comnam dscd caldt ri;

run;

but I

View solution in original post

19 REPLIES 19
Reeza
Super User

If you use BY in a data step you first need to sort your data by those variables.

PS. Regarding your title - there is no RET statement and it helps to title your questions clearly.

proc sort data=reta; by dscd caldt;

run;

data retA;

set ari;

by dscd caldt;

ret=dif(RI)/lag(RI);

if first. dscd then ret=.;

keep comnam dscd caldt ri;

run;

but I

niloo
Obsidian | Level 7

Hi Reeza,

Thank you very much for your answer.

Do I need to sort the imported csv file? the ARI?

Because I sorted the imported csv file then I run the following but the result was just the ARI file but sorted first by dscd and then by date.

data retA;

set ari;

by dscd caldt;

ret=dif(RI)/lag(RI);

if first. dscd then ret=.;

keep comnam dscd caldt ri;

run;

it did not calculate the returns.

Thanks,

Niloo

Tom
Super User Tom
Super User

To sort a dataset you can use the PROC SORT procedure. Sorting the CSV file won't help much as a CSV file is just a text and depending on how the text is format it might not even sort anyway. For example if your dates are displayed in M/D/Y order in the CSV the you cannot sort it by date.

So your steps would be:

1) IMPORT

2) PROC SORT

3) Then your data step to derive some variables.

niloo
Obsidian | Level 7

Hi Tom,

Thank you for your response. I think I am following the steps that you mentioned:

proc import out=ARI

datafile= "c:\evantus\A-RI-main.csv" dbms = csv replace;

getnames=yes;

datarow= 2;

GUESSINGROWS=20000;

format caldt date9.;

run;

proc sort data=ari;

by  caldt dscd;

run;

data reta;

set ari;

by caldt dscd;

ret=dif(RI)/lag(RI);

if first. dscd then ret=.;

keep comnam dscd caldt ri;

run;

Is it not the same steps that you mentioned?Could you please tell me what my mistake is? I used the same formula for my Index file and it worked, but here sas creates "reta" but does not calculate the "ret". The result is just like the original file "ARI":smileycry:

Thank you,

Niloo

Tom
Super User Tom
Super User

How do you know it did not calculate RET?  You did not include it in the data set that you generated.

Remove the KEEP statement.

If you want to limit the variables read from a dataset use the KEEP= dataset option instead.

set ari (keep= comnam dscd caldt ri);

niloo
Obsidian | Level 7

Tom,

When I remove the keep statement  I will have a ret column in my results but the values will all be ".":

data reta;

set ari;

by caldt dscd;

ret=dif(RI)/lag(RI);

if first.dscd then ret=.;

run;

as you see the ret column is empty.

comnamdscdcaldtRIret
NIPPON TELG. & TEL. - TOT RETURN IND74084712/10/1999157.86
NIPPON TELG. & TEL. - TOT RETURN IND74084712/13/1999157.86
NIPPON TELG. & TEL. - TOT RETURN IND74084712/14/1999158.78
NIPPON TELG. & TEL. - TOT RETURN IND74084712/15/1999157.86
NIPPON TELG. & TEL. - TOT RETURN IND74084712/16/1999153.3
NIPPON TELG. & TEL. - TOT RETURN IND74084712/17/1999155.13
NIPPON TELG. & TEL. - TOT RETURN IND74084712/20/1999153.3
NIPPON TELG. & TEL. - TOT RETURN IND74084712/21/1999149.65
NIPPON TELG. & TEL. - TOT RETURN IND74084712/22/1999153.3

Thanks

niloo

Tom
Super User Tom
Super User

Those values should work fine. Are you sure that your data actually has those values?

data ret ;

  input ri @@ ;

  lag_ri = lag(ri);

  dif_ri = dif(ri);

  ret = dif_ri / lag_ri ;

  put (_all_) (=);

cards;

157.86 157.86 158.78 157.86 153.3

;;;;

ri=157.86 lag_ri=. dif_ri=. ret=.

ri=157.86 lag_ri=157.86 dif_ri=0 ret=0

ri=158.78 lag_ri=157.86 dif_ri=0.92 ret=0.0058279488

ri=157.86 lag_ri=158.78 dif_ri=-0.92 ret=-0.005794181

ri=153.3 lag_ri=157.86 dif_ri=-4.56 ret=-0.028886355

niloo
Obsidian | Level 7

Tom,

So you mean my results are fine without the if statement?

Thanks,

Niloo

Tom
Super User Tom
Super User

I mean your program is fine. If you are not seeing the results you expect then look at the input data and make sure that it is correct.

Did you not receive any notes or warnings in the SAS log?

niloo
Obsidian | Level 7

No, no errors. Everything is in blue and black and there is no error. The data set is successfully created. 

Reeza
Super User

Check your proc import is correctly reading all values. You usually can't include a format statement in proc import as far as I'm aware.

Otherwise run the code below and post the full log from the execution.

proc import out=ARI

datafile= "c:\evantus\A-RI-main.csv" dbms = csv replace;

getnames=yes;

datarow= 2;

GUESSINGROWS=20000;

run;

proc sort data=ari;

by  caldt dscd;

run;

data reta;

set ari;

by caldt dscd;

ret=dif(RI)/lag(RI);

if first.dscd then ret=.;

run;

niloo
Obsidian | Level 7

Hi Reeza,

I did what you said, but using the if statement causes my ret column in the reta file to be empty.

Here is my log:

377  proc import out=ARI

378  datafile= "c:\evantus\A-RI-main.csv" dbms = csv replace;

379  getnames=yes;

380  datarow= 2;

381  GUESSINGROWS=20000;

382  run;

383   /**********************************************************************

384   *   PRODUCT:   SAS

385   *   VERSION:   9.4

386   *   CREATOR:   External File Interface

387   *   DATE:      31JUL15

388   *   DESC:      Generated SAS Datastep Code

389   *   TEMPLATE SOURCE:  (None Specified.)

390   ***********************************************************************/

391      data WORK.ARI    ;

392      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

393      infile 'c:\evantus\A-RI-main.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

394         informat comnam $64. ;

395         informat dscd $6. ;

396         informat caldt mmddyy10. ;

397         informat RI best32. ;

398         format comnam $64. ;

399         format dscd $6. ;

400         format caldt mmddyy10. ;

401         format RI best12. ;

402      input

403                  comnam $

404                  dscd $

405                  caldt

406                  RI

407      ;

408      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

409      run;

NOTE: The infile 'c:\evantus\A-RI-main.csv' is:

      Filename=c:\evantus\A-RI-main.csv,

      RECFM=V,LRECL=32767,File Size (bytes)=1689113,

      Last Modified=30Jul2015:17:48:25,

      Create Time=30Jul2015:17:48:25

NOTE: 28769 records were read from the infile 'c:\evantus\A-RI-main.csv'.

      The minimum record length was 40.

      The maximum record length was 89.

NOTE: The data set WORK.ARI has 28769 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.09 seconds

      cpu time            0.04 seconds

28769 rows created in WORK.ARI from c:\evantus\A-RI-main.csv.

NOTE: WORK.ARI data set was successfully created.

NOTE: The data set WORK.ARI has 28769 observations and 4 variables.

NOTE: PROCEDURE IMPORT used (Total process time):

      real time           3.61 seconds

      cpu time            3.46 seconds

410  proc sort data=ari;

411  by  caldt dscd;

412  run;

NOTE: There were 28769 observations read from the data set WORK.ARI.

NOTE: The data set WORK.ARI has 28769 observations and 4 variables.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.26 seconds

      cpu time            0.03 seconds

413  data reta;

414  set ari;

415  by caldt dscd;

416  ret=dif(RI)/lag(RI);

417  if first.dscd then ret=.;

418  keep comnam dscd caldt ret;

419  run;

NOTE: Missing values were generated as a result of performing an operation on missing values.

      Each place is given by: (Number of times) at (Line):(Column).

      1 at 416:12

NOTE: There were 28769 observations read from the data set WORK.ARI.

NOTE: The data set WORK.RETA has 28769 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.14 seconds

      cpu time            0.03 seconds

But as soon as I delete the if statement  the returns will be calculated.

Thanks,

Niloo

Reeza
Super User

Somehow from the beginning of this thread to this point you've reversed your BY statement.

It should be:

BY DSCD calcdt;

rather than

BY CALCDT DSCD;

And you absolutely need the IF statement otherwise it uses the values from the previous DSCD in the calculation which is very likely incorrect.

niloo
Obsidian | Level 7

Dear Reeza,

When I use By DSCD caldate; and add the if statement I see this error in the log:

  proc import out=TRI

611  datafile= "c:\evantus\T-RI-main.csv" dbms = csv replace;

612  getnames=yes;

613  datarow= 2;

614  GUESSINGROWS=20000;

615  run;

616   /**********************************************************************

617   *   PRODUCT:   SAS

618   *   VERSION:   9.4

619   *   CREATOR:   External File Interface

620   *   DATE:      02AUG15

621   *   DESC:      Generated SAS Datastep Code

622   *   TEMPLATE SOURCE:  (None Specified.)

623   ***********************************************************************/

624      data WORK.TRI    ;

625      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

626      infile 'c:\evantus\T-RI-main.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

627         informat comnam $75. ;

628         informat dscd $6. ;

629         informat caldt mmddyy10. ;

630         informat RI best32. ;

631         format comnam $75. ;

632         format dscd $6. ;

633         format caldt mmddyy10. ;

634         format RI best12. ;

635      input

636                  comnam $

637                  dscd $

638                  caldt

639                  RI

640      ;

641      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

642      run;

NOTE: The infile 'c:\evantus\T-RI-main.csv' is:

      Filename=c:\evantus\T-RI-main.csv,

      RECFM=V,LRECL=32767,File Size (bytes)=1990450,

      Last Modified=02Aug2015:20:21:45,

      Create Time=02Aug2015:18:31:20

NOTE: 29908 records were read from the infile 'c:\evantus\T-RI-main.csv'.

      The minimum record length was 40.

      The maximum record length was 89.

NOTE: The data set WORK.TRI has 29908 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.03 seconds

      cpu time            0.03 seconds

29908 rows created in WORK.TRI from c:\evantus\T-RI-main.csv.

NOTE: WORK.TRI data set was successfully created.

NOTE: The data set WORK.TRI has 29908 observations and 4 variables.

NOTE: PROCEDURE IMPORT used (Total process time):

      real time           3.77 seconds

      cpu time            3.77 seconds

643  proc sort data=tri;

644  by dscd caldt;

645  run;

NOTE: There were 29908 observations read from the data set WORK.TRI.

NOTE: The data set WORK.TRI has 29908 observations and 4 variables.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.02 seconds

      cpu time            0.01 seconds

Thanks,

Niloo

NOTE: This SAS session is using a registry in WORK.  All changes will be lost at the end of

      this session.

646

647  data rett;

648  set tri;

649  by caldt dscd;

650  ret=dif(RI)/lag(RI);

651  if first.dscd then ret=.;

652  keep comnam dscd caldt ret;

653  run;

ERROR: BY variables are not properly sorted on data set WORK.TRI.

comnam=LACROSSE FOOTWEAR DEAD - ACQD.BY 264407 - TOT RETURN IND dscd=130894 caldt=08/03/2012

RI=210.62 FIRST.caldt=1 LAST.caldt=1 FIRST.dscd=1 LAST.dscd=1 ret=. _ERROR_=1 _N_=44

NOTE: Missing values were generated as a result of performing an operation on missing values.

      Each place is given by: (Number of times) at (Line):(Column).

      1 at 650:12

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 45 observations read from the data set WORK.TRI.

WARNING: The data set WORK.RETT may be incomplete.  When this step was stopped there were 43

         observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.02 seconds

      cpu time            0.01 seconds

NOTE: This SAS session is using a registry in WORK.  All changes will be lost at the end of

      this session.

NOTE: This SAS session is using a registry in WORK.  All changes will be lost at the end of

      this session.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 19 replies
  • 3030 views
  • 3 likes
  • 3 in conversation