Hi,
I am very new to sas and I am trying to calculate the return of some companies using the ret=dif(RI)/lag (RI).
I have imported my csv file using the following commands:
proc import out=ARI
datafile= "c:\evantus\A-RI-main.csv" dbms = csv replace;
getnames=yes;
datarow= 2;
GUESSINGROWS=20000;
format caldt date9.;
run;
now I am trying to use the following statements to calculate the returns:
data retA;
set ari;
by dscd caldt;
ret=dif(RI)/lag(RI);
if first. dscd then ret=.;
keep comnam dscd caldt ri;
run;
but I see an error in my log: ERROR: BY variables are not properly sorted on data set WORK.ARI.
comnam=NIPPON TELG. & TEL. - TOT RETURN IND dscd=740847 caldt=02/09/2000 RI=148.74 FIRST.dscd=0
LAST.dscd=0 FIRST.caldt=1 LAST.caldt=1 ret=. _ERROR_=1 _N_=44
Can you help me fix this problem?
Thanks,
Niloo
If you use BY in a data step you first need to sort your data by those variables.
PS. Regarding your title - there is no RET statement and it helps to title your questions clearly.
proc sort data=reta; by dscd caldt;
run;
data retA;
set ari;
by dscd caldt;
ret=dif(RI)/lag(RI);
if first. dscd then ret=.;
keep comnam dscd caldt ri;
run;
but I
If you use BY in a data step you first need to sort your data by those variables.
PS. Regarding your title - there is no RET statement and it helps to title your questions clearly.
proc sort data=reta; by dscd caldt;
run;
data retA;
set ari;
by dscd caldt;
ret=dif(RI)/lag(RI);
if first. dscd then ret=.;
keep comnam dscd caldt ri;
run;
but I
Hi Reeza,
Thank you very much for your answer.
Do I need to sort the imported csv file? the ARI?
Because I sorted the imported csv file then I run the following but the result was just the ARI file but sorted first by dscd and then by date.
data retA;
set ari;
by dscd caldt;
ret=dif(RI)/lag(RI);
if first. dscd then ret=.;
keep comnam dscd caldt ri;
run;
it did not calculate the returns.
Thanks,
Niloo
To sort a dataset you can use the PROC SORT procedure. Sorting the CSV file won't help much as a CSV file is just a text and depending on how the text is format it might not even sort anyway. For example if your dates are displayed in M/D/Y order in the CSV the you cannot sort it by date.
So your steps would be:
1) IMPORT
2) PROC SORT
3) Then your data step to derive some variables.
Hi Tom,
Thank you for your response. I think I am following the steps that you mentioned:
proc import out=ARI
datafile= "c:\evantus\A-RI-main.csv" dbms = csv replace;
getnames=yes;
datarow= 2;
GUESSINGROWS=20000;
format caldt date9.;
run;
proc sort data=ari;
by caldt dscd;
run;
data reta;
set ari;
by caldt dscd;
ret=dif(RI)/lag(RI);
if first. dscd then ret=.;
keep comnam dscd caldt ri;
run;
Is it not the same steps that you mentioned?Could you please tell me what my mistake is? I used the same formula for my Index file and it worked, but here sas creates "reta" but does not calculate the "ret". The result is just like the original file "ARI":smileycry:
Thank you,
Niloo
How do you know it did not calculate RET? You did not include it in the data set that you generated.
Remove the KEEP statement.
If you want to limit the variables read from a dataset use the KEEP= dataset option instead.
set ari (keep= comnam dscd caldt ri);
Tom,
When I remove the keep statement I will have a ret column in my results but the values will all be ".":
data reta;
set ari;
by caldt dscd;
ret=dif(RI)/lag(RI);
if first.dscd then ret=.;
run;
as you see the ret column is empty.
comnam | dscd | caldt | RI | ret |
NIPPON TELG. & TEL. - TOT RETURN IND | 740847 | 12/10/1999 | 157.86 | |
NIPPON TELG. & TEL. - TOT RETURN IND | 740847 | 12/13/1999 | 157.86 | |
NIPPON TELG. & TEL. - TOT RETURN IND | 740847 | 12/14/1999 | 158.78 | |
NIPPON TELG. & TEL. - TOT RETURN IND | 740847 | 12/15/1999 | 157.86 | |
NIPPON TELG. & TEL. - TOT RETURN IND | 740847 | 12/16/1999 | 153.3 | |
NIPPON TELG. & TEL. - TOT RETURN IND | 740847 | 12/17/1999 | 155.13 | |
NIPPON TELG. & TEL. - TOT RETURN IND | 740847 | 12/20/1999 | 153.3 | |
NIPPON TELG. & TEL. - TOT RETURN IND | 740847 | 12/21/1999 | 149.65 | |
NIPPON TELG. & TEL. - TOT RETURN IND | 740847 | 12/22/1999 | 153.3 |
Thanks
niloo
Those values should work fine. Are you sure that your data actually has those values?
data ret ;
input ri @@ ;
lag_ri = lag(ri);
dif_ri = dif(ri);
ret = dif_ri / lag_ri ;
put (_all_) (=);
cards;
157.86 157.86 158.78 157.86 153.3
;;;;
ri=157.86 lag_ri=. dif_ri=. ret=.
ri=157.86 lag_ri=157.86 dif_ri=0 ret=0
ri=158.78 lag_ri=157.86 dif_ri=0.92 ret=0.0058279488
ri=157.86 lag_ri=158.78 dif_ri=-0.92 ret=-0.005794181
ri=153.3 lag_ri=157.86 dif_ri=-4.56 ret=-0.028886355
Tom,
So you mean my results are fine without the if statement?
Thanks,
Niloo
I mean your program is fine. If you are not seeing the results you expect then look at the input data and make sure that it is correct.
Did you not receive any notes or warnings in the SAS log?
No, no errors. Everything is in blue and black and there is no error. The data set is successfully created.
Check your proc import is correctly reading all values. You usually can't include a format statement in proc import as far as I'm aware.
Otherwise run the code below and post the full log from the execution.
proc import out=ARI
datafile= "c:\evantus\A-RI-main.csv" dbms = csv replace;
getnames=yes;
datarow= 2;
GUESSINGROWS=20000;
run;
proc sort data=ari;
by caldt dscd;
run;
data reta;
set ari;
by caldt dscd;
ret=dif(RI)/lag(RI);
if first.dscd then ret=.;
run;
Hi Reeza,
I did what you said, but using the if statement causes my ret column in the reta file to be empty.
Here is my log:
377 proc import out=ARI
378 datafile= "c:\evantus\A-RI-main.csv" dbms = csv replace;
379 getnames=yes;
380 datarow= 2;
381 GUESSINGROWS=20000;
382 run;
383 /**********************************************************************
384 * PRODUCT: SAS
385 * VERSION: 9.4
386 * CREATOR: External File Interface
387 * DATE: 31JUL15
388 * DESC: Generated SAS Datastep Code
389 * TEMPLATE SOURCE: (None Specified.)
390 ***********************************************************************/
391 data WORK.ARI ;
392 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
393 infile 'c:\evantus\A-RI-main.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
394 informat comnam $64. ;
395 informat dscd $6. ;
396 informat caldt mmddyy10. ;
397 informat RI best32. ;
398 format comnam $64. ;
399 format dscd $6. ;
400 format caldt mmddyy10. ;
401 format RI best12. ;
402 input
403 comnam $
404 dscd $
405 caldt
406 RI
407 ;
408 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
409 run;
NOTE: The infile 'c:\evantus\A-RI-main.csv' is:
Filename=c:\evantus\A-RI-main.csv,
RECFM=V,LRECL=32767,File Size (bytes)=1689113,
Last Modified=30Jul2015:17:48:25,
Create Time=30Jul2015:17:48:25
NOTE: 28769 records were read from the infile 'c:\evantus\A-RI-main.csv'.
The minimum record length was 40.
The maximum record length was 89.
NOTE: The data set WORK.ARI has 28769 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.04 seconds
28769 rows created in WORK.ARI from c:\evantus\A-RI-main.csv.
NOTE: WORK.ARI data set was successfully created.
NOTE: The data set WORK.ARI has 28769 observations and 4 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 3.61 seconds
cpu time 3.46 seconds
410 proc sort data=ari;
411 by caldt dscd;
412 run;
NOTE: There were 28769 observations read from the data set WORK.ARI.
NOTE: The data set WORK.ARI has 28769 observations and 4 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.26 seconds
cpu time 0.03 seconds
413 data reta;
414 set ari;
415 by caldt dscd;
416 ret=dif(RI)/lag(RI);
417 if first.dscd then ret=.;
418 keep comnam dscd caldt ret;
419 run;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 416:12
NOTE: There were 28769 observations read from the data set WORK.ARI.
NOTE: The data set WORK.RETA has 28769 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.14 seconds
cpu time 0.03 seconds
But as soon as I delete the if statement the returns will be calculated.
Thanks,
Niloo
Somehow from the beginning of this thread to this point you've reversed your BY statement.
It should be:
BY DSCD calcdt;
rather than
BY CALCDT DSCD;
And you absolutely need the IF statement otherwise it uses the values from the previous DSCD in the calculation which is very likely incorrect.
Dear Reeza,
When I use By DSCD caldate; and add the if statement I see this error in the log:
proc import out=TRI
611 datafile= "c:\evantus\T-RI-main.csv" dbms = csv replace;
612 getnames=yes;
613 datarow= 2;
614 GUESSINGROWS=20000;
615 run;
616 /**********************************************************************
617 * PRODUCT: SAS
618 * VERSION: 9.4
619 * CREATOR: External File Interface
620 * DATE: 02AUG15
621 * DESC: Generated SAS Datastep Code
622 * TEMPLATE SOURCE: (None Specified.)
623 ***********************************************************************/
624 data WORK.TRI ;
625 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
626 infile 'c:\evantus\T-RI-main.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
627 informat comnam $75. ;
628 informat dscd $6. ;
629 informat caldt mmddyy10. ;
630 informat RI best32. ;
631 format comnam $75. ;
632 format dscd $6. ;
633 format caldt mmddyy10. ;
634 format RI best12. ;
635 input
636 comnam $
637 dscd $
638 caldt
639 RI
640 ;
641 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
642 run;
NOTE: The infile 'c:\evantus\T-RI-main.csv' is:
Filename=c:\evantus\T-RI-main.csv,
RECFM=V,LRECL=32767,File Size (bytes)=1990450,
Last Modified=02Aug2015:20:21:45,
Create Time=02Aug2015:18:31:20
NOTE: 29908 records were read from the infile 'c:\evantus\T-RI-main.csv'.
The minimum record length was 40.
The maximum record length was 89.
NOTE: The data set WORK.TRI has 29908 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
29908 rows created in WORK.TRI from c:\evantus\T-RI-main.csv.
NOTE: WORK.TRI data set was successfully created.
NOTE: The data set WORK.TRI has 29908 observations and 4 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 3.77 seconds
cpu time 3.77 seconds
643 proc sort data=tri;
644 by dscd caldt;
645 run;
NOTE: There were 29908 observations read from the data set WORK.TRI.
NOTE: The data set WORK.TRI has 29908 observations and 4 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Thanks,
Niloo
NOTE: This SAS session is using a registry in WORK. All changes will be lost at the end of
this session.
646
647 data rett;
648 set tri;
649 by caldt dscd;
650 ret=dif(RI)/lag(RI);
651 if first.dscd then ret=.;
652 keep comnam dscd caldt ret;
653 run;
ERROR: BY variables are not properly sorted on data set WORK.TRI.
comnam=LACROSSE FOOTWEAR DEAD - ACQD.BY 264407 - TOT RETURN IND dscd=130894 caldt=08/03/2012
RI=210.62 FIRST.caldt=1 LAST.caldt=1 FIRST.dscd=1 LAST.dscd=1 ret=. _ERROR_=1 _N_=44
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 650:12
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 45 observations read from the data set WORK.TRI.
WARNING: The data set WORK.RETT may be incomplete. When this step was stopped there were 43
observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
NOTE: This SAS session is using a registry in WORK. All changes will be lost at the end of
this session.
NOTE: This SAS session is using a registry in WORK. All changes will be lost at the end of
this session.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.