Hi SAS Users,
Normally, it is quite familiar to create a lag variable, can I ask how to create a forward variable? And is there any chance to use the option firstobs=2 to do so?
For example,
We have the dataset sashelp.class
Name Sex Age Height Weight
Alfred M 14 69 112.5
Alice F 13 56.5 84
Barbara F 13 65.3 98
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
James M 12 57.3 83
Jane F 12 59.8 84.5
Janet F 15 62.5 112.5
Jeffrey M 13 62.5 84
John M 12 59 99.5
Joyce F 11 51.3 50.5
Judy F 14 64.3 90
Louise F 12 56.3 77
Mary F 15 66.5 112
Philip M 16 72 150
Robert M 12 64.8 128
Ronald M 15 67 133
Thomas M 11 57.5 85
William M 15 66.5 112
I want to create a column weight_ lead1 while weight_lead1 is forward of weight (or in another word, weight is lag of weight_lead1)
For example
Name Sex Age Height Weight weight_lead1
Alfred M 14 69 112.5 84
Alice F 13 56.5 84 98
Barbara F 13 65.3 98 102.5
Carol F 14 62.8 102.5 102.5
Henry M 14 63.5 102.5 83
Could you please give me a hint or suggestion?
Warm regards.
You are on the right track. For example consider:
data WANT;
merge HAVE
HAVE (firstobs=2 /*keep= rename= options*/);
/*no BY statement*/;
run;
Hi @ChrisNZ
Thank you for your suggestion, But this without BY, SAS announces error
The code:
data uuu;
set sashelp.class;
run;
data WANT;
merge uuu
uuu (firstobs=2 keep=weight rename=(weight=weight_lead1));
/*no BY statement*/;
run;
And the LOG is as below
32 data WANT; 33 merge uuu 34 uuu (firstobs=2 keep=weight rename=(weight=weight_lead1)); 35 /*no BY statement*/; 36 run; ERROR: No BY statement was specified for a MERGE statement. NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 6 variables. WARNING: Data set WORK.WANT was not replaced because this step was stopped.
Warm regards.
Use
options mergenoby=nowarn;
before the step.
Hi @andreas_lds
Thank you for your suggestion.
I apply this options and it run without any announcement about error.
However, from what I understand, this option just for turning off the error announcement when merge without BY.
From this document from lexjansen.
Merging with no "BY statement" is dangerous unless it is a One-to-One merging (combines the first observation
from all data sets that are named in the MERGE statement into the first observation in the new data set, the
second observation from all data sets into the second observation in the new data set, and so on)
So, I think of turning off this option right after running the code for this data step, because to me, when we assign a proc option, it will apply globally, so can you please help me to sort it out (turning off this option)?
And I also think that, apart from the ERROR notification, the results with or without this option would be the same (I checked already, but just ask for confirmation)
Warm regards.
> So, I think of turning off this option right after running the code for this data step, because to me, when we assign a proc option, it will apply globally, so can you please help me to sort it out (turning off this option)?
Your administrator decided that this option was a safer default.
In your case, you know what you want and what you are doing (one-to-one merge), so you needed to disable it before the data step, and you can re-enable it after the data step.
Another way, as shown, is to use two SET statements.
I would not use the suggestion that uses POINT= though. It is good that it is presented to you so you know about it, but know that POINT= is very slow.
True, POINT= is slower, but on the other hand it allow non-sequential read, e.g you can read dataset backward without resorting it. And for small datasets (up to 1 million obs.) the difference in performance is almost intangible.
Like everywhere in programming something for something. 🙂
Bart
Hi @ChrisNZ
It is just because I turn off the notice by using
options mernoby=nowarn;
to compress the error notification and then just recognize that this options stuff will apply globally in my SAS EG ,so I try to turn the notice on again by
options mernoby=warn;
Simplistically speaking, it is a newbie's mistake.
Warm regards.
One other option would be "POINTing" next observation:
data have;
set sashelp.class;
run;
data WANT;
set have nobs=nobs;
if _N_ < nobs
then set have(keep=weight rename=(weight=weight_lead1)) point = _N_;
else weight_lead1 = .;
run;
Bart
Hi @yabwon (Bart)
Thank you for your solution, Your code leads me to the very nice nobs=nobs
However, I have two concerns here in your code, could you please help me to sort it out?
1. When I run the code in my head, let's say the first iteration:
_N_=1 /* From this document _N_ variable increments by one each time the data step passes by a data statement*/
then set have() point = 1
else weight_lead1=.
I do not understand how SAS create values for data WANT here ( there is no hint for the lead condition to me), could you please explain the mechanism to me?
2. When I copy and run your code without any adjustment, the error is as below
32 data WANT; 33 set have nobs=nobs; 34 35 if _N_ < nobs 36 then set have(keep=weight rename=(weight=weight_lead1)) point = _N_; 37 else weight_lead1 = .; 38 run; ERROR: The POINT= data set option is not valid for the data set WORK.HAVE, the data set must be accessible by observation number for POINT= processing. NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 6 variables. WARNING: Data set WORK.WANT was not replaced because this step was stopped.
Could you please help me to sort it out?
Warm regards.
About 2)
From the doc:
Restrictions | You cannot use POINT= with a BY statement, a WHERE statement, or a WHERE= data set option. In addition, you cannot use POINT= with transport format data sets, data sets in sequential format on tape or disk, and SAS/ACCESS views or the SQL procedure views that read data from external files. |
---|
Maybe your file is one of that case? Try to run proc contents on the HAVE and see the output, is the "Point to Observations" set to Yes?
Other possibility is that you have compress option on (like in this thread https://communities.sas.com/t5/SAS-Procedures/Point-error/td-p/144266).
About 1)
The nobs value is extracted in the compilation time. So you can "almost" imagine this code like (assuming that HAVE has 19 observations):
data WANT;
set have nobs=nobs; /* FIRST set */
if _N_ < 19 /* nobs - this value is known in compilation phase */
then set have(keep=weight rename=(weight=weight_lead1)) point = _N_; /* SECOND set */
else weight_lead1 = .;
run;
so the second set reading is executed only for observations 1 to 18 from the first set, and when _N_=19 the "else" is executed.
By the way, I just realised I made a mistake in the code, sorry for that. It should be as follows (the observation pointed should be _N_+1):
data have;
set sashelp.class;
run;
data WANT;
set have nobs=nobs;
if _N_ < nobs then
do;
point = _N_+1;
set have(keep=weight rename=(weight=weight_lead1)) point = point;
end;
else weight_lead1 = .;
run;
Hope it helps.
All the best
Bart
Hi @yabwon
You are totally right, the
options compress=yes;
is applied globally in my SAS EG.
So, when I turn off this function by
options compress=no;
the code ran smoothly
Warm regards.
Or combine the two approaches and use firstobs=2 in the second SET statement:
data want;
set have end=eof;
if ~eof then set have(firstobs=2 keep=weight rename=(weight=weight_lead1));
else weight_lead1=.;
run;
Thank you for your suggestion. I learn more about the option end=eof. The code runs flawlessly.
But I still confuse how the data want receive the value of weight.lead1.
For example, let's say the first iteration.
data want;
/*_N_=1 for the first iteration*/
set sashelp.class end=eof;
/*end=19 (size of sashelp.class)*/
if ~eof then set sashelp.class(firstobs=2 keep=weight rename=(weight=weight_lead1));
/* if not the last observation of sashelp.class then what ???*/
else weight_lead1=.;/*Last obs receive missing value*/
run;
data want;
/*_N_=2 for the second iteration*/
set sashelp.class end=eof;
/*end=19 (size of sashelp.class)*/
if ~eof then set sashelp.class(firstobs=2 keep=weight rename=(weight=weight_lead1));
/* if not the last observation of sashelp.class then what ???*/
else weight_lead1=.;/*Last obs receive missing value*/
run;
I am stuck in the bold place in this code, could you please help me to sort it out?
Warm regards.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.