Hello all,
I'm trying to write a code that will calculate the column's colores in yellow in the attached file.
I tryed the following SAS code :
if ID_number is not equal to ID_number_lag
than sales_in_first_year_in_the_ job=salse
else sales_in_first_year_in_the_ job=lag(sales_in_first_year_in_the_ job)
But I didn't receive the expected results.
Does anyone have an idea for codes (for the two colums)?
Thanks in advance!
Lior
@lioradam wrote:
I wonder if I can adjust the code in order to receive a column for sales in the second year from the start of the group and a column for sales in the second year from the end of the group (instead of column for sales in the first year and in the last year).
Yes, this is possible. To be more flexible in view of potential future requirements (sales in third year, etc.) you could write the SALES values (of one ID at a time) to a temporary array and then select what you need from the array between the two DOW-loops:
data want(drop=_m);
array s[99] _temporary_;
do _n_=1 by 1 until(last.id);
set have;
by id notsorted;
s[n_yrs]=sales;
_m=max(_m,n_yrs);
end;
sales_second=s[2];
if _m>=2 then sales_lastbut1=s[_m-1];
do _n_=1 to _n_;
set have;
output;
end;
call missing(of s[*]);
run;
This code does no longer rely on the order of observations within the BY groups.
Hello @lioradam,
One idea is to use a double DOW-loop:
data want;
do _n_=1 by 1 until(last.id);
set have;
by id notsorted;
if n_yrs=1 then sales_first=sales;
if last.id then sales_last=sales;
end;
do _n_=1 to _n_;
set have;
output;
end;
run;
This assumes that the input dataset (HAVE) is grouped by ID and that the last observation within each BY group contains the SALES value for the last year in the job. (If needed, the code could be modified to determine the observation with the highest number of years in the job per ID based on variable N_YRS rather than relying on ascending sort order.) Variable SALES_LAST could also be set unconditionally.
In the first loop the new variables are populated. The second loop writes them to dataset WANT, together with the content of HAVE.
Hi,
Thank you very much!
Its works perfectly.
Since this code is new to me, I wonder if I can adjust the code in order to receive a column for sales in the second year from the start of the group and a column for sales in the second year from the end of the group (instead of column for sales in the first year and in the last year).
Kind Regards,
Lior
@lioradam wrote:
I wonder if I can adjust the code in order to receive a column for sales in the second year from the start of the group and a column for sales in the second year from the end of the group (instead of column for sales in the first year and in the last year).
Yes, this is possible. To be more flexible in view of potential future requirements (sales in third year, etc.) you could write the SALES values (of one ID at a time) to a temporary array and then select what you need from the array between the two DOW-loops:
data want(drop=_m);
array s[99] _temporary_;
do _n_=1 by 1 until(last.id);
set have;
by id notsorted;
s[n_yrs]=sales;
_m=max(_m,n_yrs);
end;
sales_second=s[2];
if _m>=2 then sales_lastbut1=s[_m-1];
do _n_=1 to _n_;
set have;
output;
end;
call missing(of s[*]);
run;
This code does no longer rely on the order of observations within the BY groups.
Thank you!
Lior
First of all, get rid of the five (or so) syntax errors you managed to put into three lines of code.
Hint: "is not equal to" is not a valid SAS comparison operator. See SAS Operators
Next, never call the LAG function in a conditional branch, as that will prevent the FIFO queue from being filled in each data step iteration.
Third, please provide usable example data in a data step with datalines and show the complete log of the step you ran (all code and messages).
You don't say what you are trying to calculate. Sum by group? Cumulative sum by group? Mean by group? Std Deviation by group? Kurtosis?
Hi,
Just the value of the sales in the second year.
This means that there will be a column called "sales in the second year", in which in the first four rows the value will be 102, in the next three rows the value will be 40 and in the last six rows, the value will be 1352.
Regards,
Lior
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.