Hi,
Each time I run a certain data step, I want to move the observations that share the same characteristic, as defined by the sort order, from a data set into an "Analysis" data set. The remaining observations are output to a data set with the same name as the input data set, so that they can later serve as input to the same data step again. The first time I run the program it works, but the second time the output contains only one observation, even though there are more observations with the same characteristic.
The program is shown below. The first time I run it, I get all the IBM stocks in the data set "Analysis". The second time I run the data step, I expect to get all Intel stocks in "Analysis", but somehow I only get one row with Intel stock. Why is that, and how can I change my program to get all Intel stocks the second time I run the data step?
proc sort
   data= sashelp.Stocks (keep= Stock Date)
   out= StocksSrtd;
   by Stock descending Date;
run;

data Analysis StocksSrtd;
   retain AnalysisInd;
   set StocksSrtd;
   by Stock;
   if _n_= 1
      then AnalysisInd= 1;
   if AnalysisInd= 1
      then
         output Analysis;
      else
         output StocksSrtd;
   if last.Stock
      then AnalysisInd= 0;
run;
The second time I run the data step I'm expecting to get all Intel stocks in the data set "Analysis" but somehow I only get one row with Intel stock.
I'm not seeing this when I run your program twice. So, please do the following: run the program twice, and then show us the ENTIRE log of both sequential runs. Please copy the log as text and paste it into the window that appears when you click on the </> icon.
Thank you, PaigeMiller, for the interest in my problem. Please see the log below.
1    proc sort
2    data= sashelp.Stocks (keep= Stock Date)
3    out= StocksSrtd;
4    by Stock descending Date;
5    run;

NOTE: There were 699 observations read from the data set SASHELP.STOCKS.
NOTE: The data set WORK.STOCKSSRTD has 699 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time 0.03 seconds
      cpu time 0.03 seconds

6
7    data Analysis StocksSrtd;
8    retain AnalysisInd;
9    set StocksSrtd;
10   by Stock;
11   if _n_= 1
12   then AnalysisInd= 1;
13   if AnalysisInd= 1
14   then
15   output Analysis;
16   else
17   output StocksSrtd;
18   if last.Stock
19   then AnalysisInd= 0;
20   run;

NOTE: There were 699 observations read from the data set WORK.STOCKSSRTD.
NOTE: The data set WORK.ANALYSIS has 233 observations and 3 variables.
NOTE: The data set WORK.STOCKSSRTD has 466 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time 0.03 seconds
      cpu time 0.03 seconds

21   data Analysis StocksSrtd;
22   retain AnalysisInd;
23   set StocksSrtd;
24   by Stock;
25   if _n_= 1
26   then AnalysisInd= 1;
27   if AnalysisInd= 1
28   then
29   output Analysis;
30   else
31   output StocksSrtd;
32   if last.Stock
33   then AnalysisInd= 0;
34   run;

NOTE: There were 466 observations read from the data set WORK.STOCKSSRTD.
NOTE: The data set WORK.ANALYSIS has 1 observations and 3 variables.
NOTE: The data set WORK.STOCKSSRTD has 465 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds
So, when I ran the code twice, it was
PROC SORT
DATA STEP
PROC SORT
DATA STEP
and when you ran it twice it was
PROC SORT
DATA STEP
DATA STEP
so we get different results.
What do you want as the result? And why are you running it twice? What is the purpose, and what do you expect to gain from it?
I want to process each BY group separately. I thought it would save processing time to reduce the data set every time the data for a new BY group was extracted.
Your 'second run' is using the output from the first run, not the output from the PROC SORT.
It's never a good idea to have the output data set have the same name as the input data set.
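To illustrate the naming point, here is a minimal sketch that writes the leftover rows to a separate data set instead of overwriting the input. The name StocksRest is made up for this example and does not appear in the original program; the DROP statement is also an addition, so that the flag is not stored in either output.

```sas
/* Sketch only: StocksRest is a hypothetical name, not from the original post. */
data Analysis StocksRest;
   set StocksSrtd;                 /* input is read but never overwritten */
   by Stock;
   retain AnalysisInd;             /* new variable, not present in StocksSrtd */
   if _n_ = 1 then AnalysisInd = 1;
   if AnalysisInd = 1 then output Analysis;
   else output StocksRest;
   if last.Stock then AnalysisInd = 0;
   drop AnalysisInd;               /* keep the flag out of both outputs */
run;
```

Because StocksSrtd is left untouched, rerunning this step gives the same result every time. To pull the next BY group, the step would read from StocksRest instead.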
In addition to running against different inputs the set of variables on the input data is also different.
Trying to RETAIN a variable that is already being read from an input dataset is not going to be very useful.
1) Variables that are coming from an input dataset are already retained!
2) When the next observation is read the value retained will be overwritten with the value read from the input dataset.
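The two points above can be seen in a small sketch. The data set Have and the variables Id and Flag are invented for this demonstration; Flag plays the role that AnalysisInd plays once it has been stored in StocksSrtd by the first run.

```sas
/* Hypothetical demo data: Flag is stored as 0 on every row. */
data Have;
   do Id = 1 to 3;
      Flag = 0;
      output;
   end;
run;

data _null_;
   retain Flag;                  /* redundant: Flag is read from Have */
   set Have;                     /* each read overwrites Flag with the stored 0 */
   if _n_ = 1 then Flag = 1;     /* only survives until the next read */
   put _n_= Id= Flag=;           /* write the values to the log */
run;
```

The log should show Flag as 1 for the first row only and 0 for the rest, which is the same pattern as the single observation landing in Analysis on the second run.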
Thanks a lot, Tom! You definitely nailed the problem.
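A DROP statement at the end of the data step does the trick. With it, AnalysisInd is no longer written to the output data sets, so the next run does not read it back in from StocksSrtd.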
data Analysis StocksSrtd;
   retain AnalysisInd;
   set StocksSrtd;
   by Stock;
   if _n_= 1
      then AnalysisInd= 1;
   if AnalysisInd= 1
      then
         output Analysis;
      else
         output StocksSrtd;
   if last.Stock
      then AnalysisInd= 0;
   drop AnalysisInd;
run;
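One way to check the result, offered as a sketch rather than part of the solution: after each run of the data step, tabulate Stock in Analysis and confirm that it holds exactly one stock with its full set of rows.

```sas
/* Run after each execution of the data step above. */
proc freq data=Analysis;
   tables Stock / nocum nopercent;   /* one line per stock, with its row count */
run;
```

After the second run, the table should list only Intel, with all of its rows instead of a single observation.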