☑ This topic is solved.
Multipla99
Quartz | Level 8

Hi,

 

Each time I run a certain data step, I'm trying to extract from a data set the observations that share a characteristic defined by the sort order and write them to an "Analysis" data set. The remaining observations are output to a data set with the same name as the input data set, so that they can later serve as input to the same data step again. The first time I run the program it works, but the second time the output contains only one observation, even though there are more observations with the same characteristic.

 

The program is shown below. The first time I run it, I get all the IBM stocks in the data set "Analysis". The second time I run the data step, I expect to get all the Intel stocks in "Analysis", but somehow I get only one row with Intel stock. Why is that, and how can I change my program to get all the Intel stocks the second time I run the data step?

 

proc sort
  data=sashelp.Stocks (keep=Stock Date)
  out=StocksSrtd;
  by Stock descending Date;
run;

data Analysis StocksSrtd;
  retain AnalysisInd;
  set StocksSrtd;
  by Stock;
  /* intended: flag the first BY group for extraction */
  if _n_ = 1 then AnalysisInd = 1;
  if AnalysisInd = 1 then output Analysis;
  else output StocksSrtd;
  /* intended: stop extracting once the first BY group ends */
  if last.Stock then AnalysisInd = 0;
run;

 

 

1 ACCEPTED SOLUTION
Tom
Super User

In addition to running against different inputs, the set of variables in the input data is also different.

Trying to RETAIN a variable that is already being read from an input dataset is not going to be very useful.

1) Variables that are coming from an input dataset are already retained!

2) When the next observation is read the value retained will be overwritten with the value read from the input dataset.
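
The two points above can be illustrated with a minimal sketch on a small made-up data set. The names have, want, id, and flag are hypothetical and not from the original program. Every row of have stores flag = 0, which mirrors how AnalysisInd ends up stored in StocksSrtd after the first run (the log reports 3 variables there instead of 2).

```sas
/* Hypothetical demo data: flag is stored as 0 on every row,
   like AnalysisInd in StocksSrtd after the first run */
data have;
  do id = 1 to 3;
    flag = 0;
    output;
  end;
run;

data want;
  retain flag;                 /* no effect: flag is read by SET */
  set have;                    /* each read overwrites flag with the stored 0 */
  if _n_ = 1 then flag = 1;    /* survives only until the next SET read */
  put _n_= id= flag=;          /* write the values to the log */
run;
```

The log should show flag=1 for the first observation only. The same mechanism would explain why the second run of the original step sends exactly one row to Analysis.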


7 REPLIES
PaigeMiller
Diamond | Level 26

The second time I run the data step I'm expecting to get all Intel stocks in the data set "Analysis" but somehow I only get one row with Intel stock.

 

I'm not seeing this when I run your program twice. So, please do the following: run the program twice, and then show us the ENTIRE log of both sequential runs. Please copy the log as text and paste it into the window that appears when you click on the </> icon.

 


--
Paige Miller
Multipla99
Quartz | Level 8

Thank you, PaigeMiller, for the interest in my problem. Please see the log below.

 

 

1    proc sort
2    data= sashelp.Stocks (keep= Stock Date)
3    out= StocksSrtd;
4    by Stock descending Date;
5    run;

NOTE: There were 699 observations read from the data set SASHELP.STOCKS.
NOTE: The data set WORK.STOCKSSRTD has 699 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


6
7    data  Analysis StocksSrtd;
8    retain  AnalysisInd;
9    set StocksSrtd;
10   by  Stock;
11   if  _n_= 1
12   then AnalysisInd= 1;
13   if   AnalysisInd= 1
14   then
15   output  Analysis;
16   else
17   output  StocksSrtd;
18   if      last.Stock
19   then  AnalysisInd= 0;
20   run;

NOTE: There were 699 observations read from the data set WORK.STOCKSSRTD.
NOTE: The data set WORK.ANALYSIS has 233 observations and 3 variables.
NOTE: The data set WORK.STOCKSSRTD has 466 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


21   data  Analysis StocksSrtd;
22   retain  AnalysisInd;
23   set StocksSrtd;
24   by  Stock;
25   if  _n_= 1
26   then AnalysisInd= 1;
27   if   AnalysisInd= 1
28   then
29   output  Analysis;
30   else
31   output  StocksSrtd;
32   if      last.Stock
33   then  AnalysisInd= 0;
34   run;

NOTE: There were 466 observations read from the data set WORK.STOCKSSRTD.
NOTE: The data set WORK.ANALYSIS has 1 observations and 3 variables.
NOTE: The data set WORK.STOCKSSRTD has 465 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

PaigeMiller
Diamond | Level 26

So, when I ran the code twice, it was

PROC SORT
DATA STEP
PROC SORT
DATA STEP

and when you ran it twice it was

PROC SORT
DATA STEP
DATA STEP

so we get different results.

 

What do you want as the result? And why are you running it twice? What is the purpose, and what do you expect to gain by running it twice?

--
Paige Miller
Multipla99
Quartz | Level 8

I want to process each BY group separately. I thought it would save processing time to reduce the data set every time data from a new BY group was extracted.
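
If the goal is only to handle each BY group on its own, one common alternative is to leave the data set intact and let BY-group processing do the work in a single pass. A minimal sketch, assuming the StocksSrtd data set produced by the PROC SORT above; the output name StockCounts and the counter NObs are made up for illustration:

```sas
data StockCounts;
  set StocksSrtd;
  by Stock;
  if first.Stock then NObs = 0;   /* reset at the start of each BY group */
  NObs + 1;                       /* sum statement: NObs is retained automatically */
  if last.Stock then output;      /* write one row per Stock */
  keep Stock NObs;
run;
```

Counting rows is only a stand-in here. Whatever per-group logic is needed can be placed between the first.Stock and last.Stock checks.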

Reeza
Super User

Your 'second run' is using the output from the first run, not the output from the PROC SORT. 

It's never a good idea to have the output data set have the same name as the input data set. 
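
One way to follow this advice is to write the leftover rows to a differently named data set and point the next run at that one. This is a sketch only: Remaining and InFirstGroup are hypothetical names, and it assumes StocksSrtd comes straight from the PROC SORT.

```sas
data Analysis Remaining;            /* Remaining is a hypothetical output name */
  set StocksSrtd;
  by Stock;
  retain InFirstGroup 1;            /* new variable, not read from StocksSrtd */
  if InFirstGroup then output Analysis;
  else output Remaining;
  if last.Stock then InFirstGroup = 0;   /* the first BY group has ended */
  drop InFirstGroup;                /* keep the helper out of both outputs */
run;
```

Because the input is never overwritten, a failed or repeated run can simply start again from StocksSrtd.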

 


 


Multipla99
Quartz | Level 8

Thanks a lot, Tom! You definitely nailed the problem.

A drop statement at the end of the data step will do the trick. 

 

data Analysis StocksSrtd;
  retain AnalysisInd;
  set StocksSrtd;
  by Stock;
  if _n_ = 1 then AnalysisInd = 1;
  if AnalysisInd = 1 then output Analysis;
  else output StocksSrtd;
  if last.Stock then AnalysisInd = 0;
  /* keep the flag out of both output data sets so the next run does not read it back */
  drop AnalysisInd;
run;
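
A quick way to check the result after each run is to tabulate Stock in both output data sets. This is only a sketch, and it assumes the two data sets written by the step above exist in WORK.

```sas
/* Which stocks were extracted in this run? */
proc freq data=Analysis;
  tables Stock / nocum nopercent;
run;

/* Which stocks are still waiting in the input for the next run? */
proc freq data=StocksSrtd;
  tables Stock / nocum nopercent;
run;
```

With the DROP statement in place, Analysis should list a single Stock value after every run.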
