BookmarkSubscribeRSS Feed
AZIQ1
Quartz | Level 8

data have;
input ID1 ID2 R I Seq ;
datalines;
1 11 1 . 1
2 11 0 1 2
1 143 0 1 1
1 22 1 . 1
2 22 1 . 2
3 22 1 . 1
4 22 1 . 2
1 165 0 1 1
1 164 0 1 1
1 166 0 1 1
1 33 0 1 1
2 33 1 . 2
3 33 1 . 3
4 33 1 . 4
;

proc sort data = have;
by ID2;
run;


data want;
set have;
by ID2;
if I = 1 and lag(R) =1 the Rnew = 1;
Else Rnew = 0;
run;

 

Want

To create a new field called Rnew where if I = 1 then Rnew = Lag(R) 

I sorted the data by ID2 so that this rule applies to only ID2 and changes everytime ID2 changes - but its not working

 

For ID2 = 11 SAS is reading the R field for the next ID2  (143) and ignoring the lag function and shows RNew = 1 when infact it should be 0;

Any ideas what am I doing incorrect and how do I tell SAS to apply this rule for a set of IDs only.

Thanks

 

WANT

ID1ID2RISeqRnew
1111 10
2110120
1221 10
3221 10
2221 20
4221 20
1330111
2331 20
3331 30
4331 40
11430110
11640110
11650110
11660110
3 REPLIES 3
ballardw
Super User

Is that "want" data show the result of your code? You "want" code has at least one error:

by ID2 and seq;

AS the variable AND on your BY statement does not exist in the shown HAVE data set.

Second, without the "and" you will get an error of

ERROR: BY variables are not properly sorted on data set USER.HAVE.

So, please provide your example data in the form of a data step that will run to create your Have data such as.

data have;
input ID1 ID2 R I Seq ;
datalines;
1 11 1 .  1 
2 11 0 1 2 
1 143 0 1 1 
1 22 1  . 1 
2 22 1 .  2 
3 22 1 .  1 
4 22 1 .  2 
1 165 0 1 1 
1 164 0 1 1 
1 166 0 1 1 
1 33 0 1 1 
2 33 1 .  3 
3 33 1 .  4 
4 33 1 .  2 
;

plus the actual code used to sort the data as your shown "want" data set does not show data sorted by Id2 Seq.

 

And a similar data set to show the desired result.

 

AZIQ1
Quartz | Level 8

Thank you I fixed my question and reposted it with the code.

Sorting is fine its just the if statement that I need help with, How do I tell SAS to stop at each set of ID2.

mkeintz
PROC Star

Does your reporting of data set WANT really represent your desired output?  I ask because it does not meet my understanding of your requirement.

 

I suggest:

 

data have;
input ID1 ID2 R I Seq ;
datalines;
1 11 1 . 1
2 11 0 1 2
1 143 0 1 1
1 22 1 . 1
2 22 1 . 2
3 22 1 . 1
4 22 1 . 2
1 165 0 1 1
1 164 0 1 1
1 166 0 1 1
1 33 0 1 1
2 33 1 . 2
3 33 1 . 3
4 33 1 . 4
;
proc sort data = have;
  by ID2;
run;
data want;
  set have;
  by ID2;
  rnew=ifn(i=1 and first.id2=0,lag(r),0);
run;

 

 

The statement

 

rnew=ifn(i=1 and first.id2=0,lag(r),0);

probably produces the result you describe, while the code you provide:

 

if I = 1 and lag(R) =1 then Rnew = 1;
Else Rnew = 0;

probably does not.   That's for two reasons:

  1.  By testing for first.id2=0 I am allowing lag(R) to be a possible result only when the record-in-hand is not the start of an ID2 group.
  2.  But even if you had only one ID2 group, you would get erroneous results.  That's because the lag function is really a queue-updater  (a LIFO queue).  If you put that queue updater function as the result of the IF condition, then the queue is not updated with every observation.  And therefore the lagged result will return the value of R not from the prior observations, but from the prior observation that satisfied the "IF I=1" condition.

 

What you probably want is to update the queue with every observation, but return that update to the RNEW result only when the if condition is met.   Unlike the IF … THEN … statement, the IFN function always updates both of the possible results (i.e. both LAG(r) and 0).  And it returns LAG(r) when I=1 and first.ID2=0,  and returns zero otherwise.  

 

Whenever I see or use the LAG function, in my mind I substitute the term UFQ  (update-fifo-queue), which I find to be a useful way to recognize what the function is actually doing.

 

Also:

 

BTW, it looks like your dataset HAVE is already grouped by ID2, even though it is not SORTED by ID2, you could avoid the PROC SORT.  Just change the "BY ID2" statement to "BY ID2 NOTSORTED".

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 873 views
  • 0 likes
  • 3 in conversation