BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gadnuk
Calcite | Level 5

Hello,

 

I'm trying to compare the equality of multiple variables in my dataset for each subject. Below is a sample of my data. I need to compare whether the variable PASS is equal for TEST a & b for each subject, then whether the variable LEVEL is equal for that subject for TEST a & b. Then I need to compare whether both variables pass the test for equality. If only one fails, then the variable MATCH (in my want dataset) is set to 'No'.

 

Dataset

 

data have;
input id test $ pass $ level $ ;
datalines;
1 a 'Yes' 'hard'
1 b 'Yes' 'hard'
2 a 'Yes' 'hard'
2 b 'Yes' 'medium'
3 a 'Yes' 'medium'
3 b 'No' 'medium'
4 a 'Yes' 'hard'
4 b 'Yes' 'hard'
5 a 'No' 'medium'
5 b 'Yes' 'medium'
;
run;
 
data want;
input id Match $;
datalines;
1 'Yes'
2 'No'
3 'No'
4 'Yes'
5 'No'
;
run;
 
I have managed to accomplish this with several proc transpose statements, but I think there must be a simpler / more elegant method to get to the same solution. As a relatively inexperienced programmer, I'd like to improve my code and it's hard to do without knowing how I could be writing code better. Here are the steps I used to get my 'want' dataset:
 
 
proc sort data=have;
by id;
run;
 
proc transpose data=have out=trans;
by id;
var pass hard;
run;
 
data equal;
set trans;
length match $3;
if col1=col2 then match = "Yes"; else match = "No";
drop col1 col2 _:;
run;
 
proc transpose data=equal out=trans2;
by id;
var match;
run;
 
data want;
set trans2;
length match $3;
if col1 = 'No' OR col2 = 'No' then match= 'No'; else match = 'Yes';
drop col1 col2 _:;
run;
 
 
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

The LAG function is your friend:

 

data have;
input id test $ pass $ level $ ;
datalines;
1 a 'Yes' 'hard'
1 b 'Yes' 'hard'
2 a 'Yes' 'hard'
2 b 'Yes' 'medium'
3 a 'Yes' 'medium'
3 b 'No' 'medium'
4 a 'Yes' 'hard'
4 b 'Yes' 'hard'
5 a 'No' 'medium'
5 b 'Yes' 'medium'
;
run;
data want;
  set have;
  by id;
  if pass=lag(pass) and level=lag(level) then match='YES';
  else match='NO';
  if last.id;
run;

 

  1. The "by id" statement creates the automatic variables first.id and last.id, indicating whether the record-in-hand is the first or last record for a given id.
  2. For every incoming record update the queue of "historic" pass values, same with level.  The queue is only one member deep (lag2 would have two members).  So the if test effectively compares current and preceding value, and assign MATCH based on the result.
  3. The "if last.id" is a subsetting if, resulting in output only for the last record of each id.  So as long as you have 2 records per id, you are not contaminating the result data set with comparison of the beginning of one id vs the end of the prior id.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

The LAG function is your friend:

 

data have;
input id test $ pass $ level $ ;
datalines;
1 a 'Yes' 'hard'
1 b 'Yes' 'hard'
2 a 'Yes' 'hard'
2 b 'Yes' 'medium'
3 a 'Yes' 'medium'
3 b 'No' 'medium'
4 a 'Yes' 'hard'
4 b 'Yes' 'hard'
5 a 'No' 'medium'
5 b 'Yes' 'medium'
;
run;
data want;
  set have;
  by id;
  if pass=lag(pass) and level=lag(level) then match='YES';
  else match='NO';
  if last.id;
run;

 

  1. The "by id" statement creates the automatic variables first.id and last.id, indicating whether the record-in-hand is the first or last record for a given id.
  2. For every incoming record update the queue of "historic" pass values, same with level.  The queue is only one member deep (lag2 would have two members).  So the if test effectively compares current and preceding value, and assign MATCH based on the result.
  3. The "if last.id" is a subsetting if, resulting in output only for the last record of each id.  So as long as you have 2 records per id, you are not contaminating the result data set with comparison of the beginning of one id vs the end of the prior id.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20
data have;
input id test $ pass $ level $ ;
datalines;
1 a 'Yes' 'hard'
1 b 'Yes' 'hard'
2 a 'Yes' 'hard'
2 b 'Yes' 'medium'
3 a 'Yes' 'medium'
3 b 'No' 'medium'
4 a 'Yes' 'hard'
4 b 'Yes' 'hard'
5 a 'No' 'medium'
5 b 'Yes' 'medium'
;
run;

proc sql;
create table want as
select Id, case when count(distinct pass)>1 or count(distinct level)>1 then 'NO' else 'YES' end as Match
from have
group by id;
quit;

or the same with ifc

 

proc sql;
create table want as
select Id, ifc(count(distinct pass)>1 or count(distinct level)>1 ,'NO' ,'YES')  as Match
from have
group by id;
quit;

 

 

Ksharp
Super User


data have;
input id test $ pass $ level $ ;
datalines;
1 a 'Yes' 'hard'
1 b 'Yes' 'hard'
2 a 'Yes' 'hard'
2 b 'Yes' 'medium'
3 a 'Yes' 'medium'
3 b 'No' 'medium'
4 a 'Yes' 'hard'
4 b 'Yes' 'hard'
5 a 'No' 'medium'
5 b 'Yes' 'medium'
;
run;
proc sql;
create table want as
select id,case when count(distinct catx(' ',pass,level))=1 then 'Yes' else 'No ' end as flag
from have
group by id;
quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2691 views
  • 0 likes
  • 4 in conversation