Hello everyone,
Thank you for taking your time to read this question!
I would like to create a variable if some conditions are satisfied by other variables.
For example:
DS-1 dataset checks for 'name' values in 'col-2' values and then if the value is found then 'col-3' gets 'name' values
DS-1
name col-2 col-3
Y Y,H,J Y
Z H,K,R
C Y,C,Z C
DS-2 dataset is like a lookup table, it has desired 'name' values along with their respective 'id' values. Plz note that the
'id' column's order is important.
DS-2
name id
Y 1
C 2
Z 3
DS-3 dataset is the output dataset based on some conditions.
DS-3 (desired output)
col_new
Y
Could some one please help me code this logic? If my explanation needs to be rephrased; please do let me know.
Thank you for taking your time to read my question!
Something like this?
proc sort data=step1 ;
by id;
run;
data want ;
set step1;
length new_var $8 ;
new_var= coalescec(new_var,col3);
retain new_var;
run;
Result:
Obs name col2 col3 id new_var 1 Y Y,H,J Y 1 Y 2 C Y,C,Z C 2 Y 3 Z H,K,R 3 Y
I have tried a couple of ways using "first. and last.", "select- when", "case-when", and "if- then-else". Below
sample code that did not work properly. I am willing to take a total new approach if needed.
Thanks again!
data ds-3;
set ds-1 ds-2;
select(id);
when(1) do;
if length(col_3) ne 0 then col_new = col_3;
else col_3 = "";
end;
when(2) do;
if ( length(col_3) ne 0 ) and (lag(length(col_new) eq 0))) then col_new = col_3; /* lag is used to check if the previous obs of col_new is empty or not */
end;
when(3) do;
if ( length(col_3) ne 0 ) and (lag(length(col_new) eq 0))) then col_new = col_3;
end;
otherwise col_new = "";
end;
run;
@Saser_123 wrote:
Hello everyone,
Thank you for taking your time to read this question!I would like to create a variable if some conditions are satisfied by other variables.
For example:
DS-1 dataset checks for 'name' values in 'col-2' values and then if the value is found then 'col-3' gets 'name' values
DS-1
name col-2 col-3
Y Y,H,J Y
Z H,K,R
C Y,C,Z C
DS-2 dataset is like a lookup table, it has desired 'name' values along with their respective 'id' values. Plz note that the
'id' column's order is important.
DS-2
name id
Y 1
C 2
Z 3
DS-3 dataset is the output dataset based on some conditions.
- It conditionally compares DS-1 and DS-2.
- conditionally means- if 'id' value (1) 's 'name' value(Y) is present in DS-1 then DS-3 will have 'Y' AS DISPLAYED IN DS-3 DATASET
- if 'id' value (1) 's 'name' value(Y) is NOT present in DS-1 then it checks for the next 'id' value(2)'s 'name' value(c) in DS-1.
- it continues to do this conditional check for all 'id' values until it finds one. If it doesnt find any then 'col_new' will be empty value.
- So, it checks chronologically and if any value is found then it stops and does not check for the rest of it.
DS-3 (desired output)
col_new
Y
Could some one please help me code this logic? If my explanation needs to be rephrased; please do let me know.
Thank you for taking your time to read my question!
Since every single example of your DS2 data set has a corresponding name in DS1 then you appear to missing at least one rule. Probably involving col_3 because you create the variable but then do not use it in any rule.
Thank you Ballardw for your reply.
Regarding col_3:- The part that I had mentioned before conditionally means- if 'id' value (1) 's 'name' value(Y) is present in DS-1 then DS-3 will have 'Y' is done comparing "col_3" from DS-1 value with "name' and 'id' in DS-3. Hope I answered your question.
Ballardw, The situation I have is I have to select only one value from "col_3" in DS-1. The selection criteria is from 'id' in DS-2. If 'id' is 1 and its corresponding 'name' is present in 'col_3' . Then I use that for 'col_new'. If 'id' (1) has no matching 'name" value in 'col_3', then I go and check for 'id' (2).....till any 'id' value mach. If nothing matches then I just leave 'col_new' blank.
Does that explanation help you? If not please let me know I could rephrase it
Thank you
Your are not explaining what you mean. You show two datasets. If you want code to operate on values that are sourced from the two datasets you need to combine them somehow first. Are you saying that they should be combined two NAME variables have the same values?
Let's try it with your example data.
data ds1;
input name $ col2 $ col3 $ ;
cards;
Y Y,H,J Y
Z H,K,R .
C Y,C,Z C
;
data ds2;
input name $ id ;
cards;
Y 1
C 2
Z 3
;
proc sort data=ds1; by name; run;
proc sort data=ds2; by name; run;
data step1;
merge ds1 ds2;
by name;
run;
proc print;
run;
So we get this:
Obs name col2 col3 id 1 C Y,C,Z C 2 2 Y Y,H,J Y 1 3 Z H,K,R 3
Now how do you get from this dataset STEP1 to what you want as the output? It appears you only want the row where NAME='Y'. Why? Why not the row where NAME='C' or NAME='Z'?
Hi Tom,
You are correct, there was an intermediate step( joining the two datasets) that I did not mention. Thank you for mentioning that. So from the dataset "step1" I would like to get the values of 'name' based on values in 'id'. something like the below:
if ((id =1) and (length(col3 ne 0)) ) then new_var = col3 /* length(col3...) is to make sure it is not empty) */
else if ((id = 2) and length(col3 ne 0) and (lag(new_var) ne 0))) then new_var = col3 /* lag(new_var) to make sure that the previous value of new_var is not empty */
else if ((id = 3) and length(col3 ne 0) and (lag(new_var) ne 0))) then new_var = col3
so based on the above conditions new_var will take only one value from col3...
Does the above explanation make sense? if not, please do let me know I could rephrase it.
Thank you,
LAG() implies there is some type of order to the observations. What order you want the observations to be processed in? The original dataset DS1 did not have any order variable. Once it is sorted by NAME to enable merging the original order is gone. Do you have some other order variable?
Also is the real data grouped in anyway? Is there some other variable that groups the data. When using LAG() you do not want to "lag" a value across group boundaries. You don't want the first observations for the new group to see the last value from the old group.
it is not clear what you mean by "one value from col3". The COL3 variable only ever has one value per observation. Just as the NEW_VAR will only ever have one value per observation. Why did you eliminate two of the observations? Perhaps you want to collapse to only one observation per group? Again, is there some type of grouping variable?
Thank you Tom for your quick response!
I may have used lag() incorrectly. This is the first time I am using it. I thought it will sort of retain the value of the previous observation in the current observation to do some analysis. I will have to read more on lag().
lets start fresh...
Obs name col2 col3 id 1 C Y,C,Z C 2 2 Y Y,H,J Y 1 3 Z H,K,R 3
from the above dataset I want to create a new variable 'new_var' that will have the value of 'col_3'... if:
1) id = 1 then new_var = y
2) say id=1 's respective col3 value is blank, then only then, go to the next id value that is (2). if it has a respective observation in col3 then new_var = c.
3) say id= 1 and id =2 's corresponding col3 values are blank, then and then only go to id=3....and so on
how will I do this for the above dataset...If I were to sort I would sort by the dataset by id.
Can you please help me with the logic to do that? I hope this clears all the confusion.
Note: This is a small part of the entire program and the id is dynamic for a dataset. I will have many such datasets like above with different id values( meaning ds-1 will have id's 1,2...DS-2 will have ids 1,2,3, ds-3 will have ids 1 etc..). but the logic is the same for all the datasets...
Thank you
Something like this?
proc sort data=step1 ;
by id;
run;
data want ;
set step1;
length new_var $8 ;
new_var= coalescec(new_var,col3);
retain new_var;
run;
Result:
Obs name col2 col3 id new_var 1 Y Y,H,J Y 1 Y 2 C Y,C,Z C 2 Y 3 Z H,K,R 3 Y
Thank you Tom for the code. Yes it works! I have not used coalescec() , thank you for introducing it to me. Since it is sorted by id and taking the first non empty value for new_var is a smart thought!
Once again, Thank you for taking your time to help me out 😊
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.