BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Saser_123
Calcite | Level 5

Hello everyone,
Thank you for taking your time to read this question!

I would like to create a variable if some conditions are satisfied by other variables. 

For example:

DS-1 dataset checks for 'name' values in 'col-2' values and then if the value is found then 'col-3' gets 'name' values

 

DS-1

name      col-2           col-3  

Y           Y,H,J           Y

Z           H,K,R          

C          Y,C,Z           C  

 

DS-2 dataset is like a lookup table, it has desired 'name' values along with their respective 'id' values. Plz note that the 

'id' column's order is important. 

 

DS-2

name         id

Y                1

C                2

Z                3

 

DS-3 dataset is the output dataset based on some conditions. 

  • It conditionally compares  DS-1 and DS-2. 
  • conditionally means- if 'id' value (1) 's 'name' value(Y) is present in DS-1 then DS-3 will have 'Y'  AS DISPLAYED IN DS-3 DATASET
  • if 'id' value (1) 's 'name' value(Y) is NOT present in DS-1 then it checks for the next 'id' value(2)'s 'name' value(c) in DS-1.
  • it continues to do this conditional check for all  'id' values until it finds one. If it doesnt find any then 'col_new' will be empty value.
  • So, it checks chronologically and if any value is found then it stops and does not check for the rest of it.

DS-3 (desired output)

col_new

Y

 

Could some one please help me code this logic? If my explanation needs to be rephrased; please do let me know. 

 

Thank you for taking your time to read my question!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Something like this?

proc sort data=step1 ;
  by id;
run;

data want ;
  set step1;
  length new_var $8 ;
  new_var= coalescec(new_var,col3);
  retain new_var;
run;

Result:

Obs    name    col2     col3    id    new_var

 1      Y      Y,H,J     Y       1       Y
 2      C      Y,C,Z     C       2       Y
 3      Z      H,K,R             3       Y

View solution in original post

9 REPLIES 9
Saser_123
Calcite | Level 5

I have tried a couple of ways using "first. and last.", "select- when", "case-when", and "if- then-else". Below 
sample code that did not work properly. I am willing to take a total new approach if needed. 


Thanks again!

 

data ds-3;
set ds-1 ds-2;
select(id);
  when(1) do;
    if length(col_3) ne 0 then col_new = col_3;
    else col_3 = "";
    end;
  when(2) do;
    if ( length(col_3) ne 0 ) and (lag(length(col_new) eq 0))) then col_new  = col_3; /* lag is used to check if the previous obs of col_new is empty or not */
     end;
 when(3) do;
    if ( length(col_3) ne 0 ) and (lag(length(col_new) eq 0))) then col_new  = col_3;
     end;
otherwise col_new = "";
end;
run;


  

 

ballardw
Super User

@Saser_123 wrote:

Hello everyone,
Thank you for taking your time to read this question!

I would like to create a variable if some conditions are satisfied by other variables. 

For example:

DS-1 dataset checks for 'name' values in 'col-2' values and then if the value is found then 'col-3' gets 'name' values

 

DS-1

name      col-2           col-3  

Y           Y,H,J           Y

Z           H,K,R          

C          Y,C,Z           C  

 

DS-2 dataset is like a lookup table, it has desired 'name' values along with their respective 'id' values. Plz note that the 

'id' column's order is important. 

 

DS-2

name         id

Y                1

C                2

Z                3

 

DS-3 dataset is the output dataset based on some conditions. 

  • It conditionally compares  DS-1 and DS-2. 
  • conditionally means- if 'id' value (1) 's 'name' value(Y) is present in DS-1 then DS-3 will have 'Y'  AS DISPLAYED IN DS-3 DATASET
  • if 'id' value (1) 's 'name' value(Y) is NOT present in DS-1 then it checks for the next 'id' value(2)'s 'name' value(c) in DS-1.
  • it continues to do this conditional check for all  'id' values until it finds one. If it doesnt find any then 'col_new' will be empty value.
  • So, it checks chronologically and if any value is found then it stops and does not check for the rest of it.

DS-3 (desired output)

col_new

Y

 

Could some one please help me code this logic? If my explanation needs to be rephrased; please do let me know. 

 

Thank you for taking your time to read my question!

 


Since every single example of your DS2 data set has a corresponding name in DS1 then you appear to missing at least one rule. Probably involving col_3 because you create the variable but then do not use it in any rule.

Saser_123
Calcite | Level 5

Thank you Ballardw for your reply. 

Regarding col_3:- The part that I had mentioned before  conditionally means- if 'id' value (1) 's 'name' value(Y) is present in DS-1 then DS-3 will have 'Y'     is done comparing "col_3" from DS-1 value with "name' and 'id' in DS-3. Hope I answered your question.
Ballardw, The situation I have is I have to select only one value from "col_3" in DS-1. The selection criteria is from 'id' in DS-2.  If 'id' is 1 and its corresponding 'name' is present in 'col_3' . Then I use that for 'col_new'. If 'id' (1) has no matching 'name" value in 'col_3', then I go and check for  'id' (2).....till any 'id' value mach. If nothing matches then I just leave 'col_new' blank.

Does that explanation help you? If not please let me know I could rephrase it

 

Thank you

Tom
Super User Tom
Super User

Your are not explaining what you mean.  You show two datasets. If you want code to operate on values that are sourced from the two datasets you need to combine them somehow first.  Are you saying that they should be combined two NAME variables have the same values?

Let's try it with your example data.

data ds1;
  input name $  col2 $ col3 $ ;
cards;
Y           Y,H,J          Y
Z           H,K,R          .
C          Y,C,Z           C  
;
 
data ds2;
  input name $ id ;
cards;
Y                1
C                2
Z                3
;

proc sort data=ds1; by name; run;
proc sort data=ds2; by name; run;
data step1;
  merge ds1 ds2;
  by name;
run;
proc print;
run;

So we get this:

Obs    name    col2     col3    id

 1      C      Y,C,Z     C       2
 2      Y      Y,H,J     Y       1
 3      Z      H,K,R             3

Now how do you get from this dataset STEP1 to what you want as the output?  It appears you only want the row where NAME='Y'.  Why?  Why not the row where NAME='C' or NAME='Z'?

Saser_123
Calcite | Level 5

Hi Tom,
You are correct, there was an intermediate step( joining the two datasets)  that I did not mention. Thank you for mentioning that. So from the dataset "step1" I would like to get the values of 'name' based on values in 'id'. something like the below:

 

if ((id =1) and (length(col3 ne 0)) ) then new_var = col3 /* length(col3...) is to make sure it is not empty) */

else if ((id = 2) and length(col3 ne 0) and (lag(new_var) ne 0)))  then new_var = col3 /* lag(new_var) to make sure that                                                                 the previous value of  new_var is not empty */

else if ((id = 3) and length(col3 ne 0) and (lag(new_var) ne 0)))  then new_var = col3

 

so  based on the above conditions new_var will take only one value from  col3...
Does the above explanation make sense? if not, please do let me know I could rephrase it. 

 

Thank you,

Tom
Super User Tom
Super User

LAG() implies there is some type of order to the observations. What order you want the observations to be processed in? The original dataset DS1 did not have any order variable. Once it is sorted by NAME to enable merging the original order is gone. Do you have some other order variable?

 

Also is the real data grouped in anyway? Is there some other variable that groups the data. When using LAG() you do not want to "lag" a value across group boundaries. You don't want the first observations for the new group to see the last value from the old group.

 

it is not clear what you mean by "one value from col3".  The COL3 variable only ever has one value per observation. Just as the NEW_VAR will only ever have one value per observation.  Why did you eliminate two of the observations?  Perhaps you want to collapse to only one observation per group?  Again, is there some type of grouping variable?

 

 

Saser_123
Calcite | Level 5

Thank you Tom for your quick response!

I may have used lag() incorrectly. This is the first time I am using it. I thought it will sort of retain the value of the previous observation in the current observation to do some analysis. I will have to read more on lag(). 

lets start fresh...

 

Obs    name    col2     col3    id

 1      C      Y,C,Z     C       2
 2      Y      Y,H,J     Y       1
 3      Z      H,K,R             3

 from the above dataset I want to create a new variable 'new_var' that will have the value of 'col_3'... if:

1) id = 1 then new_var = y

2) say id=1 's respective col3 value is blank, then only then, go to the next id value that is (2). if it has a respective observation in col3 then new_var = c.

3) say id= 1 and id =2 's corresponding col3 values are blank, then and then only go to id=3....and so on

 

how will I do this for the above dataset...If I were to sort I would sort by the dataset by id. 

 

Can you please help me with the logic to do that? I hope this clears all the confusion.

Note: This is a small part of the entire program and the id is dynamic for a dataset. I will have many such datasets like above with different id values( meaning ds-1 will have id's 1,2...DS-2 will have ids 1,2,3,   ds-3 will have ids 1 etc..). but the logic is the same for all the datasets...

 

Thank you

 

Tom
Super User Tom
Super User

Something like this?

proc sort data=step1 ;
  by id;
run;

data want ;
  set step1;
  length new_var $8 ;
  new_var= coalescec(new_var,col3);
  retain new_var;
run;

Result:

Obs    name    col2     col3    id    new_var

 1      Y      Y,H,J     Y       1       Y
 2      C      Y,C,Z     C       2       Y
 3      Z      H,K,R             3       Y
Saser_123
Calcite | Level 5

Thank you Tom for the code. Yes it works! I have not used coalescec() , thank you for introducing it to me. Since it is sorted by id and taking the first non empty value for new_var is a smart thought! 

Once again, Thank you for taking your time to help me out 😊

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2318 views
  • 0 likes
  • 3 in conversation