hjones_6
Calcite | Level 5

I am working on creating dummy variables for emails. The data set records whether each person received the email (email1), whether they opened it (email1_email_opened), and whether they clicked on a call-to-action link in it (email1_email_clicked). When I create dummy variables for these three original variables, I don't want a 0 in the email-opened dummy if the person never received the email, and the same goes for the clicked dummy. A 0 there would misrepresent the data and could lead to incorrect results. Below is the code that I have so far, but it is not running as I would like: it still outputs a 0 for email1_email_opened when email1 was 0.

 

I should also note that wherever a 0 would be a misrepresentation, I want to place a '.' instead, which would make the variable a character variable. Maybe that is what is going wrong, but I have been working on this for hours. I have also attached a sample of the dataset to further help explain this question.

 

 

if email1 = 'Y' then email1_Dummy = '1';

else email1_Dummy = '0';

if email1_dummy = '0' then email1_EMAIL_OPENED_dummy = '.'

and email1_Click_dum = '.';

if email1_EMAIL_OPENED = 'Y' then email1_EMAIL_OPENED_dummy = '1';

else email1_EMAIL_OPENED_dummy = '0';

if email1_EMAIL_OPENED_dummy = '0' then email1_click_dum = '.';

if email1_EMAIL_CLICKED = 'Y' then email1_click_dum = '1';

 

Sample Dataset:

email1  email1_email_opened  email1_email_clicked
Y       Y                    N
N       N                    N
N       N                    N
Y       N                    N
Reeza
Super User

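Your IF statements are malformed. A THEN clause runs a single statement, and AND does not chain a second assignment onto it; as far as I can tell, SAS reads everything after the first equals sign as one logical expression.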

if email1_dummy = '0' then email1_EMAIL_OPENED_dummy = '.'

and email1_Click_dum = '.';

Should be:

 

if email1_dummy = '0' then call missing( email1_EMAIL_OPENED_dummy,  email1_Click_dum);

In general, I think this is what you want. Another workaround is to set all of the dummies to missing at the top of your program and then assign 1/0 as needed later on.

 

if email1 = 'Y' then email1_Dummy = 1;

else do;
    email1_Dummy = 0;
    call missing(email1_email_opened_dummy, email1_click_dum);
end;

email1_EMAIL_OPENED_dummy =  (email1_EMAIL_OPENED = 'Y');

if email1_EMAIL_OPENED_dummy = 0 then call missing(email1_click_dum);
if email1_EMAIL_CLICKED = 'Y' then email1_click_dum = 1;
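
The "set everything to missing first" workaround might look like the sketch below. It assumes numeric dummies, reuses the four sample rows from the original post, and assumes that someone who opened the email but did not click should get a 0 rather than a missing value; that last choice is not stated anywhere in this thread.

```sas
data dummies;
    input email1 $ email1_email_opened $ email1_email_clicked $;

    /* Declare the dummies as numeric, then start every row as missing. */
    length email1_dummy email1_email_opened_dummy email1_click_dum 8;
    call missing(email1_dummy, email1_email_opened_dummy, email1_click_dum);

    /* Received: always scored. */
    email1_dummy = (email1 = 'Y');

    /* Opened: scored only for people who received the email. */
    if email1_dummy = 1 then
        email1_email_opened_dummy = (email1_email_opened = 'Y');

    /* Clicked: scored only for people who opened the email
       (assumption: opened but not clicked is 0). */
    if email1_email_opened_dummy = 1 then
        email1_click_dum = (email1_email_clicked = 'Y');

    datalines;
Y Y N
N N N
N N N
Y N N
;
run;
```

Because each later dummy is assigned only inside a condition on the earlier one, nothing overwrites the missing value for people who never reached that step.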

 

hjones_6
Calcite | Level 5

So would it look like this then?

 

if email1 = 'Y' then email1_Dummy = '1';

else email1_Dummy = '0';

if email1_dummy = '0' then call missing( email1_EMAIL_OPENED_dummy, email1_Click_dum);

if email1_EMAIL_OPENED = 'Y' then email1_EMAIL_OPENED_dummy = '1';

else email1_EMAIL_OPENED_dummy = '0';

if email1_EMAIL_OPENED_dummy = '0' then call missing( email1_click_dum);

if email1_EMAIL_CLICKED = 'Y' then email1_click_dum = '1';

 

 

I tried this and it still gave me the same output as the first set of code in the original post.

Reeza
Super User
Did you try the code I posted? I'm not sure why you want it as character. I really wouldn't recommend that, since it makes summary statistics much harder later on.
hjones_6
Calcite | Level 5

I will try the second one; I was looking at the first one. The reason I was thinking of keeping it as character is that I will want to concatenate all of those variables to see which emails people received, and the '.' would serve as a placeholder so I wouldn't lose that email even though they might not have gotten it.

Reeza
Super User
Having the variable as text or numeric won't really change that ability. You can still use CATT or CATX to simplify. The only thing that may simplify this further is a common prefix that is used only for those variables; then you can reference them all with a colon shortcut rather than listing every variable, for example:
email_flags:
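
As a rough sketch of the colon shortcut: the flag_ prefix, the three variable names, and the pattern variable below are hypothetical, chosen so that the prefix matches only the numeric dummies and not the original Y/N columns. The rows mirror the kind of values discussed in this thread.

```sas
data flags;
    /* Hypothetical numeric dummies that share the prefix flag_ */
    input flag_sent flag_opened flag_clicked;

    length pattern $ 20;

    /* "of flag_:" expands to every variable defined so far whose
       name starts with flag_, so the list never has to be typed out. */
    pattern = catx('-', of flag_:);

    datalines;
1 1 0
1 0 .
0 . .
;
run;

proc print data=flags;
run;
```

How a missing numeric value shows up in the concatenated string is worth checking in your own output before relying on it as a placeholder.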
hjones_6
Calcite | Level 5

Hi, I tried the code that you suggested but I am still getting this output: a 0 for email1_email_opened when email1 was 0. 

 

Also, could you walk me through this line of code to help me understand what it is doing?

 

AA01044_EMAIL_OPENED_dummy =  (AA01044_EMAIL_OPENED = 'Y');
Reeza
Super User

Please post some sample data and the code you actually ran. 

 

In the line you're asking about, the right-hand side of the assignment tests whether the variable is equal to 'Y'. In SAS a true comparison returns 1 and a false one returns 0, so you can skip the IF/THEN portion, just check whether they're equal, and get the desired values.
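
A minimal sketch of that equivalence, using made-up variable names (opened, flag_if, flag_expr) purely for illustration:

```sas
data _null_;
    opened = 'Y';

    /* Long form: explicit IF/THEN/ELSE */
    if opened = 'Y' then flag_if = 1;
    else flag_if = 0;

    /* Short form: the comparison itself evaluates to 1 or 0 */
    flag_expr = (opened = 'Y');

    /* Write both values to the log for comparison */
    put flag_if= flag_expr=;
run;
```

Both variables should come out as 1 here, and both would be 0 if opened held any other value.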

 

 

 



 

hjones_6
Calcite | Level 5

Thank you for walking me through that. I had thought that was what was going on, but I was not sure.

 

Below is my sample code and sample data. I attached a screen capture of the output as well.

 

data sample_data;
    input email1 $ email1_email_opened $ email1_email_clicked $;
    datalines;
N N N
Y N N
Y Y N
Y Y Y
;
run;

proc print data=sample_data;
run;

data sample_data2;
set sample_data;
if email1 = 'Y' then email1_Dummy = 1;
else do;
    email1_Dummy = 0;
    call missing(email1_email_opened_dummy, email1_click_dum);
end;
email1_EMAIL_OPENED_dummy =  (email1_EMAIL_OPENED = 'Y');
if email1_EMAIL_OPENED_dummy = 0 then call missing(email1_click_dum);
if email1_EMAIL_CLICKED = 'Y' then email1_click_dum = 1;

run;

 

 lemail_dummies.JPG
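
One likely reason the 0 persists, offered as a reading of the code above rather than a confirmed answer from this thread: the assignment email1_EMAIL_OPENED_dummy = (email1_EMAIL_OPENED = 'Y') runs for every row, after the ELSE DO block, so it overwrites the missing value that CALL MISSING has just set. The sketch below moves that assignment inside the email1 = 'Y' branch. It also assumes that opened-but-not-clicked should be 0, which the thread does not state.

```sas
data sample_data;
    input email1 $ email1_email_opened $ email1_email_clicked $;
    datalines;
N N N
Y N N
Y Y N
Y Y Y
;
run;

data sample_data2;
    set sample_data;

    if email1 = 'Y' then do;
        email1_Dummy = 1;
        /* Scored only for rows where the email was received. */
        email1_EMAIL_OPENED_dummy = (email1_EMAIL_OPENED = 'Y');
    end;
    else do;
        email1_Dummy = 0;
        call missing(email1_EMAIL_OPENED_dummy, email1_click_dum);
    end;

    /* Scored only for rows where the email was opened
       (assumption: opened but not clicked is 0, not missing). */
    if email1_EMAIL_OPENED_dummy = 1 then
        email1_click_dum = (email1_EMAIL_CLICKED = 'Y');
run;

proc print data=sample_data2;
run;
```

With the four sample rows, the N N N row should end up with 0 for email1_Dummy and missing for the other two dummies, and the Y N N row should get 1, 0, and missing.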
