Good morning,
I have spent days trying to work the logic on some IF, THEN, DO, ELSE statements and every time I get one to work, the other doesn't.
I am using SAS 9.2. I have tried consecutive IF, THEN, DO statements to try to make the IF THEN ELSE statements read certain blocks of data but that hasn't been working for me. I have tried everything I know how to try.
Background:
The data is a result of comparing two datasets for differences using PROC COMPARE.
Need:
I need to be able to differentiate between a difference as a result of having a value missing in one dataset and not the other and when the difference is as a result of an actual difference (no missing values).
Issues:
I found I needed to remove the X's from difference because sometimes the if statements would work if the DIFF value was blank. I included two WANT2 tables to show a couple of examples of what I've tried. But I could have included 50. I can get everything to = 1 or MISSING, but not either.
data have;
infile datalines dlm = ':' truncover;
length SOURCE $10. ID1 $2. ID2 $2. PHONE $15.;
input SOURCE $ ID1 $ ID2 $ PHONE $ ;
datalines;
SOURCE1: 1: 1: 5555555:
SOURCE2: 1: 1: 5555555:
DIFF: 1: 1: .:
SOURCE1: 1: 2: :
SOURCE2: 1: 2: :
DIFF: 1: 2: .:
SOURCE1: 2: 1: 9999999:
SOURCE2: 2: 1:
DIFF: 2: 1: XXXXXXX:
SOURCE1: 3: 1: :
SOURCE2: 3: 1: 4444444:
DIFF: 3: 1: XXXXXXX:
SOURCE1: 4: 1: 4448888:
SOURCE2: 4: 1: 4444444:
DIFF: 4: 1: ...XXXX:
;
run;
data want;
set have;
if SOURCE = "DIFF" and index(PHONE,'X') then PHONE = "";
else PHONE = PHONE;
run;
data want2;
set have;
by
ID1 ID2;
if SOURCE = "SOURCE2" then do;
if PHONE = "" and lag(PHONE) ne "" then do;
if SOURCE = "DIFF" then PHONE = "MISSING";
end;
else if PHONE ne "" and lag(PHONE) = "" then do;
if SOURCE = "DIFF" then PHONE = "MISSING";
end;
else if PHONE ne lag(PHONE) then do;
if SOURCE = "DIFF" then PHONE = '1';
end;
end;
run;
data want2;
set have;
by
ID1 ID2;
if SOURCE = "SOURCE2" and PHONE = "" and lag(PHONE) ne "" then do;
if SOURCE = "DIFF" then PHONE = "MISSING";
end;
else if SOURCE = "SOURCE2" and PHONE ne "" and lag(PHONE) = "" then do;
if SOURCE = "DIFF" then PHONE = "MISSING";
end;
else if SOURCE = "SOURCE2" and PHONE ne lag(PHONE) then do;
if SOURCE = "DIFF" then PHONE = '1';
end;
run;
I would love the end result to be this:
SOURCE ID1 ID2 PHONE
SOURCE1 1 1 5555555
SOURCE2 1 1 5555555
DIFF 1 1
SOURCE1 1 2
SOURCE2 1 2
DIFF 1 2
SOURCE1 2 1 9999999
SOURCE2 2 1
DIFF 2 1 MISSING
SOURCE1 3 1
SOURCE2 3 1 4444444
DIFF 3 1 MISSING
SOURCE1 4 1 4448888
SOURCE2 4 1 4447777
DIFF 4 1 1
Where am I going wrong?
While I haven't gone through this in a lot of detail, the LAG function inside an IF statement doesn't work the way you think it should work.
You need to use the LAG function in a command that always executes and is not inside an IF statement.
For example, in dataset WANT2, try this immediately after the BY command:
prevphone=lag(phone);
and of course use prevphone in the rest of the code.
While I haven't gone through this in a lot of detail, the LAG function inside an IF statement doesn't work the way you think it should work.
You need to use the LAG function in a command that always executes and is not inside an IF statement.
For example, in dataset WANT2, try this immediately after the BY command:
prevphone=lag(phone);
and of course use prevphone in the rest of the code.
Thank you PaigeMiller. I have learned my lesson with lag. Your suggestion worked great. I had to add some coding to get my desired results, but everything looks just the way it should. Thank you so much. I cannot wait to put this issue to rest!
data want3 (drop = phone_result_fill prevphone phone_result);
set want;
by
ID1 ID2;
prevphone = lag(phone);
if source = "source2" then do;
if phone = "" and prevphone ne "" or phone ne "" and prevphone = "" then phone_result = "MISSING";
else if phone ne prevphone then phone_result = "1";
end;
retain phone_result_fill;
if not missing (phone_result) then phone_result_fill = phone_result;
phone_result = phone_result_fill;
if source = "DIFF" then phone = phone_result;
run;
The fact that you do not format your code in a readable way, and use upcase coding really makes your code hard to read. From what I gather from your post, I would imagine your issue is caused by using the lag() functions in if's. You can read several posts on here about why it is not a good idea to do this. For my money I would find it far simpler to transpose (yes, I know against what I normally suggest) and then do the conditionals:
data have; infile datalines dlm = ':' truncover; length source $10. id1 $2. id2 $2. phone $15.; input source $ id1 $ id2 $ phone $ ; datalines; SOURCE1: 1: 1: 5555555: SOURCE2: 1: 1: 5555555: DIFF: 1: 1: .: SOURCE1: 1: 2: : SOURCE2: 1: 2: : DIFF: 1: 2: .: SOURCE1: 2: 1: 9999999: SOURCE2: 2: 1: DIFF: 2: 1: XXXXXXX: SOURCE1: 3: 1: : SOURCE2: 3: 1: 4444444: DIFF: 3: 1: XXXXXXX: SOURCE1: 4: 1: 4448888: SOURCE2: 4: 1: 4444444: DIFF: 4: 1: ...XXXX: ; run; proc transpose data=have out=want; by id1 id2; var phone; id source; run;
This should put all the necessary information on one row.
Thank you for your response RW9. I will certainly not be using lag in IF statements any more. I like your idea of using transpose, but my actual data set is way too large for transpose to work. Thanks again (and I'll work on making my code more readable).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.