Hello guys,
I have a table containning different names for the same registry. The error is due to the name variable being pulled from different databases and is simply a different spelling since the ID is definitely the same.
I would like to correct this error by simply substituting one of the versions of the name by the other.
I came up with the following table to exemplify my problem:
DATA HAVE;
INFILE DATALINES DLM=' ';
LENGTH NAME $ 20;
INPUT NAME $ ID;
DATALINES;
ALEX 123456
ALEX 123456
ALEXANDER 123456
;
RUN;
To try and correct it I wanted to use the lag function to check if the ID from the current row is the same from the row above but with a different name and, when that is the case, substitute the name in the current row with the name on the previous row. I came up with the following code.
DATA WANT;
SET HAVE;
IF ID = LAG(ID)
AND NAME NE LAG(NAME)
THEN NAME = LAG(NAME);
RUN;
But this code gives me a blank when the result I was expecting would be Alex. I was able to determine that the problem is in the IF statement, because using only the the lag(name) as a new variable, without the if statement, it works fine.
I am at a loss of what the problem might be and would appreciate any help or pointers.
Thanks in advance for the help
Alex (Alexander : ) )
Don't use conditional assignments with lag. See introduction in:
http://support.sas.com/resources/papers/proceedings09/055-2009.pdf
The easiest way to fix your data is to first extract the id and decode you want, be it the first or some other calculation, then merge that back onto the original data. Alternatively you could retain a value, e.g:
data want; set have; by id; length lstname $200; retain lstname; if first.id then lstname=name; else if name ne lstname then name=lstname; run;
Note how I don't code all in uppercase as I don't want to shout the code at you.
Don't use conditional assignments with lag. See introduction in:
http://support.sas.com/resources/papers/proceedings09/055-2009.pdf
The easiest way to fix your data is to first extract the id and decode you want, be it the first or some other calculation, then merge that back onto the original data. Alternatively you could retain a value, e.g:
data want; set have; by id; length lstname $200; retain lstname; if first.id then lstname=name; else if name ne lstname then name=lstname; run;
Note how I don't code all in uppercase as I don't want to shout the code at you.
Thank you for the quick reply and apologies for the shouting : )
Your two different calls to LAG(NAME) are maintaining two different stacks of values. Since the second one only executes occasionally the values it pulls back off the stack are totally different.
It looks like you are trying to keep the first copy of the name. So just tell SAS that is what you are doing.
data want ;
set have ;
by id;
if first.id then name1 = name;
name=name1;
retain name1 ;
drop name1;
run;
You can use lag in conditional statements lag but indeed it is tricky, either you populate the lagged var in another variable or use ifc/ifn
So your logic is correct and requires a very minor tweak as i mentioned, populate lagged var in another variable as illustrated below
DATA HAVE;
INFILE DATALINES DLM=' ';
LENGTH NAME $ 20;
INPUT NAME $ ID;
DATALINES;
ALEX 123456
ALEX 123456
ALEXANDER 123456
;
RUN;
DATA WANT;
SET HAVE;
temp=lag(name);
IF ID = LAG(ID)
AND NAME NE LAG(NAME)
THEN NAME =temp;
drop temp;
RUN;
or
DATA HAVE;
INFILE DATALINES DLM=' ';
LENGTH NAME $ 20;
INPUT NAME $ ID;
DATALINES;
ALEX 123456
ALEX 123456
ALEXANDER 123456
;
RUN;
DATA WANT;
SET HAVE;
name=ifc( ID = LAG(ID) and NAME NE LAG(NAME),lag(name),name);
RUN;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.