Hello
For each customer (ID) there are multiple rows.
Target: have one row for each customer with the original raw data in the last raw and create a new field called "Ind" that get value 1 if year in first row for customer=2017 and 0 otherwise.
I expect to get following output table:
1 2019 1 0
2 2019 0 1
3 2019 1 0
(for ID=1 year in first row is not 2017 so Ind=0
for ID=2 year in first row is 2017 so Ind=1
for ID=3 year in first row is 2017 so Ind=1)
Why in my code I don't get the desired output??
data aaa;
input id date status;
cards;
1 2018 1
1 2019 1
2 2017 0
2 2018 0
2 2019 0
3 2017 0
3 2018 0
3 2019 1
;
run;
data bbb;
set aaa;
by id;
retain IndBase;
if first.id and date=2017 then IndBase=1;
else IndBase=0;
if last.id then output;
run;
@Ronein please post your data in a usable form. The data step with aaa yields an error.
The data step you've posted is not working and the data bit under CARDS doesn't match your input statement (so couldn't fix it for you).
Please post a working and tested SAS data step which creates useful sample data.
There you go. Btw your posted result data is different from the description.
data aaa;
input id date status;
cards;
1 2018 1
1 2019 1
2 2017 0
2 2018 0
2 2019 0
3 2017 0
3 2018 0
3 2019 1
;
data bbb;
set aaa;
by id;
if first.id then ind=ifn(date=2017, 1, 0);
if last.id;
retain ind;
run;
Result:
id date status ind 1 2019 1 0 2 2019 0 1 3 2019 1 1
@Ronein wrote:
I am not near computer now but i don't understand whst is wrong.Maybe change name of field date to ddate.
Data is very simple and contain 3 fields.
Look at what you've posted. The code is all on one line. But even if I'm nice and split this over multiple lines I only get data for the first two columns (except for the first row; and the last row of data also looks wrong).
data aaa;
input id date status;
cards;
1 2018 11
2019 12
2017 02
2018 02
2019 03
2017 03
2018 03
2019 1
;
run;
Kudos for @PeterClemmensen who apparently was able to decipher what you've posted
@Patrick, One of my greater achievements 😉
Hello @Ronein,
@Ronein wrote:
Hello
For each customer (ID) there are multiple rows.
Target: have one row for each customer with the original raw data in the last raw and create a new field called "Ind" that get value 1 if year in first row for customer=2017 and 0 otherwise.
I expect to get following output table:
1 2019 1 0
2 2019 0 1
3 2019 1 0(for ID=1 year in first row is not 2017 so Ind=0
for ID=2 year in first row is 2017 so Ind=1
for ID=3 year in first row is 2017 so Ind=1)
Why in my code I don't get the desired output??
data aaa;
input id date status;
cards;
1 2018 1
1 2019 1
2 2017 0
2 2018 0
2 2019 0
3 2017 0
3 2018 0
3 2019 1
;
run;
data bbb;
set aaa;
by id;
retain IndBase;
if first.id and date=2017 then IndBase=1;
else IndBase=0;
if last.id then output;
run;
[It's interesting that the "one-line code" is displayed nicely when you quote it. EDIT: But only while you're editing!]
A logical error causes the incorrect results: The statement else IndBase=0; is executed for the second, third, ..., last observation of each ID (because the first.id criterion is not met) and thus overwrites the value of interest from the first observation of the ID. Just change the IF/THEN-ELSE statement to
if first.id then IndBase=(date=2017);
so that only one assignment statement for IndBase is executed per BY group.
(Note that the Ind value for ID=3 in your sample output does not match the description.)
Perfect and thank you so much.
As I understand from you when I wrote the code:
"If FIRST.ID and date=2017 then IND=1; else IND=0; "
then else Ind=0 apply on all rows and not only for first row of each ID.
I didn't know that and was sure that else will work also on first row of each ID.
I saw your perfect solution to write:
"IF First.ID then Ind=(date=2017);"
Will it be correct (alternative equivalent way) to write:
""If FIRST.ID and date=2017 then do;
IND=1;
else IND=0;
end;
"
One more question please:
Is location of retain statement important?
As I know I should write "Retain ID" before "If first.ID....."?
Can I also write it at the end? after last statement?
Anyway SAS will do last statement at the end
@Ronein wrote:
As I understand from you when I wrote the code:
"If FIRST.ID and date=2017 then IND=1; else IND=0; "
then else Ind=0 apply on all rows and not only for first row of each ID.
I didn't know that and was sure that else will work also on first row of each ID.
The statement in the ELSE branch is executed whenever the IF condition is not met. The condition FIRST.ID and date=2017 cannot be met for observations after the first within an ID BY-group because FIRST.ID=0 for these observations and 0 is false.
Will it be correct (alternative equivalent way) to write:
""If FIRST.ID and date=2017 then do;
IND=1;
else IND=0;
end;
"
No, you can't put the ELSE statement into a DO-END block in the corresponding THEN branch. This would separate the "ELSE" from the corresponding "IF-THEN." An equivalent "long form" of my IF-THEN statement is:
if first.id then do;
if date=2017 then IndBase=1;
else IndBase=0;
end;
Is location of retain statement important?As I know I should write "Retain ID" before "If first.ID....."?
Can I also write it at the end? after last statement?
Variable ID (contained in dataset aaa) is retained by virtue of the SET statement, there's no need to retain it explicitly. In your code the only purpose of the RETAIN statement for variable IndBase is to instruct the compiler not to reset the value of IndBase to missing after each iteration of the DATA step. This is independent of the position of the (declarative) RETAIN statement in the DATA step. So, yes, you could put it before the RUN statement or in virtually any other place between the DATA statement and the RUN statement.
Sorry. I wanted to ask about location of statement "Retain IndBase"
Should it be before "first.Id" statement" or after ?
@Ronein wrote:
Sorry. I wanted to ask about location of statement "Retain IndBase"
Should it be before "first.Id" statement" or after ?
As mentioned, you may put it virtually anywhere. It's fine to leave it before the first IF statement, but you can move it after that statement as well (so that someone reading the code first sees the definition of IndBase in the assignment statement and then learns that this [apparently] new variable will be retained).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.