BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

For each customer (ID) there are multiple rows.

Target: have one row for each customer with the original raw data in the last raw  and create a new field called "Ind" that get value 1 if year in first row for customer=2017 and 0 otherwise.

I expect to get following output table:

1   2019   1  0
2   2019   0  1
3   2019   1  0

(for ID=1 year in first row is not 2017 so Ind=0

for ID=2 year in first row is  2017 so Ind=1

for ID=3 year in first row is  2017 so Ind=1)

Why in my code I don't get the desired output??

   data aaa;
input id date status;
cards;
1 2018 1
1 2019 1
2 2017 0
2 2018 0
2 2019 0
3 2017 0
3 2018 0
3 2019 1
;
run;


data bbb;
set aaa;
by id;
retain IndBase;
if first.id and date=2017 then IndBase=1;
else IndBase=0;
if last.id then output;
run;

 

12 REPLIES 12
PeterClemmensen
Tourmaline | Level 20

@Ronein please post your data in a usable form. The data step with aaa yields an error.

Patrick
Opal | Level 21

@Ronein 

The data step you've posted is not working and the data bit under CARDS doesn't match your input statement (so couldn't fix it for you).

Please post a working and tested SAS data step which creates useful sample data.

Ronein
Meteorite | Level 14
I am not near computer now but i don't understand whst is wrong.Maybe change namneof field date to ddate.
Data is very simple and contain 3 fields.
Ronein
Meteorite | Level 14

I am not near computer now but i don't understand whst is wrong.Maybe change name of field date to ddate.
Data is very simple and contain 3 fields.
PeterClemmensen
Tourmaline | Level 20

There you go. Btw your posted result data is different from the description.

 

data aaa;
input id date status;
cards;
1 2018 1
1 2019 1
2 2017 0
2 2018 0
2 2019 0
3 2017 0
3 2018 0
3 2019 1
;

data bbb;
    set aaa;
    by id;
    if first.id then ind=ifn(date=2017, 1, 0);
    if last.id;
    retain ind;
run;

Result:

 

id  date    status  ind
1	2019	1	    0
2	2019	0	    1
3	2019	1	    1
Patrick
Opal | Level 21

@Ronein wrote:

I am not near computer now but i don't understand whst is wrong.Maybe change name of field date to ddate.
Data is very simple and contain 3 fields.

Look at what you've posted. The code is all on one line. But even if I'm nice and split this over multiple lines I only get data for the first two columns (except for the first row; and the last row of data also looks wrong).

data aaa;
  input id date status;
  cards;
1 2018 11 
2019 12 
2017 02 
2018 02 
2019 03 
2017 03 
2018 03 
2019 1
;
run;

Kudos for @PeterClemmensen who apparently was able to decipher what you've posted

PeterClemmensen
Tourmaline | Level 20

@Patrick, One of my greater achievements 😉

FreelanceReinh
Jade | Level 19

Hello @Ronein,


@Ronein wrote:

Hello

For each customer (ID) there are multiple rows.

Target: have one row for each customer with the original raw data in the last raw  and create a new field called "Ind" that get value 1 if year in first row for customer=2017 and 0 otherwise.

I expect to get following output table:

1   2019   1  0
2   2019   0  1
3   2019   1  0

(for ID=1 year in first row is not 2017 so Ind=0

for ID=2 year in first row is  2017 so Ind=1

for ID=3 year in first row is  2017 so Ind=1)

Why in my code I don't get the desired output??

   data aaa;
input id date status;
cards;
1 2018 1
1 2019 1
2 2017 0
2 2018 0
2 2019 0
3 2017 0
3 2018 0
3 2019 1
;
run;


data bbb;
set aaa;
by id;
retain IndBase;
if first.id and date=2017 then IndBase=1;
else IndBase=0;
if last.id then output;
run;

[It's interesting that the "one-line code" is displayed nicely when you quote it. EDIT: But only while you're editing!]

 

A logical error causes the incorrect results: The statement else IndBase=0; is executed for the second, third, ..., last observation of each ID (because the first.id criterion is not met) and thus overwrites the value of interest from the first observation of the ID. Just change the IF/THEN-ELSE statement to

if first.id then IndBase=(date=2017);

so that only one assignment statement for IndBase is executed per BY group.

 

(Note that the Ind value for ID=3 in your sample output does not match the description.)

Ronein
Meteorite | Level 14

Perfect and thank you so much.

As I understand from you when I wrote the code:

"If FIRST.ID and date=2017 then IND=1; else IND=0; "

then  else Ind=0 apply on all rows and not only for first row of each ID.

I didn't know that and was sure that else will work also on first row of each ID.

 

 

I saw your perfect solution to write:

"IF First.ID then Ind=(date=2017);"

 

Will it be correct  (alternative equivalent way) to write:

""If FIRST.ID and date=2017 then do;

IND=1;

else IND=0;

end;

"

 

One more question please:

Is location of retain statement important?

As I know I should write  "Retain ID"    before   "If  first.ID....."?

Can I also write it at the end? after last statement? 

Anyway SAS will do last statement at the end 

 

 

 

FreelanceReinh
Jade | Level 19

@Ronein wrote:

As I understand from you when I wrote the code:

"If FIRST.ID and date=2017 then IND=1; else IND=0; "

then  else Ind=0 apply on all rows and not only for first row of each ID.

I didn't know that and was sure that else will work also on first row of each ID.


The statement in the ELSE branch is executed whenever the IF condition is not met. The condition FIRST.ID and date=2017 cannot be met for observations after the first within an ID BY-group because FIRST.ID=0 for these observations and 0 is false.


Will it be correct  (alternative equivalent way) to write:

""If FIRST.ID and date=2017 then do;

IND=1;

else IND=0;

end;

"


No, you can't put the ELSE statement into a DO-END block in the corresponding THEN branch. This would separate the "ELSE" from the corresponding "IF-THEN." An equivalent "long form" of my IF-THEN statement is:

if first.id then do;
  if date=2017 then IndBase=1;
  else IndBase=0;
end;

Is location of retain statement important?

As I know I should write  "Retain ID"    before   "If  first.ID....."?

Can I also write it at the end? after last statement? 


Variable ID (contained in dataset aaa) is retained by virtue of the SET statement, there's no need to retain it explicitly. In your code the only purpose of the RETAIN statement for variable IndBase is to instruct the compiler not to reset the value of IndBase to missing after each iteration of the DATA step. This is independent of the position of the (declarative) RETAIN statement in the DATA step. So, yes, you could put it before the RUN statement or in virtually any other place between the DATA statement and the RUN statement.

Ronein
Meteorite | Level 14

Sorry. I wanted to ask about location of statement  "Retain IndBase"

Should it be before "first.Id" statement" or after ?

 

FreelanceReinh
Jade | Level 19

@Ronein wrote:

Sorry. I wanted to ask about location of statement  "Retain IndBase"

Should it be before "first.Id" statement" or after ?

 


As mentioned, you may put it virtually anywhere. It's fine to leave it before the first IF statement, but you can move it after that statement as well (so that someone reading the code first sees the definition of IndBase in the assignment statement and then learns that this [apparently] new variable will be retained).

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 1429 views
  • 8 likes
  • 4 in conversation