DATA Step, Macro, Functions and more

LOCF

Reply
Contributor
Posts: 43

LOCF

[ Edited ]

Greetings,

 

I have the following dataset: 

 

HAVE

id month sales sales2

1   1   1200 500

1   2   .         .

1   3   1500  .

1   4   .         .

2   1   .         .

2   2   1000 300

2   3   .        .

2   4   .        .

 

 

 

WANT

id month sales sales2

1   1   1200 500

1   2   1200 500

1   3   1500 500

1   4   1500 500

2   1   .         .

2   2   1000 300

2   3   1000 300

2   4   1000 300

 

 

How would I go about coding this?  Ideally I want the missing for id=2 to be filled in as well, but does that violate LOCF?

PROC Star
Posts: 283

Re: LOCF

Posted in reply to Wafflecakes

data have;
input id sales;
datalines;
1 1200
1 .
1 1500
1 .
2 .
2 1000
2 .
2 .
2 .
2 .
2 3000
3 .
3 .
3 1100
3 .
3 .
;

data want;
set have;
by id;
retain _sales;
if first.id then call missing(_sales);
if sales then _sales=sales;
drop sales;
rename _sales=sales;
run;

 

Contributor
Posts: 43

Re: LOCF

[ Edited ]
Posted in reply to novinosrin

I think that does the trick!

 

How does that differ from this code that I used?

 

data want; 
set have;
by id;
retain _sales;
if first.id then _sales=.;
if sales~=. then sales=_sales;
if last.id;
run; 
 
What is the purpose of the 'call missing' and retain keywords?
PROC Star
Posts: 283

Re: LOCF

[ Edited ]
Posted in reply to Wafflecakes

Call missing:   To avoid retaining the value from the previous by group

 

And retain: To hold the value across obs coz values created by assignment statement are reset to missing during each interation of the datastep

Contributor
Posts: 43

Re: LOCF

Posted in reply to novinosrin
so in essence it is the same as 'if first.id then_sales=.;' that I used?
PROC Star
Posts: 7,474

Re: LOCF

Posted in reply to Wafflecakes

If you want to accomplish it directly during input, use a lag for id. e.g.:

data HAVE (drop=hold);
  input id month sales;
  retain hold;
  if id ne lag(id) then hold=sales;
  else if not missing(sales) then hold=sales;
  else sales=hold;
  cards;
1   1   1200
1   2   .
1   3   1500
1   4   .
2   1   . 
2   2   1000
2   3   .
2   4   .
2   5   .
2   6   .
2   7   3000
3   1   .
3   2   .
3   3   1100
3   4   .
3   5   .
;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 43

Re: LOCF

Thanks both for the quick replies.  Would it be possible to do this for multiple variables at a time or does it have to be done individually?

PROC Star
Posts: 283

Re: LOCF

Posted in reply to Wafflecakes

Sorry, what do you mean? Can you please clarify. I am little too slow to read between the lines

Contributor
Posts: 43

Re: LOCF

Posted in reply to novinosrin

Basically I was saying that your call missing line is equivalent to 'if first.id then _sales=.;

 

I also do not understand why when I put 'if last.id;' that, that removes all of my observations for each id and leaves only one observation.

 

I was also wondering if you can do this same process for multiple variables at a time instead of individually?

PROC Star
Posts: 283

Re: LOCF

Posted in reply to Wafflecakes

Yes sorry. Your -call missing line is equivalent to 'if first.id then _sales=.; is right and absolute

 

also do not understand why when I put 'if last.id;' that, that removes all of my observations for each id and leaves only one observation.-- You don't need this subsetting if as you want to process and keep all obs

 

I was also wondering if you can do this same process for multiple variables at a time instead of individually?-- Yes, I think so if i understand you well enough. 

 

Contributor
Posts: 43

Re: LOCF

Posted in reply to novinosrin

How would I do it for multiple variables at a time?  Can you give eaxample?

PROC Star
Posts: 283

Re: LOCF

[ Edited ]
Posted in reply to Wafflecakes

Ok, if i understand what you mean, you may have multiple by groups. This could be a cascading or like a parent child group-sub group relationship. You just process by groups using first and last automatic vars from the child to the outer most by group. 

I am too lazy to write a program to show you a working example. May i request you to modify your req with what you have in mind in the question and I can work on the solution

 

I do apologise

PROC Star
Posts: 283

Re: LOCF

Posted in reply to novinosrin

Still no change sir:

 

data have;

input id month sales;

datalines;

1   1   1200

1   2   .

1   3   1500

1   4   .

2   1   .

2   2   1000

2   3   .

2   4   .

2   5   .

2   6   .

2   7   3000

3   1   .

3   2   .

3   3   1100

3   4   .

3   5   .

;

 

data want;

set have;

by id;

retain _sales;

if first.id then call missing(_sales);

if sales then _sales=sales;

drop sales;

rename _sales=sales;

run;

Contributor
Posts: 43

Re: LOCF

Posted in reply to novinosrin

I edited the original.

PROC Star
Posts: 283

Re: LOCF

Posted in reply to Wafflecakes

Still exactly the same logic:

 

data have;

input id month sales sales2;

datalines;

1   1   1200 500

1   2   .         .

1   3   1500  .

1   4   .         .

2   1   .         .

2   2   1000 300

2   3   .        .

2   4   .        .

;

 

data want;

set have;

by id;

retain _sales __sales;

if first.id then call missing(_sales,__sales);

if sales then _sales=sales;

if sales2 then __sales=sales2;

drop sales sales2;

rename _sales=sales __sales=sales2;

run;

Ask a Question
Discussion stats
  • 25 replies
  • 354 views
  • 0 likes
  • 4 in conversation