BookmarkSubscribeRSS Feed
Wafflecakes
Calcite | Level 5

Greetings,

 

I have the following dataset: 

 

HAVE

id month sales sales2

1   1   1200 500

1   2   .         .

1   3   1500  .

1   4   .         .

2   1   .         .

2   2   1000 300

2   3   .        .

2   4   .        .

 

 

 

WANT

id month sales sales2

1   1   1200 500

1   2   1200 500

1   3   1500 500

1   4   1500 500

2   1   .         .

2   2   1000 300

2   3   1000 300

2   4   1000 300

 

 

How would I go about coding this?  Ideally I want the missing for id=2 to be filled in as well, but does that violate LOCF?

25 REPLIES 25
novinosrin
Tourmaline | Level 20

data have;
input id sales;
datalines;
1 1200
1 .
1 1500
1 .
2 .
2 1000
2 .
2 .
2 .
2 .
2 3000
3 .
3 .
3 1100
3 .
3 .
;

data want;
set have;
by id;
retain _sales;
if first.id then call missing(_sales);
if sales then _sales=sales;
drop sales;
rename _sales=sales;
run;

 

Wafflecakes
Calcite | Level 5

I think that does the trick!

 

How does that differ from this code that I used?

 

data want; 
set have;
by id;
retain _sales;
if first.id then _sales=.;
if sales~=. then sales=_sales;
if last.id;
run; 
 
What is the purpose of the 'call missing' and retain keywords?
novinosrin
Tourmaline | Level 20

Call missing:   To avoid retaining the value from the previous by group

 

And retain: To hold the value across obs coz values created by assignment statement are reset to missing during each interation of the datastep

Wafflecakes
Calcite | Level 5
so in essence it is the same as 'if first.id then_sales=.;' that I used?
art297
Opal | Level 21

If you want to accomplish it directly during input, use a lag for id. e.g.:

data HAVE (drop=hold);
  input id month sales;
  retain hold;
  if id ne lag(id) then hold=sales;
  else if not missing(sales) then hold=sales;
  else sales=hold;
  cards;
1   1   1200
1   2   .
1   3   1500
1   4   .
2   1   . 
2   2   1000
2   3   .
2   4   .
2   5   .
2   6   .
2   7   3000
3   1   .
3   2   .
3   3   1100
3   4   .
3   5   .
;

Art, CEO, AnalystFinder.com

 

Wafflecakes
Calcite | Level 5

Thanks both for the quick replies.  Would it be possible to do this for multiple variables at a time or does it have to be done individually?

novinosrin
Tourmaline | Level 20

Sorry, what do you mean? Can you please clarify. I am little too slow to read between the lines

Wafflecakes
Calcite | Level 5

Basically I was saying that your call missing line is equivalent to 'if first.id then _sales=.;

 

I also do not understand why when I put 'if last.id;' that, that removes all of my observations for each id and leaves only one observation.

 

I was also wondering if you can do this same process for multiple variables at a time instead of individually?

novinosrin
Tourmaline | Level 20

Yes sorry. Your -call missing line is equivalent to 'if first.id then _sales=.; is right and absolute

 

also do not understand why when I put 'if last.id;' that, that removes all of my observations for each id and leaves only one observation.-- You don't need this subsetting if as you want to process and keep all obs

 

I was also wondering if you can do this same process for multiple variables at a time instead of individually?-- Yes, I think so if i understand you well enough. 

 

Wafflecakes
Calcite | Level 5

How would I do it for multiple variables at a time?  Can you give eaxample?

novinosrin
Tourmaline | Level 20

Ok, if i understand what you mean, you may have multiple by groups. This could be a cascading or like a parent child group-sub group relationship. You just process by groups using first and last automatic vars from the child to the outer most by group. 

I am too lazy to write a program to show you a working example. May i request you to modify your req with what you have in mind in the question and I can work on the solution

 

I do apologise

novinosrin
Tourmaline | Level 20

Still no change sir:

 

data have;

input id month sales;

datalines;

1   1   1200

1   2   .

1   3   1500

1   4   .

2   1   .

2   2   1000

2   3   .

2   4   .

2   5   .

2   6   .

2   7   3000

3   1   .

3   2   .

3   3   1100

3   4   .

3   5   .

;

 

data want;

set have;

by id;

retain _sales;

if first.id then call missing(_sales);

if sales then _sales=sales;

drop sales;

rename _sales=sales;

run;

Wafflecakes
Calcite | Level 5

I edited the original.

novinosrin
Tourmaline | Level 20

Still exactly the same logic:

 

data have;

input id month sales sales2;

datalines;

1   1   1200 500

1   2   .         .

1   3   1500  .

1   4   .         .

2   1   .         .

2   2   1000 300

2   3   .        .

2   4   .        .

;

 

data want;

set have;

by id;

retain _sales __sales;

if first.id then call missing(_sales,__sales);

if sales then _sales=sales;

if sales2 then __sales=sales2;

drop sales sales2;

rename _sales=sales __sales=sales2;

run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 25 replies
  • 5832 views
  • 1 like
  • 4 in conversation