BookmarkSubscribeRSS Feed
Wafflecakes
Calcite | Level 5

Greetings,

 

I have the following dataset: 

 

HAVE

id month sales sales2

1   1   1200 500

1   2   .         .

1   3   1500  .

1   4   .         .

2   1   .         .

2   2   1000 300

2   3   .        .

2   4   .        .

 

 

 

WANT

id month sales sales2

1   1   1200 500

1   2   1200 500

1   3   1500 500

1   4   1500 500

2   1   .         .

2   2   1000 300

2   3   1000 300

2   4   1000 300

 

 

How would I go about coding this?  Ideally I want the missing for id=2 to be filled in as well, but does that violate LOCF?

25 REPLIES 25
novinosrin
Tourmaline | Level 20

data have;
input id sales;
datalines;
1 1200
1 .
1 1500
1 .
2 .
2 1000
2 .
2 .
2 .
2 .
2 3000
3 .
3 .
3 1100
3 .
3 .
;

data want;
set have;
by id;
retain _sales;
if first.id then call missing(_sales);
if sales then _sales=sales;
drop sales;
rename _sales=sales;
run;

 

Wafflecakes
Calcite | Level 5

I think that does the trick!

 

How does that differ from this code that I used?

 

data want; 
set have;
by id;
retain _sales;
if first.id then _sales=.;
if sales~=. then sales=_sales;
if last.id;
run; 
 
What is the purpose of the 'call missing' and retain keywords?
novinosrin
Tourmaline | Level 20

Call missing:   To avoid retaining the value from the previous by group

 

And retain: To hold the value across obs coz values created by assignment statement are reset to missing during each interation of the datastep

Wafflecakes
Calcite | Level 5
so in essence it is the same as 'if first.id then_sales=.;' that I used?
art297
Opal | Level 21

If you want to accomplish it directly during input, use a lag for id. e.g.:

data HAVE (drop=hold);
  input id month sales;
  retain hold;
  if id ne lag(id) then hold=sales;
  else if not missing(sales) then hold=sales;
  else sales=hold;
  cards;
1   1   1200
1   2   .
1   3   1500
1   4   .
2   1   . 
2   2   1000
2   3   .
2   4   .
2   5   .
2   6   .
2   7   3000
3   1   .
3   2   .
3   3   1100
3   4   .
3   5   .
;

Art, CEO, AnalystFinder.com

 

Wafflecakes
Calcite | Level 5

Thanks both for the quick replies.  Would it be possible to do this for multiple variables at a time or does it have to be done individually?

novinosrin
Tourmaline | Level 20

Sorry, what do you mean? Can you please clarify. I am little too slow to read between the lines

Wafflecakes
Calcite | Level 5

Basically I was saying that your call missing line is equivalent to 'if first.id then _sales=.;

 

I also do not understand why when I put 'if last.id;' that, that removes all of my observations for each id and leaves only one observation.

 

I was also wondering if you can do this same process for multiple variables at a time instead of individually?

novinosrin
Tourmaline | Level 20

Yes sorry. Your -call missing line is equivalent to 'if first.id then _sales=.; is right and absolute

 

also do not understand why when I put 'if last.id;' that, that removes all of my observations for each id and leaves only one observation.-- You don't need this subsetting if as you want to process and keep all obs

 

I was also wondering if you can do this same process for multiple variables at a time instead of individually?-- Yes, I think so if i understand you well enough. 

 

Wafflecakes
Calcite | Level 5

How would I do it for multiple variables at a time?  Can you give eaxample?

novinosrin
Tourmaline | Level 20

Ok, if i understand what you mean, you may have multiple by groups. This could be a cascading or like a parent child group-sub group relationship. You just process by groups using first and last automatic vars from the child to the outer most by group. 

I am too lazy to write a program to show you a working example. May i request you to modify your req with what you have in mind in the question and I can work on the solution

 

I do apologise

novinosrin
Tourmaline | Level 20

Still no change sir:

 

data have;

input id month sales;

datalines;

1   1   1200

1   2   .

1   3   1500

1   4   .

2   1   .

2   2   1000

2   3   .

2   4   .

2   5   .

2   6   .

2   7   3000

3   1   .

3   2   .

3   3   1100

3   4   .

3   5   .

;

 

data want;

set have;

by id;

retain _sales;

if first.id then call missing(_sales);

if sales then _sales=sales;

drop sales;

rename _sales=sales;

run;

Wafflecakes
Calcite | Level 5

I edited the original.

novinosrin
Tourmaline | Level 20

Still exactly the same logic:

 

data have;

input id month sales sales2;

datalines;

1   1   1200 500

1   2   .         .

1   3   1500  .

1   4   .         .

2   1   .         .

2   2   1000 300

2   3   .        .

2   4   .        .

;

 

data want;

set have;

by id;

retain _sales __sales;

if first.id then call missing(_sales,__sales);

if sales then _sales=sales;

if sales2 then __sales=sales2;

drop sales sales2;

rename _sales=sales __sales=sales2;

run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 25 replies
  • 4276 views
  • 1 like
  • 4 in conversation