Greetings,
I have the following dataset:
HAVE
id month sales sales2
1 1 1200 500
1 2 . .
1 3 1500 .
1 4 . .
2 1 . .
2 2 1000 300
2 3 . .
2 4 . .
WANT
id month sales sales2
1 1 1200 500
1 2 1200 500
1 3 1500 500
1 4 1500 500
2 1 . .
2 2 1000 300
2 3 1000 300
2 4 1000 300
How would I go about coding this? Ideally I want the missing for id=2 to be filled in as well, but does that violate LOCF?
data have;
input id sales;
datalines;
1 1200
1 .
1 1500
1 .
2 .
2 1000
2 .
2 .
2 .
2 .
2 3000
3 .
3 .
3 1100
3 .
3 .
;
data want;
set have;
by id;
retain _sales;
if first.id then call missing(_sales);
if sales then _sales=sales;
drop sales;
rename _sales=sales;
run;
I think that does the trick!
How does that differ from this code that I used?
Call missing: To avoid retaining the value from the previous by group
And retain: To hold the value across obs coz values created by assignment statement are reset to missing during each interation of the datastep
If you want to accomplish it directly during input, use a lag for id. e.g.:
data HAVE (drop=hold); input id month sales; retain hold; if id ne lag(id) then hold=sales; else if not missing(sales) then hold=sales; else sales=hold; cards; 1 1 1200 1 2 . 1 3 1500 1 4 . 2 1 . 2 2 1000 2 3 . 2 4 . 2 5 . 2 6 . 2 7 3000 3 1 . 3 2 . 3 3 1100 3 4 . 3 5 . ;
Art, CEO, AnalystFinder.com
Thanks both for the quick replies. Would it be possible to do this for multiple variables at a time or does it have to be done individually?
Sorry, what do you mean? Can you please clarify. I am little too slow to read between the lines
Basically I was saying that your call missing line is equivalent to 'if first.id then _sales=.;
I also do not understand why when I put 'if last.id;' that, that removes all of my observations for each id and leaves only one observation.
I was also wondering if you can do this same process for multiple variables at a time instead of individually?
Yes sorry. Your -call missing line is equivalent to 'if first.id then _sales=.; is right and absolute
also do not understand why when I put 'if last.id;' that, that removes all of my observations for each id and leaves only one observation.-- You don't need this subsetting if as you want to process and keep all obs
I was also wondering if you can do this same process for multiple variables at a time instead of individually?-- Yes, I think so if i understand you well enough.
How would I do it for multiple variables at a time? Can you give eaxample?
Ok, if i understand what you mean, you may have multiple by groups. This could be a cascading or like a parent child group-sub group relationship. You just process by groups using first and last automatic vars from the child to the outer most by group.
I am too lazy to write a program to show you a working example. May i request you to modify your req with what you have in mind in the question and I can work on the solution
I do apologise
Still no change sir:
data have;
input id month sales;
datalines;
1 1 1200
1 2 .
1 3 1500
1 4 .
2 1 .
2 2 1000
2 3 .
2 4 .
2 5 .
2 6 .
2 7 3000
3 1 .
3 2 .
3 3 1100
3 4 .
3 5 .
;
data want;
set have;
by id;
retain _sales;
if first.id then call missing(_sales);
if sales then _sales=sales;
drop sales;
rename _sales=sales;
run;
I edited the original.
Still exactly the same logic:
data have;
input id month sales sales2;
datalines;
1 1 1200 500
1 2 . .
1 3 1500 .
1 4 . .
2 1 . .
2 2 1000 300
2 3 . .
2 4 . .
;
data want;
set have;
by id;
retain _sales __sales;
if first.id then call missing(_sales,__sales);
if sales then _sales=sales;
if sales2 then __sales=sales2;
drop sales sales2;
rename _sales=sales __sales=sales2;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.