Hi All,
I have the dataset that looks like this:
Date Company Var
199601 1 .
199602 1 .
199603 1 .
199604 1 .
199605 1 .
199606 1 10
199607 1 .
199608 1 .
199609 1 .
199610 1 .
199611 1 .
199612 1 .
199701 1 .
199702 1 .
199703 1 .
199704 1 .
199705 1 .
199706 1 11
199707 1 .
199708 1 .
199709 1 .
199710 1 .
199711 1 .
199712 1 .
The required dataset should look like this:
Date Company Var
199601 1 .
199602 1 .
199603 1 .
199604 1 .
199605 1 .
199606 1 .
199607 1 10
199608 1 10
199609 1 10
199610 1 10
199611 1 10
199612 1 10
199701 1 10
199702 1 10
199703 1 10
199704 1 10
199705 1 10
199706 1 10
199707 1 11
199708 1 11
199709 1 11
199710 1 11
199711 1 11
199712 1 11
So what I need to do is for each company, to repeat the same value that is available in June, year t, for next months within a year until June, year t+1. Please note that I need to use Var value in June, but to start from July, year t, and repeat the value until June, year t+1. I have many months and many companies.
Any help will be hugely appriciated.
Kind regards,
Ruslan
I would recommend using the retain statement with a temporary variable. Simply put, any variables specified in the retain statement are not reset from one itteration of the data step to the next.
data outputData (drop=temp_var temp_company); /* Drops temp_var and temp_company after the data step has completely finished running */
set inputData;
retain temp_var . temp_company .; /* Initializes temp_var to missing */
/* When the company changes, reset temp_var to missing */
if temp_company ne company then do;
temp_company = company;
temp_var = .;
end;
/* Check to see if the variable Var is missing */
if Var = . then Var = temp_var; /* When missing, assign it to the value of temp_var */
else temp_var = Var; /* otherwise assign temp_var the value of Var */
run;
EDIT: I noticed that you also mentioned that this needs to work for multiple companies, I've changed my code to handle that by retaining the previous company and resetting temp_var when the company changes.
data inputData;
input Date Company Var;
datalines;
199601 1 .
199602 1 .
199603 1 .
199604 1 .
199605 1 .
199606 1 10
199607 1 .
199608 1 .
199609 1 .
199610 1 .
199611 1 .
199612 1 .
199701 1 .
199702 1 .
199703 1 .
199704 1 .
199705 1 .
199706 1 11
199707 1 .
199708 1 .
199709 1 .
199710 1 .
199711 1 .
199712 1 .
199601 2 .
199602 2 .
199603 2 .
199604 2 .
199605 2 .
199606 2 10
199607 2 .
199608 2 .
199609 2 .
199610 2 .
199611 2 .
199612 2 .
199701 2 .
199702 2 .
199703 2 .
199704 2 .
199705 2 .
199706 2 11
199707 2 .
199708 2 .
199709 2 .
199710 2 .
199711 2 .
199712 2 .
;
run;
data outputData (drop=temp_var temp_company); /* Drops temp_var and temp_company after the data step has completely finished running */
set inputData;
retain temp_var . temp_company .; /* Initializes temp_var to missing */
/* When the company changes, reset temp_var to missing */
if temp_company ne company then do;
temp_company = company;
temp_var = .;
end;
/* Check to see if the variable Var is missing */
if Var = . then Var = temp_var; /* When missing, assign it to the value of temp_var */
else temp_var = Var; /* otherwise assign temp_var the value of Var */
run;
I would recommend using the retain statement with a temporary variable. Simply put, any variables specified in the retain statement are not reset from one itteration of the data step to the next.
data outputData (drop=temp_var temp_company); /* Drops temp_var and temp_company after the data step has completely finished running */
set inputData;
retain temp_var . temp_company .; /* Initializes temp_var to missing */
/* When the company changes, reset temp_var to missing */
if temp_company ne company then do;
temp_company = company;
temp_var = .;
end;
/* Check to see if the variable Var is missing */
if Var = . then Var = temp_var; /* When missing, assign it to the value of temp_var */
else temp_var = Var; /* otherwise assign temp_var the value of Var */
run;
EDIT: I noticed that you also mentioned that this needs to work for multiple companies, I've changed my code to handle that by retaining the previous company and resetting temp_var when the company changes.
data inputData;
input Date Company Var;
datalines;
199601 1 .
199602 1 .
199603 1 .
199604 1 .
199605 1 .
199606 1 10
199607 1 .
199608 1 .
199609 1 .
199610 1 .
199611 1 .
199612 1 .
199701 1 .
199702 1 .
199703 1 .
199704 1 .
199705 1 .
199706 1 11
199707 1 .
199708 1 .
199709 1 .
199710 1 .
199711 1 .
199712 1 .
199601 2 .
199602 2 .
199603 2 .
199604 2 .
199605 2 .
199606 2 10
199607 2 .
199608 2 .
199609 2 .
199610 2 .
199611 2 .
199612 2 .
199701 2 .
199702 2 .
199703 2 .
199704 2 .
199705 2 .
199706 2 11
199707 2 .
199708 2 .
199709 2 .
199710 2 .
199711 2 .
199712 2 .
;
run;
data outputData (drop=temp_var temp_company); /* Drops temp_var and temp_company after the data step has completely finished running */
set inputData;
retain temp_var . temp_company .; /* Initializes temp_var to missing */
/* When the company changes, reset temp_var to missing */
if temp_company ne company then do;
temp_company = company;
temp_var = .;
end;
/* Check to see if the variable Var is missing */
if Var = . then Var = temp_var; /* When missing, assign it to the value of temp_var */
else temp_var = Var; /* otherwise assign temp_var the value of Var */
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.