Hi All,
I have the dataset that looks like this:
Date Company Var
199601 1 .
199602 1 .
199603 1 .
199604 1 .
199605 1 .
199606 1 10
199607 1 .
199608 1 .
199609 1 .
199610 1 .
199611 1 .
199612 1 .
199701 1 .
199702 1 .
199703 1 .
199704 1 .
199705 1 .
199706 1 11
199707 1 .
199708 1 .
199709 1 .
199710 1 .
199711 1 .
199712 1 .
The required dataset should look like this:
Date Company Var
199601 1 .
199602 1 .
199603 1 .
199604 1 .
199605 1 .
199606 1 .
199607 1 10
199608 1 10
199609 1 10
199610 1 10
199611 1 10
199612 1 10
199701 1 10
199702 1 10
199703 1 10
199704 1 10
199705 1 10
199706 1 10
199707 1 11
199708 1 11
199709 1 11
199710 1 11
199711 1 11
199712 1 11
So what I need to do is for each company, to repeat the same value that is available in June, year t, for next months within a year until June, year t+1. Please note that I need to use Var value in June, but to start from July, year t, and repeat the value until June, year t+1. I have many months and many companies.
Any help will be hugely appriciated.
Kind regards,
Ruslan
I would recommend using the retain statement with a temporary variable. Simply put, any variables specified in the retain statement are not reset from one itteration of the data step to the next.
data outputData (drop=temp_var temp_company); /* Drops temp_var and temp_company after the data step has completely finished running */
set inputData;
retain temp_var . temp_company .; /* Initializes temp_var to missing */
/* When the company changes, reset temp_var to missing */
if temp_company ne company then do;
temp_company = company;
temp_var = .;
end;
/* Check to see if the variable Var is missing */
if Var = . then Var = temp_var; /* When missing, assign it to the value of temp_var */
else temp_var = Var; /* otherwise assign temp_var the value of Var */
run;
EDIT: I noticed that you also mentioned that this needs to work for multiple companies, I've changed my code to handle that by retaining the previous company and resetting temp_var when the company changes.
data inputData;
input Date Company Var;
datalines;
199601 1 .
199602 1 .
199603 1 .
199604 1 .
199605 1 .
199606 1 10
199607 1 .
199608 1 .
199609 1 .
199610 1 .
199611 1 .
199612 1 .
199701 1 .
199702 1 .
199703 1 .
199704 1 .
199705 1 .
199706 1 11
199707 1 .
199708 1 .
199709 1 .
199710 1 .
199711 1 .
199712 1 .
199601 2 .
199602 2 .
199603 2 .
199604 2 .
199605 2 .
199606 2 10
199607 2 .
199608 2 .
199609 2 .
199610 2 .
199611 2 .
199612 2 .
199701 2 .
199702 2 .
199703 2 .
199704 2 .
199705 2 .
199706 2 11
199707 2 .
199708 2 .
199709 2 .
199710 2 .
199711 2 .
199712 2 .
;
run;
data outputData (drop=temp_var temp_company); /* Drops temp_var and temp_company after the data step has completely finished running */
set inputData;
retain temp_var . temp_company .; /* Initializes temp_var to missing */
/* When the company changes, reset temp_var to missing */
if temp_company ne company then do;
temp_company = company;
temp_var = .;
end;
/* Check to see if the variable Var is missing */
if Var = . then Var = temp_var; /* When missing, assign it to the value of temp_var */
else temp_var = Var; /* otherwise assign temp_var the value of Var */
run;
I would recommend using the retain statement with a temporary variable. Simply put, any variables specified in the retain statement are not reset from one itteration of the data step to the next.
data outputData (drop=temp_var temp_company); /* Drops temp_var and temp_company after the data step has completely finished running */
set inputData;
retain temp_var . temp_company .; /* Initializes temp_var to missing */
/* When the company changes, reset temp_var to missing */
if temp_company ne company then do;
temp_company = company;
temp_var = .;
end;
/* Check to see if the variable Var is missing */
if Var = . then Var = temp_var; /* When missing, assign it to the value of temp_var */
else temp_var = Var; /* otherwise assign temp_var the value of Var */
run;
EDIT: I noticed that you also mentioned that this needs to work for multiple companies, I've changed my code to handle that by retaining the previous company and resetting temp_var when the company changes.
data inputData;
input Date Company Var;
datalines;
199601 1 .
199602 1 .
199603 1 .
199604 1 .
199605 1 .
199606 1 10
199607 1 .
199608 1 .
199609 1 .
199610 1 .
199611 1 .
199612 1 .
199701 1 .
199702 1 .
199703 1 .
199704 1 .
199705 1 .
199706 1 11
199707 1 .
199708 1 .
199709 1 .
199710 1 .
199711 1 .
199712 1 .
199601 2 .
199602 2 .
199603 2 .
199604 2 .
199605 2 .
199606 2 10
199607 2 .
199608 2 .
199609 2 .
199610 2 .
199611 2 .
199612 2 .
199701 2 .
199702 2 .
199703 2 .
199704 2 .
199705 2 .
199706 2 11
199707 2 .
199708 2 .
199709 2 .
199710 2 .
199711 2 .
199712 2 .
;
run;
data outputData (drop=temp_var temp_company); /* Drops temp_var and temp_company after the data step has completely finished running */
set inputData;
retain temp_var . temp_company .; /* Initializes temp_var to missing */
/* When the company changes, reset temp_var to missing */
if temp_company ne company then do;
temp_company = company;
temp_var = .;
end;
/* Check to see if the variable Var is missing */
if Var = . then Var = temp_var; /* When missing, assign it to the value of temp_var */
else temp_var = Var; /* otherwise assign temp_var the value of Var */
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.