Hello,
I have a panel data set with nearly 30 variables in which code indicating the stock code and year indicating the time period from 2009 to 2015. I wanna make some adjustment about it. 28 of them are dummy variable which only take the value of 0 and 1. Now i have to change the definition for the dummy variables and make new 28 variables. The new rule is that for one stock code, if the variable take the value of 1 in any of year before 2012 then it would be 1 for all years (from 2009 to 2015).
the data set like
data q1;
input a b c d ;
datalines;
20002 2009 0 1
20002 2010 1 0
20002 2013 0 1
20002 2014 0 0
30002 2009 0 1
30002 2010 0 0
30002 2013 1 1
30002 2014 1 0
;
run;
and I want to get a data set like that
I have tried the code below
data c4;
set c3;
if c = 1 and year < 2012;
run;
proc sort data = c4 nodupkey;
by a ;
run;
data c4;
set c4;
rename c = c_n;
run;
proc sql;
create table c5 as
select c3.*,c4.c_n
from c3 left join c4
on c3.a = c4.a;
quit;
data c3;
set c3;
if c_n = . then c_n = 0;
else c_n = c_n;
run;
It works, but I have to deal with 28 variables using the same code and I am wondering if I can get the data set without wirting those duplicate codes.
Thanks,
Jane
UNTESTED CODE
proc summary data=have(where=(year<2012)) nway;
class stock_code;
var var-var28;
output out=_max_ max=max1-max28;
run;
data want;
merge _max_ Have;
by stock_code;
array max max1-max28;
array var var1-var28;
do i=1 to 28;
if max(i)=1 and year>=2009 then var(i)=1;
end;
drop max1-max28;
run;
So, first of all, we would all benefit from meaningful variable names in you data set. So, instead of variable A, you could call it STOCK_CODE and instead of variable B you could call it YEAR. That would help greatly.
The new rule is that for one stock code, if the variable take the value of 1 in any of year before 2012 then it would be 1 for all years (from 2009 to 2015).
This doesn't appear to be what I see in the desired output data set. When A=20002, and B=2009, I see the value of 1 in the input data for D. Then, according to your rule, you want to change D to be equal to 1 for all years from 2009 to 2015, but that is not what i see in the output data set, so there would need to be additional explanation on this matter.
Eventually, you are probably going to need to use PROC SUMMARY to determine if there is a 1 in any year before 2012 for each of your 28 variables, and then a data step using arrays (not macros) to change the values. To do this, if you name your variables as CODE1 through CODE28 (or something like that), then your coding becomes much much simpler.
Thanks and I am sorry for the strange variable names and some typo mistakes.
I would change the example data like this;
data p1;
input stock_code year var1 var2 var3;
datalines;
20002 2009 1 0 1
20002 2010 0 1 0
20002 2011 0 0 0
20002 2013 0 0 0
20002 2014 0 0 0
20003 2009 0 0 0
20003 2010 0 0 0
20003 2013 1 1 1
20003 2014 1 1 1
;
run;
I have tried the code using array like this
data want;
set have;
array b(28) var1-var28;
array a(28) n_var1-n_var28;
do i = 1 to 28;
if b(i) = 1 and year < 2012 then a(i) = 1;
else a(i) = 0;
end;
run;
seems I have made some mistake with the if conditions.
and I want to get the data set like that:
UNTESTED CODE
proc summary data=have(where=(year<2012)) nway;
class stock_code;
var var-var28;
output out=_max_ max=max1-max28;
run;
data want;
merge _max_ Have;
by stock_code;
array max max1-max28;
array var var1-var28;
do i=1 to 28;
if max(i)=1 and year>=2009 then var(i)=1;
end;
drop max1-max28;
run;
Let's try something similar to what you already programmed:
data want;
array b {28} var1-var28;
array a {28} n_var1-n_var28;
do k=1 to 28;
a{k}=0;
end;
do until (last.stock_code);
set have;
by stock_code;
if year < 2012 then do k=1 to 28;
if b{k}=1 then a{k}=1;
end;
end;
do until (last.stock_code);
set have;
by stock_code;
output;
end;
run;
The top loop sets the new variables. Then the bottom loop reads all observations for the stock codes and outputs them (along with the new variables).
Thanks for your code and explanation, I will try it latter : )
Hi:
Also confusing is why you are creating table c5 in the PROC SQL step, but then are using C3 in the final DATA step.
Cynthia
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.