As suggested by Reeza, I am trying to reorganize my data into a long structure opposed to wide to reduce processing time as well as make the coding faster.
Currently, the data is organized a structure where it has main variables such as Recreation, Financial Plan, Legal, Housing, Gift, Education. However, these variables were created in a way that works like Pre_ResYr6thM_Recre and so forth.
I am looking for some sample code to pull the "Pre_ResYrNthM" part from each variable to make a several new variables for the Month, Quarter, Half Year and Year as well as variables for Recreation.
I have tried something to the effect of :
data Org;
set infl.arc_inflation_jb_102618;
if Preresyr1stM: then Month = 1;
else if Preresyr2ndM: then Month = 2;
else if Preresyr3rdM: then Month = 3;
else if Preresyr4thM: then Month =4;
else if Preresyr5thM: then Month =5;
else if Preresyr6thM: then Month =6;
else if Preresyr7thM: then Month =7;
else if Preresyr8thM: then Month =8;
else if Preresyr9thM: then Month =9;
else if Preresyr10thM: then Month =10;
else if Preresyr11thM: then Month =11;
else if Preresyr12thM: then Month =12;
run;
I was then going to just keep the main variables and have a structure like
StudyNo PID Day_1826 Day_InitialIW Month Quarter Half_year PreRes_Consume PreRes_COnsume_Total PostRes_Consume
PostRes_Consume_Total PreRes_Durable PreRes_Durable_Total
1 1 30Dec1992 05Oct1993 1 1 1 600 600
1 1 30Dec1992 05Oct1993 2 1 1 600 1200
I was hoping to find some sample code or an old forum for the code but not sure how to pull main character strings from the variables to create the new variables
I also tried an iteration that looked like this:
data Org;
set infl.arc_inflation_jb_102618;
if var like %Preresyr1stM% then Month = 1;
.
.
.
.
run;
Actually what you should do there is an array approach.
data want;
set have;
array _month(12) <list of variables in order>;
do i=1 to dim(_month);
interval='Month';
Period = i;
Value = _month(i);
output;
end;
run;
could you try something using the substr function?
here would be a start:
var2 = substr(var1, 9, length(var1)-12)
the 9th character is where the month begins... and then you want to count for the length of your variable (var1 in this case) minus the length of the unnecessary characters (the first 9 and the last3). i hope this helps give you some direction.
that is cool, i need to remember this for the future.
Also, this is really cool. Good tip and bookmarked.
Actually what you should do there is an array approach.
data want;
set have;
array _month(12) <list of variables in order>;
do i=1 to dim(_month);
interval='Month';
Period = i;
Value = _month(i);
output;
end;
run;
This seems to be working well, except I have a few questions.
First, the first month value is being doubled and the last is being taken away. Could this be because the DO Loop is stopping at 11 and not 12?
Secondly, when I run a second array, how would I keep the previous column while generating a new one?
Obviously, I have never really used arrays. I appreciate all your help, I am trying to learn as much as possible in a short time and you've been very patient.
data Tall;
set infl.arc_inflation_jb_102618;
array _month(12) N_PreResYr1stM_House
N_PreResYr2ndM_House
N_PreResYr3rdM_House
N_PreResYr4thM_House
N_PreResYr5thM_House
N_PreResYr6thM_House
N_PreResYr7thM_House
N_PreResYr8thM_House
N_PreResYr9thM_House
N_PreResYr10thM_House
N_PreResYr11thM_House
N_PreResYr12thM_House
;
do i=1 to dim(_month);
interval='Month';
Month = i;
PreResYr_House = _month(i);
output;
end;
run;
data Tall;
set infl.arc_inflation_jb_102618;
array _month(12) N_PreResYr1stM_CONSUME
N_PreResYr2ndM_CONSUME
N_PreResYr3rdM_CONSUME
N_PreResYr4thM_CONSUME
N_PreResYr5thM_CONSUME
N_PreResYr6thM_CONSUME
N_PreResYr7thM_CONSUME
N_PreResYr8thM_CONSUME
N_PreResYr9thM_CONSUME
N_PreResYr10thM_CONSUME
N_PreResYr11thM_CONSUME
N_PreResYr12thM_CONSUME;
do i=1 to dim(_month);
interval='Month';
Month = i;
PreResYr_CONSUME = _month(i);
output;
end;
run;
This code generates just the PreResYr_Consume without the PreResYr_House and it also doubles the first month value as seen below:
i Interval Month PreResYr_Consume
1 Month 1 767.101074
2 Month 2 383.550537
3 Month 3 383.550537
4 Month 4 383.550537
5 Month 5 383.550537
6 Month 6 383.550537
7 Month 7 383.550537
8 Month 8 383.550537
9 Month 9 383.550537
10 Month 10 383.550537
11 Month 11 383.550537
12 Month 12 .
You can have multiple arrays and steps in one data step. In fact, I think you'll have to here otherwise you're going to get weird results.
data Tall;
set infl.arc_inflation_jb_102618;
array _month(12) N_PreResYr1stM_House
N_PreResYr2ndM_House
N_PreResYr3rdM_House
N_PreResYr4thM_House
N_PreResYr5thM_House
N_PreResYr6thM_House
N_PreResYr7thM_House
N_PreResYr8thM_House
N_PreResYr9thM_House
N_PreResYr10thM_House
N_PreResYr11thM_House
N_PreResYr12thM_House
;
do i=1 to dim(_month);
interval='Month';
Month = i;
PreResYr_House = _month(i);
output;
end;
array _month(12) N_PreResYr1stM_CONSUME
N_PreResYr2ndM_CONSUME
N_PreResYr3rdM_CONSUME
N_PreResYr4thM_CONSUME
N_PreResYr5thM_CONSUME
N_PreResYr6thM_CONSUME
N_PreResYr7thM_CONSUME
N_PreResYr8thM_CONSUME
N_PreResYr9thM_CONSUME
N_PreResYr10thM_CONSUME
N_PreResYr11thM_CONSUME
N_PreResYr12thM_CONSUME;
do i=1 to dim(_month);
interval='Month';
Month = i;
PreResYr_CONSUME = _month(i);
output;
end;
run;
@joebacon wrote:
This seems to be working well, except I have a few questions.
First, the first month value is being doubled and the last is being taken away. Could this be because the DO Loop is stopping at 11 and not 12?
Secondly, when I run a second array, how would I keep the previous column while generating a new one?
Obviously, I have never really used arrays. I appreciate all your help, I am trying to learn as much as possible in a short time and you've been very patient.
data Tall; set infl.arc_inflation_jb_102618; array _month(12) N_PreResYr1stM_House N_PreResYr2ndM_House N_PreResYr3rdM_House N_PreResYr4thM_House N_PreResYr5thM_House N_PreResYr6thM_House N_PreResYr7thM_House N_PreResYr8thM_House N_PreResYr9thM_House N_PreResYr10thM_House N_PreResYr11thM_House N_PreResYr12thM_House ; do i=1 to dim(_month); interval='Month'; Month = i; PreResYr_House = _month(i); output; end; run; data Tall; set infl.arc_inflation_jb_102618; array _month(12) N_PreResYr1stM_CONSUME N_PreResYr2ndM_CONSUME N_PreResYr3rdM_CONSUME N_PreResYr4thM_CONSUME N_PreResYr5thM_CONSUME N_PreResYr6thM_CONSUME N_PreResYr7thM_CONSUME N_PreResYr8thM_CONSUME N_PreResYr9thM_CONSUME N_PreResYr10thM_CONSUME N_PreResYr11thM_CONSUME N_PreResYr12thM_CONSUME; do i=1 to dim(_month); interval='Month'; Month = i; PreResYr_CONSUME = _month(i); output; end; run;
This code generates just the PreResYr_Consume without the PreResYr_House and it also doubles the first month value as seen below:
i Interval Month PreResYr_Consume
1 Month 1 767.101074
2 Month 2 383.550537
3 Month 3 383.550537
4 Month 4 383.550537
5 Month 5 383.550537
6 Month 6 383.550537
7 Month 7 383.550537
8 Month 8 383.550537
9 Month 9 383.550537
10 Month 10 383.550537
11 Month 11 383.550537
12 Month 12 .
I then get the error that _month has already been defined.
44 end;
45
46
47 array _month(12) N_PreResYr1stM_CONSUME
______
124
ERROR 124-185: The variable _month has already been defined.
However, I want the months to stay but to input the value for each month under the new Consume category as well.
I'm a dummy. I just need to use the months and create the new variables in an If when statement. Sorry.
Reeza, I apologize for all the questions, but could you point me to where my Macro went wrong?
%Macro Transform(x);
When(&x.) do;
PreResYr_Consume =N_PreResYr&x.thM_CONSUME;
PreResYr_Durable =N_PreResYr&x.thM_DURABLE;
PreResYr_Trans =N_PreResYr&x.thM_TRANS;
PreResYr_Health =N_PreResYr&x.thM_HEALTH;
PreResYr_Insure =N_PreResYr&x.thM_INSURE;
PreResYr_Loan =N_PreResYr&x.thM_LOAN;
PreResYr_Tax =N_PreResYr&x.thM_TAX;
PreResYr_Educa =N_PreResYr&x.thM_EDUCA;
PreResYr_Gift =N_PreResYr&x.thM_GIFT;
PreResYr_Enter =N_PreResYr&x.thM_ENTER;
PreResYr_Recre =N_PreResYr&x.thM_RECRE;
PreResYr_Finplan =N_PreResYr&x.thM_FINPLAN;
PreResYr_Legala =N_PreResYr&x.thM_LEGALA;
PreResYr_IncomeTL =N_PreResYr&x.thM_INCOMETL;
end;
%Mend Transform;
data Tall2;
set Tall;
Select(Month);
When(1) do;
PreResYr_Consume =N_PreResYr1stM_CONSUME;
PreResYr_Durable =N_PreResYr1stM_DURABLE;
PreResYr_Trans =N_PreResYr1stM_TRANS;
PreResYr_Health =N_PreResYr1stM_HEALTH;
PreResYr_Insure =N_PreResYr1stM_INSURE;
PreResYr_Loan =N_PreResYr1stM_LOAN;
PreResYr_Tax =N_PreResYr1stM_TAX;
PreResYr_Educa =N_PreResYr1stM_EDUCA;
PreResYr_Gift =N_PreResYr1stM_GIFT;
PreResYr_Enter =N_PreResYr1stM_ENTER;
PreResYr_Recre =N_PreResYr1stM_RECRE;
PreResYr_Finplan =N_PreResYr1stM_FINPLAN;
PreResYr_Legala =N_PreResYr1stM_LEGALA;
PreResYr_IncomeTL =N_PreResYr1stM_INCOMETL;
end;
When(2) do;
PreResYr_Consume =N_PreResYr2ndM_CONSUME;
PreResYr_Durable =N_PreResYr2ndM_DURABLE;
PreResYr_Trans =N_PreResYr2ndM_TRANS;
PreResYr_Health =N_PreResYr2ndM_HEALTH;
PreResYr_Insure =N_PreResYr2ndM_INSURE;
PreResYr_Loan =N_PreResYr2ndM_LOAN;
PreResYr_Tax =N_PreResYr2ndM_TAX;
PreResYr_Educa =N_PreResYr2ndM_EDUCA;
PreResYr_Gift =N_PreResYr2ndM_GIFT;
PreResYr_Enter =N_PreResYr2ndM_ENTER;
PreResYr_Recre =N_PreResYr2ndM_RECRE;
PreResYr_Finplan =N_PreResYr2ndM_FINPLAN;
PreResYr_Legala =N_PreResYr2ndM_LEGALA;
PreResYr_IncomeTL =N_PreResYr2ndM_INCOMETL;
end;
When(3) do;
PreResYr_Consume =N_PreResYr3rdM_CONSUME;
PreResYr_Durable =N_PreResYr3rdM_DURABLE;
PreResYr_Trans =N_PreResYr3rdM_TRANS;
PreResYr_Health =N_PreResYr3rdM_HEALTH;
PreResYr_Insure =N_PreResYr3rdM_INSURE;
PreResYr_Loan =N_PreResYr3rdM_LOAN;
PreResYr_Tax =N_PreResYr3rdM_TAX;
PreResYr_Educa =N_PreResYr3rdM_EDUCA;
PreResYr_Gift =N_PreResYr3rdM_GIFT;
PreResYr_Enter =N_PreResYr3rdM_ENTER;
PreResYr_Recre =N_PreResYr3rdM_RECRE;
PreResYr_Finplan =N_PreResYr3rdM_FINPLAN;
PreResYr_Legala =N_PreResYr3rdM_LEGALA;
PreResYr_IncomeTL =N_PreResYr3rdM_INCOMETL;
end;
%Transform(4-11);
Otherwise do;
PreResYr_Consume =N_PreResYr12thM_CONSUME;
PreResYr_Durable =N_PreResYr12thM_DURABLE;
PreResYr_Trans =N_PreResYr12thM_TRANS;
PreResYr_Health =N_PreResYr12thM_HEALTH;
PreResYr_Insure =N_PreResYr12thM_INSURE;
PreResYr_Loan =N_PreResYr12thM_LOAN;
PreResYr_Tax =N_PreResYr12thM_TAX;
PreResYr_Educa =N_PreResYr12thM_EDUCA;
PreResYr_Gift =N_PreResYr12thM_GIFT;
PreResYr_Enter =N_PreResYr12thM_ENTER;
PreResYr_Recre =N_PreResYr12thM_RECRE;
PreResYr_Finplan =N_PreResYr12thM_FINPLAN;
PreResYr_Legala =N_PreResYr12thM_LEGALA;
PreResYr_IncomeTL =N_PreResYr12thM_INCOMETL;
end;
run;
Can I not do a range from 4-11? I am getting 3 errors but i specified an "when" and "end" statement in the macro and the otherwise do is in context to the previous "select" :
NOTE: Line generated by the macro variable "X".
92 N_PreResYr4-11thM_INCOMETL
____________
22
ERROR: Expecting "WHEN", "OTHERWISE", or "END".
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, -, /, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN,
LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, ^=, |, ||, ~=.
93 end;
94 Otherwise do;
_________
161
ERROR 161-185: No matching DO/SELECT statement.
Once more, I can't thank you enough! This is so helpful - clear, simple, and informative.
I think you may get away with something like this:
data org;
set infl.arc_inflation_jb_102618;
array Pre_ResYr Pre_ResYr:;
do _N_=1 to dim(Pre_ResYr);
name=upcase(vname(Pre_ResYr(_N_)));
select(substr(scan(name,1,'_'),10);
when('1STM') month=1;
when('2NDM') month=2;
/* etcetera */
otherwise month=0; /* for the stuff that you do not want */
end;
if month>0 then do;
Value=Pre_ResYr(_N_);
category=scan(name,2,'_');
output;
end;
end;
keep /* key variables here! */ month value category;
run;
You can then (if you want to) use PROC TRANSPOSE to create variable names from the CATEGORY variable.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.