Hi, I'm not entirely sure if I can accurately describe what I want to do, but here's an attempt:
I have a dataset that's structured similar to this:
Y11_M6 Y11_M7 Y11_M8 Y11_M9 Y11_M10 Y11_M11 Y11_M12 Y12_M1 Y12_M2 Y12_M3 DT_CHECK_STR HYST_DT_STR
1 1 1 1 1 1 1 1 1 1 Y10_M6 Y12_M4
There are more columns before Y11_M6. I want to only add up the values for the variables that fall between DT_CHECK_STR and HYST_DT_STR. I wish to use arrays to solve this problem, but I'm having no success thus far (and don't know if it's even at all possible). Here's some dummy code:
DATA test;
SET mem(OBS=1);
DT_CHECK = intnx('month', EVENT_DT, -10);
DT_CHECK_STR = cats('Y', put(DT_CHECK, YEAR2.), '_', 'M', month(DT_CHECK));
HYST_DT_STR = cats('Y', put(EVENT_DT, YEAR2.), '_', 'M', month(EVENT_DT));
ARRAY vars[*] Y80_M1--Y12_M3;
DO i=1 TO dim(vars);
/* IF varname(vars[i]) "between" DT_CHECK_STR and HYST_DT_STR then sum_n = sum(of vars{i}); */
END;
FORMAT DT_CHECK mmddyy10.;
RUN;
The commented code is clearly wrong, but I wanted to write something as an "example."
Another way I wanted to tackle the issue is to create arrays using macro variables. So I would imagine something like:
%let x = DT_CHECK_STR;
%let y = HYST_DT_STR;
array vars[*] &x--&y;
Is my coding problem at all feasible? Any help would be great. Thank you!
What do the date check values look like?
You can change the indexes for an array and that's usually a good method to use.
@bhong wrote:
Hi, I'm not entirely sure if I can accurately describe what I want to do, but here's an attempt:
I have a dataset that's structured similar to this:
Y11_M6 Y11_M7 Y11_M8 Y11_M9 Y11_M10 Y11_M11 Y11_M12 Y12_M1 Y12_M2 Y12_M3 DT_CHECK_STR HYST_DT_STR 1 1 1 1 1 1 1 1 1 1 Y10_M6 Y12_M4
There are more columns before Y11_M6. I want to only add up the values for the variables that fall between DT_CHECK_STR and HYST_DT_STR. I wish to use arrays to solve this problem, but I'm having no success thus far (and don't know if it's even at all possible). Here's some dummy code:
DATA test; SET mem(OBS=1); DT_CHECK = intnx('month', EVENT_DT, -10); DT_CHECK_STR = cats('Y', put(DT_CHECK, YEAR2.), '_', 'M', month(DT_CHECK)); HYST_DT_STR = cats('Y', put(EVENT_DT, YEAR2.), '_', 'M', month(EVENT_DT)); ARRAY vars[*] Y80_M1--Y12_M3; DO i=1 TO dim(vars); /* IF varname(vars[i]) "between" DT_CHECK_STR and HYST_DT_STR then sum_n = sum(of vars{i}); */ END; FORMAT DT_CHECK mmddyy10.; RUN;
The commented code is clearly wrong, but I wanted to write something as an "example."
Another way I wanted to tackle the issue is to create arrays using macro variables. So I would imagine something like:
%let x = DT_CHECK_STR; %let y = HYST_DT_STR; array vars[*] &x--&y;
Is my coding problem at all feasible? Any help would be great. Thank you!
data demo;
array myData(2014:2018) Y2014 Y2015 Y2016 Y2017 Y2018 (1 0 1 0 1);
do year=2014 to 2018; *can use hbound/lbound to get these values;
myData(i) = myData(i) * 20;
end;
run;
You could probably set up a two dimensional array, years across and months and then use those as your index within a double loop....
https://stats.idre.ucla.edu/wp-content/uploads/2016/02/bt3009.pdf
Hi Reeza,
I tried to do things a different way.
DATA test2;
SET test;
DT_CHECK = intnx('month', EVENT_DT, -10);
DT_CHECK_STR = cats('Y', put(DT_CHECK, YEAR2.), '_', 'M', month(DT_CHECK));
HYST_DT_STR = cats('Y', put(EVENT_DT, YEAR2.), '_', 'M', month(EVENT_DT));
j=0;
array vars[*] Y11_M6--Y12_M3;
do i=1 to dim(vars);
YEAR = substr(scan(vname(vars[i]), 1, "_"), 2, 2);
MONTH = input(substr(scan(vname(vars[i]), 2, "_"), 2), 2.);
IF substr(YEAR, 1, 1) = '8' OR substr(YEAR, 1, 1) = '9' THEN NEW_YEAR = input(cats('19', YEAR), 4.);
ELSE NEW_YEAR = input(cats('20', YEAR), 4.);
IF DT_CHECK <= mdy(MONTH, 1, NEW_YEAR) <= EVENT_DT THEN /* don't know what to include here */;
end;
FORMAT DT_CHECK mmddyy10.;
RUN;
I converted all the column names into SAS dates and used an IF statement to see if those "new" dates are between DT_CHECK and EVENT_DT. However, I am stuck--perhaps you could tell me if my method makes sense? Ultimately, if the new date formed using the column name is between DT_CHECK and EVENT_DT, I only want to sum up the columns in the range. Could you look at my code above and point me in the right direction?
Assuming your dates check out, it would be:
then sum_total = sum(sum_total, vars(i));
This seems to work on my test data. Thank you so much. I'll try it once I'm in the office tomorrow--excited to go to work! 🙂
If there will always be corresponding variables for both DT_CHECK_STR and HYST_DT_STR, and the variables you want to sum are in date order, you could use something as simple as:
data mem;
input Y11_M6-Y11_M12 Y12_M1-Y12_M4 (DT_CHECK_STR HYST_DT_STR) ($);
cards;
1 1 1 1 1 1 1 1 1 1 1 Y11_M9 Y12_M4
;
data _null_;
set mem (obs=1);
call execute('data want;set mem;');
forexec=catx(' ','total=sum(of',vvalue(DT_CHECK_STR),'--',vvalue(HYST_DT_STR),');run;');
call execute(forexec);
run;
Art, CEO, AnalystFinder.com
As you have found with this question, and what you will find going forward with that data structure is that it is really not a good structure for programming with. Say for instance you want to find the result which is 3 months before today, how will you do this. Some nice lengthy macro code using macro variables and functions to work it out, which will fall over every run and be horrible for others to read? First, re-model your data into a proper usable format e.g:
date result
01Jun2011 1
01Jul2011 1
...
What this then gives you is the month/year data (i.e. it is not column names) in a usable, easy to program with format that you can do normal data operations on. Then your question is simply a matter of sum(result) where date between dt_chk and hyst_chk - which by the way should also be numeric dates, the 01 part is irrlevant, the point is that date numerics are used to make calcualtions and date manipulations a lot easier. It is no trouble at all to display numeric dates in any format you like, but behind the scenes its always best to keep as numerics.
I assume this is some sort of learning task, but step one of any learning should be learning the fundamentals, and one of those is good data modelling (another of these is learning not to code all in uppercase which makes reading code harder as it is shouting at us!).
If I understand you correctly, you want to sum a number of variables, based on whether the variable name (which indicates a month) falls within a certain period.
If (and only if) your variables occur in the table in the correct order, with no other variables in between, you can try something like this:
DATA test;
SET mem(OBS=1);
DT_CHECK = intnx('month', EVENT_DT, -10);
DT_CHECK_STR = cats('Y', put(DT_CHECK, YEAR2.), '_', 'M', month(DT_CHECK));
HYST_DT_STR = cats('Y', put(EVENT_DT, YEAR2.), '_', 'M', month(EVENT_DT));
ARRAY vars[*] Y80_M1--Y12_M3;
DO i=1 TO dim(vars) until(varname(vars(i))=DT_CHECK_STR);
end;
do i=i to dim(vars) until(varname(vars(i))=HYST_DT_STR);
sum_n+vars{i};
end;
FORMAT DT_CHECK mmddyy10.;
drop i;
RUN;
The first DO loop finds the variable that matches DT_CHECK_STR, the second sums all the variables up to (and including) the one matching HYST_DT_STR.
However, it should also be possible to calculate the starting index:
data test;
set mem;
array vars(*) Y80_M1--Y12_M3;
DT_CHECK=intnx('month',EVENT_DT,-10);
start=intck('month','01jan1980'd,DT_CHECK)+1;
if start<1 then do;
error 'Start date out of range';
delete;
end;
else if start>dim(vars)-10 then do;
error 'End date out of range';
delete;
end;
do _N_=start to start+10;
sum_n+vars(_N_);
end;
format DT_CHECK mmddyy10.;
run;
This method is faster, and it is also safer, due to the range checks. Both methods assume that there are columns for all the months in all the years.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.