BookmarkSubscribeRSS Feed
bhong
Calcite | Level 5

Hi, I'm not entirely sure if I can accurately describe what I want to do, but here's an attempt:

 

I have a dataset that's structured similar to this:

 

 

Y11_M6    Y11_M7    Y11_M8    Y11_M9    Y11_M10    Y11_M11    Y11_M12    Y12_M1   Y12_M2    Y12_M3    DT_CHECK_STR    HYST_DT_STR
1         1         1         1         1          1          1          1        1         1         Y10_M6          Y12_M4

 

There are more columns before Y11_M6. I want to only add up the values for the variables that fall between DT_CHECK_STR and HYST_DT_STR. I wish to use arrays to solve this problem, but I'm having no success thus far (and don't know if it's even at all possible). Here's some dummy code:

 

DATA test;
	SET mem(OBS=1); 
	DT_CHECK = intnx('month', EVENT_DT, -10);
	DT_CHECK_STR = cats('Y', put(DT_CHECK, YEAR2.), '_', 'M', month(DT_CHECK));
	HYST_DT_STR = cats('Y', put(EVENT_DT, YEAR2.), '_', 'M', month(EVENT_DT));


	ARRAY vars[*] Y80_M1--Y12_M3;
	DO i=1 TO dim(vars);
		/* IF varname(vars[i]) "between" DT_CHECK_STR and HYST_DT_STR then sum_n = sum(of vars{i}); */ 
	END;

	FORMAT DT_CHECK mmddyy10.;
RUN;

The commented code is clearly wrong, but I wanted to write something as an "example."

 

Another way I wanted to tackle the issue is to create arrays using macro variables. So I would imagine something like:

 

%let x = DT_CHECK_STR;
%let y = HYST_DT_STR;
array vars[*] &x--&y;

Is my coding problem at all feasible? Any help would be great. Thank you!

10 REPLIES 10
Reeza
Super User

What do the date check values look like?

You can change the indexes for an array and that's usually a good method to use.

 


@bhong wrote:

Hi, I'm not entirely sure if I can accurately describe what I want to do, but here's an attempt:

 

I have a dataset that's structured similar to this:

 

 

Y11_M6    Y11_M7    Y11_M8    Y11_M9    Y11_M10    Y11_M11    Y11_M12    Y12_M1   Y12_M2    Y12_M3    DT_CHECK_STR    HYST_DT_STR
1         1         1         1         1          1          1          1        1         1         Y10_M6          Y12_M4

 

There are more columns before Y11_M6. I want to only add up the values for the variables that fall between DT_CHECK_STR and HYST_DT_STR. I wish to use arrays to solve this problem, but I'm having no success thus far (and don't know if it's even at all possible). Here's some dummy code:

 

DATA test;
	SET mem(OBS=1); 
	DT_CHECK = intnx('month', EVENT_DT, -10);
	DT_CHECK_STR = cats('Y', put(DT_CHECK, YEAR2.), '_', 'M', month(DT_CHECK));
	HYST_DT_STR = cats('Y', put(EVENT_DT, YEAR2.), '_', 'M', month(EVENT_DT));


	ARRAY vars[*] Y80_M1--Y12_M3;
	DO i=1 TO dim(vars);
		/* IF varname(vars[i]) "between" DT_CHECK_STR and HYST_DT_STR then sum_n = sum(of vars{i}); */ 
	END;

	FORMAT DT_CHECK mmddyy10.;
RUN;

The commented code is clearly wrong, but I wanted to write something as an "example."

 

Another way I wanted to tackle the issue is to create arrays using macro variables. So I would imagine something like:

 

%let x = DT_CHECK_STR;
%let y = HYST_DT_STR;
array vars[*] &x--&y;

Is my coding problem at all feasible? Any help would be great. Thank you!


 

bhong
Calcite | Level 5
Hi Reeza, the date check variables are SAS dates. However, I created new variables 'DT_CHECK_STR' and 'HYST_DT_STR' that are strings that take values that resemble that column names (e.g., DT_CHECK_STR can be 'Y10_M6'). I was thinking I could do something with it.

Could you elaborate by what you mean by changing the indices for the array? Thank you!

Reeza
Super User
data demo;

array myData(2014:2018) Y2014 Y2015 Y2016 Y2017 Y2018 (1 0 1 0 1);

do year=2014 to 2018; *can use hbound/lbound to get these values;
myData(i) = myData(i) * 20;
end;

run;

You could probably set up a two dimensional array, years across and months and then use those as your index within a double loop....

 

https://stats.idre.ucla.edu/wp-content/uploads/2016/02/bt3009.pdf

bhong
Calcite | Level 5

Hi Reeza, 

 

I tried to do things a different way. 

DATA test2;
	SET test; 
	DT_CHECK = intnx('month', EVENT_DT, -10);
	DT_CHECK_STR = cats('Y', put(DT_CHECK, YEAR2.), '_', 'M', month(DT_CHECK));
	HYST_DT_STR = cats('Y', put(EVENT_DT, YEAR2.), '_', 'M', month(EVENT_DT));
	j=0;
	array vars[*] Y11_M6--Y12_M3;
	do i=1 to dim(vars);
		YEAR = substr(scan(vname(vars[i]), 1, "_"), 2, 2);
		MONTH = input(substr(scan(vname(vars[i]), 2, "_"), 2), 2.);
	
		IF substr(YEAR, 1, 1) = '8' OR substr(YEAR, 1, 1) = '9' THEN NEW_YEAR = input(cats('19', YEAR), 4.);
			ELSE NEW_YEAR = input(cats('20', YEAR), 4.);
		
		IF DT_CHECK <= mdy(MONTH, 1, NEW_YEAR) <= EVENT_DT THEN /* don't know what to include here */;
	end;
	
	
	FORMAT DT_CHECK mmddyy10.;
RUN;

I converted all the column names into SAS dates and used an IF statement to see if those "new" dates are between DT_CHECK and EVENT_DT. However, I am stuck--perhaps you could tell me if my method makes sense? Ultimately, if the new date formed using the column name is between DT_CHECK and EVENT_DT, I only want to sum up the columns in the range. Could you look at my code above and point me in the right direction? 

Reeza
Super User

Assuming your dates check out, it would be:

 

then sum_total = sum(sum_total, vars(i));
bhong
Calcite | Level 5

This seems to work on my test data. Thank you so much. I'll try it once I'm in the office tomorrow--excited to go to work! 🙂

art297
Opal | Level 21

If there will always be corresponding variables for both DT_CHECK_STR and HYST_DT_STR, and the variables you want to sum are in date order, you could use something as simple as:

 

data mem;
  input Y11_M6-Y11_M12 Y12_M1-Y12_M4 (DT_CHECK_STR HYST_DT_STR) ($);
  cards;
1 1 1 1 1 1 1 1 1 1 1 Y11_M9 Y12_M4
;

data _null_;
  set mem (obs=1);
  call execute('data want;set mem;');
  forexec=catx(' ','total=sum(of',vvalue(DT_CHECK_STR),'--',vvalue(HYST_DT_STR),');run;');
  call execute(forexec);
run;

Art, CEO, AnalystFinder.com

 

bhong
Calcite | Level 5
Hi Art, this is interesting. It also seems to work on my test data; however, I'm not too familiar with the functions you used. I'll look them up tomorrow morning. Thanks so much!
RW9
Diamond | Level 26 RW9
Diamond | Level 26

As you have found with this question, and what you will find going forward with that data structure is that it is really not a good structure for programming with.  Say for instance you want to find the result which is 3 months before today, how will you do this.  Some nice lengthy macro code using macro variables and functions to work it out, which will fall over every run and be horrible for others to read?  First, re-model your data into a proper usable format e.g:

date              result

01Jun2011   1

01Jul2011    1

...

What this then gives you is the month/year data (i.e. it is not column names) in a usable, easy to program with format that you can do normal data operations on.  Then your question is simply a matter of sum(result) where date between dt_chk and hyst_chk - which by the way should also be numeric dates, the 01 part is irrlevant, the point is that date numerics are used to make calcualtions and date manipulations a lot easier.  It is no trouble at all to display numeric dates in any format you like, but behind the scenes its always best to keep as numerics.

I assume this is some sort of learning task, but step one of any learning should be learning the fundamentals, and one of those is good data modelling (another of these is learning not to code all in uppercase which makes reading code harder as it is shouting at us!).

s_lassen
Meteorite | Level 14

If I understand you correctly, you want to sum a number of variables, based on whether the variable name (which indicates a month) falls within a certain period.

 

If (and only if) your variables occur in the table in the correct order, with no other variables in between, you can try something like this:

DATA test;
	SET mem(OBS=1); 
	DT_CHECK = intnx('month', EVENT_DT, -10);
	DT_CHECK_STR = cats('Y', put(DT_CHECK, YEAR2.), '_', 'M', month(DT_CHECK));
	HYST_DT_STR = cats('Y', put(EVENT_DT, YEAR2.), '_', 'M', month(EVENT_DT));


	ARRAY vars[*] Y80_M1--Y12_M3;
	DO i=1 TO dim(vars) until(varname(vars(i))=DT_CHECK_STR);
           end;
        do i=i to dim(vars) until(varname(vars(i))=HYST_DT_STR);
          sum_n+vars{i};  
          end;

	FORMAT DT_CHECK mmddyy10.;
drop i; RUN;

The first DO loop finds the variable that matches DT_CHECK_STR, the second sums all the variables up to (and including) the one matching HYST_DT_STR.

 

However, it should also be possible to calculate the starting index:

data test;
  set mem;
  array vars(*) Y80_M1--Y12_M3;
  DT_CHECK=intnx('month',EVENT_DT,-10);
  start=intck('month','01jan1980'd,DT_CHECK)+1;
  if start<1 then do;
    error 'Start date out of range';
    delete;
    end;
  else if start>dim(vars)-10 then do;
    error 'End date out of range';
    delete;
    end;
  do _N_=start to start+10;
    sum_n+vars(_N_);
    end;
  format DT_CHECK mmddyy10.;
run;

This method is faster, and it is also safer, due to the range checks. Both methods assume that there are columns for all the months in all the years.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 7873 views
  • 2 likes
  • 5 in conversation