Hello All,
I am trying to fix someones code which fails due to hard-coding with the date ranges.
call symput ('count_range', 'count&file_dt1.-count20200501
count20200430-count20200402');
call symput ('balance_range', 'chd_current_balance_&file_dt1.-chd_current_balance_20200501
chd_current_balance_20200430-chd_current_balance_20200402');
Output : count20200520-count20200501 count20200430-count20200402 chd_current_balance_20200520-chd_current_balance_20200501 chd_current_balance_20200430-chd_current_balance_20200402
The failure is caused whenever there is a sparsity of the data in the repository in this case the data for 20200519 was missing, which is uncommon but happens sometimes. (To give a little credit to the original developer , this code was meant to be run only for few days and he/she took a short cut ).
One approach that might work is a Proc SQL Code which lists all the available dates (I am yet to test this )
proc sql noprint; select distinct cats('count',put(enrollment_dt,yymmddn.)) , cats('chd_current_balance_',put(enrollment_dt,yymmddn.)) into : count_range separated by ' ',:balance_range separated by ' ' from in.repository_current where ^ missing(enrollment_dt); quit;
I am wondering if there is a way to replicate the date ranges similar to the original formats used in the code :
chd_current_balance_20200501-chd_current_balance_20200518 chd_current_balance_20200520 count20200501-count20200518 count20200520
Creating a range till the data is continuously available and either list the subsequent values after the break or create a new range.
here is what i have tried so far :
The enrollment_dt has dates from 20200402 to till date with few dates missing in between
Proc freq data=in.repository_current;
tables enrollment_dt / missing out= temp1(keep=enrollment_dt);
where ^ missing(enrollment_dt);
format enrollment_dt yymmddn.;
run;
data temp2;
set temp1;
mon=month(enrollment_dt);
run;
data temp3;
do _n_=1 by 1 until(last.mon);
array dt[999] _temporary_;
set temp2;
by mon;
dt[_n_]=enrollment_dt;
end;
run;
Any help is appreciated.
@r_behata wrote:
Hi @ballardw
Thanks for your reply.
You are right about the Date Range. I should have referred that as a series of variables in a range.
The Variable Ranges are used in code as below. Though I agree with your assessment about the wrong data structuring, I am little skeptical about making radical changes to the code at this time.
proc summary data = data_out.enroll_ nway missing sumsize=max; class port_declines sortorder; var &count_range. &balance_range.; output out = enroll_sum2 (drop=_freq_ _type_) sum= ; run;
If the data were structured with a date variable then maybe:
proc summary data = data_out.enroll_ nway missing sumsize=max; class port_declines sortorder date; where '23Jan2020'd le date le '18May2020'd; var count. balance; output out = enroll_sum2 (drop=_freq_ _type_) sum= ; run;
Which would give a sum for every day of the period.
Add date grouping format to the procedure like: Format date YYmm6. ; would create sums by calendar month, YYQ. format would create calendar quarter sums.
Which makes the summary much more flexible and there are only 2 values that you may need to place in macro variables, the start and end dates of the period. If you are only ever want after a given date then only ONE value is needed: the start date (assumes the dates are all SAS date values and not random integers like 20200402)
Since it appears that your dataset
in.repository_current
already has a date variable Enrollment_dt then I don't see lots of difficulty. If your process is constantly adding variables then that is something that should be stopped quickly. Because the complexity of constantly updating variable lists adds, as you are finding, lots of people maintenance time and some fragility to the process.
Unfortunately there are lots of people out there that are getting their basic training in manipulating data in classes that use spreadsheets, and typically poorly designed ones so that information can appear on a single page of a PowerPoint presentation or screen for discussion. The teacher uses the same slide and does not have actual changing values where the slide gets updated daily/weekly/monthly/quarterly unlike real world data. Then people try to force tools like SAS to match the approach they learned from those classes.
I don't see any "date range". What I see are variables that have information in them, which often indicates poor data structuring. For anything that involves dates it is almost always a better idea to have a variable with the date, as a SAS date value, and a variable to hold the value.
So, exactly WHAT are you going to do with those apparent variable ranges?
If the values are contiguous (sequential columns) in a data set and you are using this to use as a variable list then perhaps making a string with 2 dashes will work. 2 dashes indicate "from this variable to that variable that are adjacent columns in the dataset". I have to guess because you have not shown anywhere how you use that created macro variable.
Hi @ballardw
Thanks for your reply.
You are right about the Date Range. I should have referred that as a series of variables in a range.
The Variable Ranges are used in code as below. Though I agree with your assessment about the wrong data structuring, I am little skeptical about making radical changes to the code at this time.
proc summary data = data_out.enroll_ nway missing sumsize=max;
class port_declines sortorder;
var &count_range.
&balance_range.;
output out = enroll_sum2 (drop=_freq_ _type_) sum= ;
run;
@r_behata wrote:
Hi @ballardw
Thanks for your reply.
You are right about the Date Range. I should have referred that as a series of variables in a range.
The Variable Ranges are used in code as below. Though I agree with your assessment about the wrong data structuring, I am little skeptical about making radical changes to the code at this time.
proc summary data = data_out.enroll_ nway missing sumsize=max; class port_declines sortorder; var &count_range. &balance_range.; output out = enroll_sum2 (drop=_freq_ _type_) sum= ; run;
Why not just sum all of the variables? It probably wouldn't take much more time than only selecting some of them to sum.
The Double hyphen '--' method to list the consecutive variables makes complete sense and work till the summary step :
data _null_;
/* Enrollments File Date */
file_dt1 = intnx('day', today(), -0);
/* Date ranges in descending order */
call symput ('count_range', 'count20200402 -- count&file_dt1.');
call symput ('balance_range', 'chd_current_balance_20200402 -- chd_current_balance_&file_dt1.');
run;
proc summary data = data_out.enroll_decl_bal_sum_hist nway missing sumsize=max;
class port_declines sortorder;
var &count_range.
&balance_range.;
output out = enroll_decl_bal_sum2 (drop=_freq_ _type_) sum= ;
run;
However, I Just realized that there is a reporting step that refers the same Macro Variable in the Original Reverse Order . The Looks like the Same macro variable after the change can no longer be used in this step.
The change of Order of Variables seems to cause a problem at this stage of the program.
proc report data=enroll_decl_bal_sum2 split='*'
style(summary)=[background=lightblue foreground=black]
style(header)=[background=lightblue foreground=black font_weight=bold];
title1 justify=left bold height=12pt ;
footnote1;
column port_declines ('# of Accounts' &count_range. );
Run;
Any suggestions ?
@r_behata wrote:
Hi @ballardw
Thanks for your reply.
You are right about the Date Range. I should have referred that as a series of variables in a range.
The Variable Ranges are used in code as below. Though I agree with your assessment about the wrong data structuring, I am little skeptical about making radical changes to the code at this time.
proc summary data = data_out.enroll_ nway missing sumsize=max; class port_declines sortorder; var &count_range. &balance_range.; output out = enroll_sum2 (drop=_freq_ _type_) sum= ; run;
If the data were structured with a date variable then maybe:
proc summary data = data_out.enroll_ nway missing sumsize=max; class port_declines sortorder date; where '23Jan2020'd le date le '18May2020'd; var count. balance; output out = enroll_sum2 (drop=_freq_ _type_) sum= ; run;
Which would give a sum for every day of the period.
Add date grouping format to the procedure like: Format date YYmm6. ; would create sums by calendar month, YYQ. format would create calendar quarter sums.
Which makes the summary much more flexible and there are only 2 values that you may need to place in macro variables, the start and end dates of the period. If you are only ever want after a given date then only ONE value is needed: the start date (assumes the dates are all SAS date values and not random integers like 20200402)
Since it appears that your dataset
in.repository_current
already has a date variable Enrollment_dt then I don't see lots of difficulty. If your process is constantly adding variables then that is something that should be stopped quickly. Because the complexity of constantly updating variable lists adds, as you are finding, lots of people maintenance time and some fragility to the process.
Unfortunately there are lots of people out there that are getting their basic training in manipulating data in classes that use spreadsheets, and typically poorly designed ones so that information can appear on a single page of a PowerPoint presentation or screen for discussion. The teacher uses the same slide and does not have actual changing values where the slide gets updated daily/weekly/monthly/quarterly unlike real world data. Then people try to force tools like SAS to match the approach they learned from those classes.
Why not create empty tables for the missing dates?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.