I am wondering if there is a way to modularize my program by setting up 'include' files to be called in the main.sas program I have. I have set up MACROs in the main file but I want this to be a little cleaner and put the data commands in one file and the statistical analyses in the main. I've attached my main file.
Any ideas?
Generally speaking, yeah, you could make this more flexible with a macro. I see two embedded macros. You could certainly save those somewhere else and then reference them in your main SAS program with a %INCLUDE statement. I'd probably turn the main program into a macro that loops through all the different yearly dataset names. Use %do i=2014 %to 2018 (or derive the max year with a function, or pass the max year into the macro as a parameter). Then replace all the hardcoded dataset names with something like b&i. or hiv&i. (a double ampersand such as &&hiv&i. is only needed when hiv2014, hiv2015, ... are themselves macro variables). When appending, I'd recommend renaming the base at some point. I see it is still named hiv2014 after you do all the appends, even though it has data for all years at that point. You could do %IF &i=2014 %THEN create your base dataset %ELSE proc append base=ds_name data=hiv&i.;
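A minimal sketch of the %INCLUDE layout, assuming two hypothetical files (formats.sas holding the PROC FORMAT step and data_prep_macros.sas holding the %filters and %conditions definitions) saved in the project folder used in this thread. %INCLUDE submits the contents of each file as if it had been typed at that point, so the macros are compiled before the main program calls them:

```sas
/* main.sas: driver program (the two included file names are hypothetical) */
%let proj = /folders/myfolders/HIV-Millennials;   /* project folder used in this thread */

options source2;   /* also echo the included code to the log */

%include "&proj./formats.sas";            /* PROC FORMAT step */
%include "&proj./data_prep_macros.sas";   /* compiles %filters and %conditions */

/* the DATA steps that call %filters; and %conditions;
   and the PROC FREQ / PROC LOGISTIC steps stay below in this file */
```

Double quotes are needed around the paths so that &proj. resolves.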
For debugging purposes, I like the MPRINT, MLOGIC and MFILE options when I'm building a macro program.
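For example, a sketch of switching these on around a call to the %main macro from this thread (assumed to be compiled already). MFILE only takes effect together with MPRINT and writes the generated code to the file that the fileref MPRINT points to; the file name below is hypothetical, and SYMBOLGEN (which logs each macro variable resolution) is an extra option added here:

```sas
/* MFILE writes MPRINT output to the file assigned to the fileref named MPRINT */
filename mprint "/folders/myfolders/HIV-Millennials/main_generated.sas";  /* hypothetical file */

options mprint mlogic symbolgen mfile;

%main(2018)

/* turn the options off again once the macro behaves */
options nomprint nomlogic nosymbolgen nomfile;
```

The generated file contains plain SAS code with the macro logic already resolved, which is handy for seeing exactly what ran.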
By the way, this is NOT homework. I completed my masters in March. I am redoing this for submission somewhere.
Thank you! I actually beat you to it on the base, naming it hiv_all. I am still learning SAS and MACRO functionality in general. I found a great paper, http://www2.sas.com/proceedings/sugi24/Handson/p149-24.pdf, which might help me understand the ampersand functions you recommended.
Thank you so much for such a quick response.
Can you point me to some documentation that will assist me with this functionality?
I'm starting to get a little more comfortable with macro programming myself, but I'm still learning. The main thing to understand is that the macro facility only generates text. It doesn't actually run any DATA steps or process anything in SAS. Once I understood that, macros started to make more sense to me: the macro processor generates the text, and then SAS processes that text as ordinary DATA and PROC steps.
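A small sketch that makes the text generation visible. The macro name make_year and the macro variable hiv2014 are hypothetical; with MPRINT on, the log shows the generated statements prefixed with MPRINT(MAKE_YEAR):, and the %PUT lines show how single and double ampersands resolve:

```sas
options mprint;   /* echo the SAS code each macro generates to the log */

%macro make_year(year);
/* &year. is replaced by text before SAS compiles the DATA step */
data work.demo&year.;
length source $ 20;
/* two periods: the first ends the macro reference, the second separates libref and member */
source = "b&year..llcp&year.";
year = &year.;
run;
%mend make_year;

%make_year(2014)

/* double ampersand: &&hiv&i. -> &hiv2014 -> value of the macro variable hiv2014 */
%let i = 2014;
%let hiv2014 = work.hiv2014_clean;   /* hypothetical macro variable */
%put NOTE: single ampersand gives hiv&i.;
%put NOTE: double ampersand gives &&hiv&i.;
```

The two %PUT lines should print hiv2014 and work.hiv2014_clean respectively, and the DATA step stores the text b2014.llcp2014 in source.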
Here's something you could play around with. Other folks on here are better at this and can probably clean up my code. Also, one thing I noticed was that your 2014 & 2015 datasets were called hiv2015 & hiv2015, but then you appended the base with lgbt2015 and lgbt2016. The macro below just appends the yearly datasets named hiv&i. You also have a PROC SORT on lgbt2014, but all your procs reference hiv2014 (or hiv_all).
/* %include the file that defines the two embedded macros (%filters and %conditions)
so they are compiled before %main runs; the PROC FORMAT step could go there too */
%macro main(max);
%local i;
libname hiv_data '/folders/myfolders/HIV-Millennials';
/* formats only need to be defined once, so PROC FORMAT sits outside the loop */
proc format;
value educa 0='No College' 1='College or Degree';
value agecat 1='Millennials' 2='Non-millennials';
value sex 0='Male' 1='Female';
value health 0='No' 1='Yes';
value trnsgndr 1='Yes' 0='No';
value hiv 0='No' 1='Yes';
value sxorient 1='Straight' 2='Lesbian/Gay' 3='Bisexual' 4='Other';
value race 0='White' 1='Black' 2='Hispanic' 3='Other';
value state 0='West' 1='Southwest' 2='Midwest' 3='Southeast' 4='Northeast'
5='U.S. Territory';
value married 0='Not Married' 1='Married';
value depression 0='No previous depression' 1='Previous depression';
run;
%do i=2014 %to &max.;
libname b&i xport "/folders/myfolders/HIV-Millennials/LLCP&i..XPT";
data hiv_data.b&i;
length _STATE 8 ADDEPEV2 8 EDUCA 8 HIVTST6 8 MARITAL 8 SEX 8 SXORIENT 8 HLTHPLN1 8 TRNSGNDR 8 ;
/* two periods: the first ends the macro reference &i, the second separates libref and member */
set b&i..llcp&i;
run;
%if &i = 2014 %then %do;
/* the first year creates the base data set */
data hiv_final (keep=_race _age_g ADDEPEV2 _state trnsgndr marital hivtst6 sex agecat hlthpln1
educa sxorient);
set hiv_data.b&i;
%filters;
%conditions;
run;
%end;
%else %do;
/* later years are built separately and appended to the base */
data hiv&i (keep=_race _age_g ADDEPEV2 _state trnsgndr marital hivtst6 sex agecat hlthpln1
educa sxorient);
set hiv_data.b&i;
%filters;
%conditions;
run;
proc append base=hiv_final data=hiv&i;
run;
%end;
%end; /* closes the %do i loop, so the analyses below run once on all years */
* sort the appended data by sex ;
proc sort data=hiv_final;
by sex;
run;
* run frequency/association testing on covariates against variable of interest ;
proc freq data=hiv_final;
tables (ADDEPEV2 sex _race marital _state trnsgndr sxorient hlthpln1 educa) *
agecat / chisq;
format sex sex. agecat agecat. _race race. marital married.
_state state. sxorient sxorient. hlthpln1 health.
educa educa. trnsgndr trnsgndr. ADDEPEV2 depression. ;
run;
/* univariate analysis of covariates and exposure against outcome */
proc freq data=hiv_final;
tables (ADDEPEV2 sex agecat _race marital _state trnsgndr sxorient hlthpln1
educa) * hivtst6 / chisq;
format sex sex. agecat agecat. _race race. marital married.
_state state. sxorient sxorient. hlthpln1 health.
educa educa. trnsgndr trnsgndr. hivtst6 hiv. ADDEPEV2 depression.;
run;
/* logistic regression of covariates against ever been told had depressive diagnosis */
proc logistic data=hiv_final;
class hivtst6(ref='1') ADDEPEV2(ref='1') sex(ref='0') agecat(ref='2')
_race(ref='0') marital(ref='0') _state(ref='2') trnsgndr(ref='0')
sxorient(ref='1') hlthpln1(ref='0') educa(ref='0');
model hivtst6=ADDEPEV2 sex agecat _race marital _state trnsgndr sxorient
hlthpln1 educa;
run;
%mend main;
%main(2018) /* or derive the max year instead, e.g. %main(%sysfunc(today(), year4.)) */
Thank you for identifying my errors! This is definitely a work in progress, so I hope you don't think it was submitted this way lol. The text generation does make sense and does clear up how I will approach this.
I attached a test version that I wrote in Notepad. Hope this helps.
VDD and Bob, thank you both for the excellent examples. I will research these methods before implementing them so I can avoid asking this question again in the future. Thank you very much.
I don't think you need the macros you have, or any new macros that contain macro loops or &&& references.
This was not tested, for obvious reasons, but it has relatively few changes apart from combining steps.
libname b14 xport '/folders/myfolders/HIV-Millennials/LLCP2014.XPT';
libname b15 xport '/folders/myfolders/HIV-Millennials/LLCP2015.XPT';
libname b16 xport '/folders/myfolders/HIV-Millennials/LLCP2016.XPT';
libname b17 xport '/folders/myfolders/HIV-Millennials/LLCP2017.XPT';
libname hiv_data '/folders/myfolders/HIV-Millennials';
data hiv_data.b_years;
length _STATE 8 ADDEPEV2 8 EDUCA 8 HIVTST6 8 MARITAL 8 SEX 8 SXORIENT 8 HLTHPLN1 8 TRNSGNDR 8 ;
set b14.llcp2014 b15.llcp2015 b16.llcp2016 b17.llcp2017 indsname=indsname;
year = input(substrn(indsname,length(indsname)-3),4.);
run;
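The INDSNAME= trick used above can be tried on two small throwaway data sets (the names are hypothetical). The variable named in INDSNAME= holds the libref.member name of the data set the current observation came from, and it is not written to the output data set, so it has to be copied into a regular variable if you want to keep it:

```sas
/* two tiny hypothetical input data sets */
data work.llcp2014; x = 1; run;
data work.llcp2015; x = 2; run;

data work.stacked;
set work.llcp2014 work.llcp2015 indsname=src;
length source $ 41;
source = src;                                     /* keep the temporary name, e.g. WORK.LLCP2014 */
year = input(substrn(src, length(src) - 3), 4.);  /* last four characters of the name */
run;

proc print data=work.stacked;
run;
```

PROC PRINT should then list source values such as WORK.LLCP2014 alongside year values 2014 and 2015.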
proc format;
value educa 0='No College' 1='College or Degree';
value agecat 1='Millennials' 2='Non-millennials';
value sex 0='Male' 1='Female';
value health 0='No' 1='Yes';
value trnsgndr 1='Yes' 0='No';
value hiv 0='No' 1='Yes';
value sxorient 1='Straight' 2='Lesbian/Gay' 3='Bisexual' 4='Other';
value race 0='White' 1='Black' 2='Hispanic' 3='Other';
value state 0='West' 1='Southwest' 2='Midwest' 3='Southeast' 4='Northeast' 5='U.S. Territory';
value married 0='Not Married' 1='Married';
value depression 0='No previous depression' 1='Previous depression';
run;
data analysis(keep=_race _age_g ADDEPEV2 _state trnsgndr marital hivtst6 sex agecat hlthpln1 educa sxorient);
set hiv_data.b_years;
where sxorient in (1, 2, 3, 4) and marital in (1, 2, 3, 4, 5, 6) and
ADDEPEV2 in (1, 2) and educa in (1, 2, 3, 4, 5, 6) and trnsgndr in(1,2,3,4) and
hlthpln1 in (1, 2) AND _STATE IN(1,2,5:6,8:13,15:34,36:39,41:42,44:47,49:51,53:56,66,72) and
hivtst6 in (1, 2) and _race in (1, 2, 3, 4, 5, 6, 7, 8) and sex in(1,2);
if educa in(1, 2, 3, 4, 9) then educa=0;
if educa in(5, 6) then educa=1;
if sex=1 then sex=0;
if sex=2 then sex=1;
if _age_g in(1, 2) then agecat=1;
if _age_g in(3, 4, 5, 6) then agecat=2;
if hlthpln1=1 then hlthpln1=1;
if hlthpln1=2 then hlthpln1=0;
if hivtst6=1 then hivtst6=1;
if hivtst6=2 then hivtst6=0;
if ADDEPEV2 in(1) then ADDEPEV2=1;
if ADDEPEV2 in(2) then ADDEPEV2=0;
if _race in(1) then _race=0;
if _race in(2) then _race=1;
if _race in(7) then _race=2;
if _race in(3, 4, 5, 6, 8) then _race=3;
if trnsgndr in(1,2,3) then trnsgndr=1;
if trnsgndr=4 then trnsgndr=0;
if _state in(53, 30, 41, 32, 49, 56, 16, 6, 8, 2, 15) then _state=0;
/* */
/* verified, 11 */
/* if _state in(4, 35, 48, 40) then _state=1; */
/* verified, 4 no records at all */
if _state in(38, 46, 31, 20, 27, 19, 29, 55, 17, 18, 26, 39) then _state=2;
/* REFERENCE verified, 12 */
if _state in(5, 22, 28, 1, 47, 21, 54, 51, 11, 37, 38, 13, 12, 10, 24) then
_state=3;
/* verified, 15 */
if _state in(42, 36, 50, 44, 9, 23, 34, 33, 25) then _state=4;
/* verified, 9 */
if _state in(66, 72) then _state=5;
/* verified, 2 */
if marital in(2, 3, 4, 5, 6) then marital=0;
if marital in(1) then marital=1;
run;
* sort the appended data by sex ;
proc sort data=analysis;
by sex;
run;
* run frequency/association testing on covariates against variable of interest ;
proc freq data=analysis;
tables (ADDEPEV2 sex _race marital _state trnsgndr sxorient hlthpln1 educa) * agecat / chisq;
tables (ADDEPEV2 sex agecat _race marital _state trnsgndr sxorient hlthpln1 educa) * hivtst6 / chisq;
format sex sex. agecat agecat. _race race. marital married.
_state state. sxorient sxorient. hlthpln1 health.
educa educa. trnsgndr trnsgndr. hivtst6 hiv. ADDEPEV2 depression.;
run;
/* logistic regression of covariates against ever been told had depressive diagnosis */
proc logistic data=analysis;
class hivtst6(ref='1') ADDEPEV2(ref='1') sex(ref='0') agecat(ref='2')
_race(ref='0') marital(ref='0') _state(ref='2') trnsgndr(ref='0')
sxorient(ref='1') hlthpln1(ref='0') educa(ref='0');
model hivtst6=ADDEPEV2 sex agecat _race marital _state trnsgndr sxorient hlthpln1 educa;
run;