What modifications do I need to make to SAS procedures (i.e., proc surveyfreq, proc surveymean, proc surveylogistic) to accommodate a "long" data set that is the result of a one-to-many merge?
I'm merging 11 data sets from a complex survey. 10 of them have 1 observation per respondent. The 11th has multiple observations per respondent (Medications: 1 respondent could be taking multiple medications. Each medication is recorded as a separate observation for the same respondent ID).
I performed the merge using this syntax:
DATA work.data_2009_2010;
MERGE
work.drug_use_2009_2010_sort
work.med_conds_2009_2010_cvd_sort
work.demos_2009_2010_sort
work.etoh_use_2009_2010_sort
work.smk_hx_2009_2010_sort
work.hdl_2009_2010_sort
work.tchol_2009_2010_sort
work.trigly_2009_2010_sort
work.bmx_2009_2010_sort
work.bpx_2009_2010_sort
work.rxq_rx_2009_2010_sort
BY SEQN;
RUN;
When I perform a typical proc surveyfreq I see that multiple observations are used per respondent (see image). This overstates the number of events (i.e., freq = 366)
This the syntax I am using:
proc surveyfreq data=&dataset;
STRATUM sdmvstra;
CLUSTER sdmvpsu;
WEIGHT &weight;
table
/* flag_1*year */
flag_1 /cl col chisq cellchi2 relrisk;
;
format
ever_told_mi yes_no_fmt.
year yearfmt.
&exp cann_use_statusfmt.
;
run;
You modify your data, the procs do not handle this.
Transpose the medication data set to a wide format before merging.
If you're not familiar with data transposing, here are transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd
Looks like in way or another which ever set is the "many" you need to decide what you want and likely reduce it to a single observation prior to the merge. What and how, depends on what you need for analysis.
BTW you weighting variable is also now 'overweighting' any respondent that has multiple records in that many set unless the weight variable is designed for that many data set, which is unlikely.
You modify your data, the procs do not handle this.
Transpose the medication data set to a wide format before merging.
If you're not familiar with data transposing, here are transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.