Hey all,
I'm currently attempting to create a rather large dataset so here is an overview of the general steps I'm thinking I can take, how I plan to actually do it and then the issues I'm noticing so far.
My first steps are to merge together 6 datasets (data from 2012-2017, 1 year per dataset) using proc sql then create dummy variables to denote certain disease diagnoses. After I create those dummy variables I need to create two different "flag" variables that add together some of these dummy variables. After I have these flag variables I can toss the dummy variables for cleanliness purposes.
The problem I'm running into is the following: The datasets of 2012-2017 data have 1 observation for every visit to the hospital by that ID. One goal is to create dummy variables assigning a '1' to an individual when they have one visit in a year. We only want patients that were seen at least once every year from 2012-2017, but if I create a dummy variable by each year, I'm having trouble actually telling SAS to add these dummy variables back together in a manner where I only add 1 once, per year, for the 5 years.
Ideally I'm hoping to take this information, which has potentially hundreds of visits per year per person, and create a variable that (at max!) would equal 6, telling me that they made at least 1 visit per year across all 6 years being measured.
I attempted to create an array but it just added together all the dummy variables for that ID within that year, rather than reading all years.
Here is the code (ge1_x is just my dummy variable that should assign a 1 for 1 visit within that year):
data utilization1;
set utilization;
by corp_id;
years=0;
array score {6} ge1_12 ge1_13 ge1_14 ge1_15 ge1_16 ge1_17;
do i=1 to 5;
if score{i}=1 then years=years+1;
if score{i}=0 then years=years+0;
if score{i}=. then years=.;
end;
run;
The variables I have to work with are ID, discharge_date (I used this to create ge1_x above), and discharge_year. I'm think discharge_year may be the easier variable to do this with though.
Hopefully that makes a bit of sense. Any advice is greatly appreciated!
A small Example starting data and what you expect the result to be for that data as well the code you have used so far.
And perhaps an explanation of how you intend to use that data. A fair number of the questions we get related to collapsing multiple rows of data to single observation intend to replicate a process developed in spreadsheets and that is often a poor approach with data in SAS.
Many times you are likely to be better off for many forms of analysis to have a single record with identification information (patient date etc) and variable with diagnosis.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Your statement:
Ideally I'm hoping to take this information, which has potentially hundreds of visits per year per person, and create a variable that (at max!) would equal 6, telling me that they made at least 1 visit per year across all 6 years being measured.
Leads me to believe that the only thing you actually need is a patient id and a date (you don't say whether an admit or a discharge is more important for membership in a year). A double proc freq may be all you need.
If you start with a data set with one record per visit per patient then something like:
proc freq data=have noprint; tables patientid*admitdate/out=work.count1; format admitdate year4.; run; Proc freq data=work.count1 noprint; tables patientid /out=work.count2; run;
Assumptions:
1) your admitdate variable is an actual SAS date value.
2) there is a single admit date per record
3) the Have data set has all of the visits from the years 2012 to 2017 of interest
what this code does is count the number of visits per calendar year based on the admin date (Yes there are duplicates).
In the work.count1 set the output might look like:
Patiend date (which will appear as the 4 digit year) Count
1 2012 8
1 2013 1
1 2014 2
1 2016 3
2 2012 2
2 2013 3
2 2014 1
2 2015 4
2 2016 8
2 2017 2
Which when summarized the second time reduces to
Patient Count
1 4
2 6
And you can tell which patients have a visit per year.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.