I have a hospital data set with a list of 18 variables showing whether each patient was admitted to the hospital in the last 18 years (1=yes, 2=no) (I have some other variables that I do not want to change). In a new variable (number_of_admissions_to_hospital) I want to count how many times each patient was admitted to the hospital in the last 18 years.
ID Y_2000 Y_2001 Y_2002 .... Y_2018 number_of_admissions_to_hospital
1 1 1 2 2 2
2 2 1 2 2 1
3 1 1 2 1 3
I appreciate it if you help me to add number_of_admissions_to_hospital to the variables of my data set.
Thank you!
proc transpose data=have out=long (drop=_name_);
by id;
var y_:;
run;
proc sql;
create table want as
select
id,
sum(case
when col1 = 1
then 1
else 0
end) as number_of_admissions_to_hospital
from long
group by id
;
quit;
As usual, a long dataset layout makes the coding simple.
proc transpose data=have out=long (drop=_name_);
by id;
var y_:;
run;
proc sql;
create table want as
select
id,
sum(case
when col1 = 1
then 1
else 0
end) as number_of_admissions_to_hospital
from long
group by id
;
quit;
As usual, a long dataset layout makes the coding simple.
Thank you, KurtBremser. So, with other variables in my data set, how does the long version of the data works? Then I need to return to the wide version of data. This new variable will be just a covariate that I will use in regression analysis and I need to work with the wide format of the data.
I found ways to concatenate a list of character or numeric variables. I wonder if there is a function that I use to calculate the sum of values for these years where my Year values are 1 (that is individuals were admitted to hospital).
Thank you!
In my opinion, covariates in regression analysis (or any modeling) are the exception to the rule where you should prefer long data sets to wide. I would still create the data set long, as Kurt Bremser has done, do as much analysis as you can for using the long data set, and then for regression use PROC TRANSPOSE to return it to wide.
proc sql;
create table want as
select
id,
sum(case
when col1 = 1 and upcase(_name_) like '%Y'
then 1
else 0
end) as number_of_admissions_to_hospital
sum(case
when col1 = 1 and upcase(_name_) like '%INF'
then 1
else 0
end) as number_of_infections
from long
group by id
;
quit;
Add additional conditions as needed to the CASE clauses.
Thank you, KurtBremser. I cannot figure out what does the following part does. May you please help me with that?
and upcase(_name_) like '%Y'
Thank you!
_NAME_ is a variable created by the TRANSPOSE procedure, it holds the name of the variable that was transposed to this particular observation.
Here, I select for the values of all entries where the former variable name started with a Y, either upoer or lower case.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.