- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi guys,
suppose to have the following:
data DB;
input ID :$20. Admission :date09. Discharge :date09. Variable1 Variable2;
cards;
166 16FEB2019 26FEB2019 1 0
170 22FEB2017 07MAR2017 1 0
170 22FEB2017 07MAR2017 0 1
170 30JAN2019 04MAR2019 0 0
313 03MAR2016 10MAR2016 1 0
313 03MAR2016 10MAR2016 0 1
313 12DEC2019 15DEC2019 1 0
215 22DEC2014 25DEC2014 1 1
;
run;
Is there a way to get the following?
data DB1;
input ID :$20. Admission :date09. Discharge :date09. Variable1 Variable2;
cards;
166 16FEB2019 26FEB2019 1 0
170 22FEB2017 07MAR2017 1 1
170 22FEB2017 07MAR2017 0 0
170 30JAN2019 04MAR2019 0 0
313 03MAR2016 10MAR2016 1 1
313 03MAR2016 10MAR2016 0 0
313 12DEC2019 15DEC2019 1 0
In other words (referring ONLY to replicated dates) if Variable1 =1 and Variable2 = 0 but Variable2 =1 for the second replica then move the value of Variable2=1 to the replica where Variable1 = 1.Variable1 = 1 always at the first replica of dates.
Thank you very much in advance
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi hope this helps,
data DB;
input ID :$20. Admission :date09. Discharge :date09. Variable1 Variable2;
cards;
166 16FEB2019 26FEB2019 1 0
170 22FEB2017 07MAR2017 1 0
170 22FEB2017 07MAR2017 0 1
170 30JAN2019 04MAR2019 0 0
313 03MAR2016 10MAR2016 1 0
313 03MAR2016 10MAR2016 0 1
313 12DEC2019 15DEC2019 1 0
215 22DEC2014 25DEC2014 1 1
;
run;
proc sort;
by ID Admission Discharge;
run;
/* Step 1: Create a temporary dataset with the maximum value of Variable2 for each combination of ID, Admission, and Discharge */
proc sql;
create table Temp as
select ID, Admission, Discharge, max(Variable2) as MaxVar2
from DB
group by ID, Admission, Discharge;
quit;
/* Step 2: Merge the temporary dataset back with the original dataset to update Variable2 */
data DB1;
merge DB(in=a drop=variable2) Temp(in=b);
by ID Admission Discharge;
if a;
if Variable1 = 1 then Variable2 = MaxVar2;
if cmiss(variable2)=1 then variable2=0;
drop MaxVar2;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your description is a little unclear. It would help if you gave the variables more meaningful names to reflect what they contain. It would also help (and is probably going to be necessary) to create NEW variables to have the values you want to calculate (not "move").
For now we could call the old variables HAVE1 and HAVE2 and the new variables WANT1 and WANT2. You can always add a rename and/or drop statement to change the names so the new variables end up with the old names.
Your example does not have the result for the last observation. But let's assume since there are is only one observation that the values do not change. Let's just add your expected result as two extra variables in our example input dataset so it will be easy to detect if we succeed in making them.
data DB;
input ID :$20. Admission :date. Discharge :date. have1 have2 expect1 expect2;
format admission discharge date9.;
cards;
166 16FEB2019 26FEB2019 1 0 1 0
170 22FEB2017 07MAR2017 1 0 1 1
170 22FEB2017 07MAR2017 0 1 0 0
170 30JAN2019 04MAR2019 0 0 0 0
313 03MAR2016 10MAR2016 1 0 1 1
313 03MAR2016 10MAR2016 0 1 0 0
313 12DEC2019 15DEC2019 1 0 1 0
215 22DEC2014 25DEC2014 1 1 1 1
;
What did you mean by:
Variable1 = 1 always at the first replica of dates.
Is this a statement of how the original variable was defined? Or is it a statement of how you want the new variable defined?
What do you mean by replicates? Are you talking about grouping by ID and ADMISSION? Or ID, ADMISSION and DISCHARGE? In your example all of the cases with duplicate ADMISSION dates also have matching DISCHARGE dates. Is that always going to be the case?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@NewUsrStat wrote:
So: "Variable1 =1 always at the first..." is a statement. Replicated rows are identical: for ID, Admission and Discharge.
Still do not know what that means, but since your example is not changing the value of VARIABLE1 perhaps we can just ignore it?
If we assume that the source data has a maximum of one observation per group that has VARIABLE1 set to 1. And if we assume that your goal is to modify the values of VARIABLE2 only in the groups that ever have VARIABLE2 set to 1 then perhaps this SQL query is what you want?
proc sql;
create table db1 as
select id ,admission ,discharge ,variable1
, case when (max(variable2)) then variable1
else variable2
end as variable2
from db
group by id, admission, discharge
order by id, admission, discharge, variable1 desc
;
quit;
The SELECT statement lists the variables to create in the NEW dataset. We copy most of the existing variables as they are. Only VARIABLE2 needs some change. So when vairable2 is ever true for the group then use the values of variable1 (so that it has 1 on the first and zero everywhere else) otherwise leave variable2 as it was.
Make sure to order the repeating observations by descending value of VARIABLE1.
Results:
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi hope this helps,
data DB;
input ID :$20. Admission :date09. Discharge :date09. Variable1 Variable2;
cards;
166 16FEB2019 26FEB2019 1 0
170 22FEB2017 07MAR2017 1 0
170 22FEB2017 07MAR2017 0 1
170 30JAN2019 04MAR2019 0 0
313 03MAR2016 10MAR2016 1 0
313 03MAR2016 10MAR2016 0 1
313 12DEC2019 15DEC2019 1 0
215 22DEC2014 25DEC2014 1 1
;
run;
proc sort;
by ID Admission Discharge;
run;
/* Step 1: Create a temporary dataset with the maximum value of Variable2 for each combination of ID, Admission, and Discharge */
proc sql;
create table Temp as
select ID, Admission, Discharge, max(Variable2) as MaxVar2
from DB
group by ID, Admission, Discharge;
quit;
/* Step 2: Merge the temporary dataset back with the original dataset to update Variable2 */
data DB1;
merge DB(in=a drop=variable2) Temp(in=b);
by ID Admission Discharge;
if a;
if Variable1 = 1 then Variable2 = MaxVar2;
if cmiss(variable2)=1 then variable2=0;
drop MaxVar2;
run;