Hi everyone
I like to create dataset C from two dataset A (time varying exposure) and B (time-varying outcome) (please see attached).
Question: Which SAS codes (program) can I use to have dataset C from two dataset A and B
Something like this, maybe?
data a;
infile datalines dsd;
INPUT PatID Days Treatment;
label patid='Patient ID' Days= 'Days from diagnosis' ;
datalines;
1,0,0
1,100,1
1,150,1
1,200,1
1,250,0
;
title "A";
proc print; run;
data b;
infile datalines dsd;
INPUT PatID Days Hospitalized;
label patid='Patient ID' Days= 'Days from diagnosis' ;
datalines;
1,0,0
1,100,0
1,180,1
1,220,1
;
title "B";
proc print; run;
data C;
merge a b;
by PatID days;
/* If new Hospitalized between Treatment dates, use previous treatment */
Treatment=coalesce(Treatment,lag1(Treatment));
/* If new Treatment between Hospitalized dates, use 0 */
Hospitalized=coalesce(Hospitalized,0);
run;
title "C";
proc print; run;
Something like this, maybe?
data a;
infile datalines dsd;
INPUT PatID Days Treatment;
label patid='Patient ID' Days= 'Days from diagnosis' ;
datalines;
1,0,0
1,100,1
1,150,1
1,200,1
1,250,0
;
title "A";
proc print; run;
data b;
infile datalines dsd;
INPUT PatID Days Hospitalized;
label patid='Patient ID' Days= 'Days from diagnosis' ;
datalines;
1,0,0
1,100,0
1,180,1
1,220,1
;
title "B";
proc print; run;
data C;
merge a b;
by PatID days;
/* If new Hospitalized between Treatment dates, use previous treatment */
Treatment=coalesce(Treatment,lag1(Treatment));
/* If new Treatment between Hospitalized dates, use 0 */
Hospitalized=coalesce(Hospitalized,0);
run;
title "C";
proc print; run;
Hi everyone
I have two medication files with start and stop in each. I want to merge them somehow to cover time interval that patient have taken both medA and medB. Please see attached file: dataset A and B and dataset I want to create using SAS code.
Many users here don't want to download Office files because of virus potential, others have such things blocked by security software. Also if you give us office files we have to create a SAS data set and due to the non-existent constraints on data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Please provide example in the form of working data step code. You can convert your SAS data set using Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Best is include a similar example of the expected output given the example input data sets.
Thanks for the note.
Dataset A
studyid | Start | stop | medA |
1 | 83 | 948 | 1 |
1 | 1042 | 1322 | 1 |
1 | 2792 | 5457 | 1 |
2 | 0 | 200 | 1 |
Dataset B
studyid | Start | stop | MedB |
1 | 0 | 101 | 1 |
3 | 50 | 150 | 1 |
|
|
|
|
Dataset I want to create
studyid | Start | stop | MedB | MedA |
1 | 0 | 83 | 1 | 1 |
1 | 83 | 101 | 1 | 1 |
1 | 101 | 948 | 0 | 1 |
1 | 948 | 1042 | 0 | 1 |
1 | 1042 | 1322 | 0 | 1 |
1 | 2792 | 5457 | 0 | 1 |
2 | 0 | 200 | 0 | 1 |
3 | 50 | 150 | 1 | 0 |
Not working data steps...
I am very sure I do not understand why you want
Dataset I want to create
studyid |
Start |
stop |
MedB |
MedA |
1 |
0 |
82 |
1 |
0 |
1 |
83 |
101 |
1 |
1 |
1 |
101 |
948 |
0 |
1 |
1 |
948 |
1042 |
0 |
1 |
1 |
1042 |
1322 |
0 |
1 |
1 |
2792 |
5457 |
0 |
1 |
2 |
0 |
200 |
0 |
1 |
3 |
50 |
150 |
1 |
0 |
You said "cover time interval that patient have taken both medA and medB". Assuming 1 for MedA or MedB means "taking med", then I would think it makes more sense to have where the endpoints do not imply something taken for the entire interval what not indicated in the starting data sets. Like this
Placing a 1 in MedA for the first line would be saying that MedA was taken for the entire interval.
studyid |
Start |
stop |
MedB |
MedA |
1 |
0 |
82 |
1 |
0 |
1 |
83 |
101 |
1 |
1 |
1 |
102 |
948 |
0 |
1 |
1 |
949 |
1042 |
0 |
0 |
1 |
1043 |
1322 |
0 |
1 |
1 |
2792 |
5457 |
0 |
1 |
2 |
0 |
200 |
0 |
1 |
3 |
50 |
150 |
1 |
0 |
And possibly even include an interval:
1 |
1323 |
2791 |
0 |
0 |
One might also wonder if there should be actual dates somewhere if this is going to be compared with other data that has dates.
Thanks Ballardw
for your reply. I agree with your modification and explanation. having considered your modification, I wonder which SAS codes I can use to create this new dataset
Regards
As Balladaw said there are some questions you need to answer.
data A;
input
studyid
Start
stop
medA
;
cards;
1
83
948
1
1
1042
1322
1
1
2792
5457
1
2
0
200
1
;
data B;
input
studyid
Start
stop
MedB
;
cards;
1
0
101
1
3
50
150
1
;
data temp_a(index=(x=(studyid date)));
set a;
n_a+1;
do date=start to stop;
output;
end;
drop start stop;
run;
data temp_b(index=(x=(studyid date)));
set b;
n_b+1;
do date=start to stop;
output;
end;
drop start stop;
run;
data temp;
merge temp_a temp_b;
by studyid date;
run;
data temp2;
set temp;
if studyid ne lag(studyid) or n_a ne lag(n_a) or n_b ne lag(n_b) then group+1;
run;
proc sql;
create table want as
select group,studyid,min(date) as start,max(date) as stop,max(meda) as meda,max(medb) as medb
from temp2
group by group,studyid;
quit;
Thanks for the code. It works
Regards
M
The basic part of your question looks like this should be part of the solution:
data c; set a b ; run; proc sort data=c; by patid daysfromdiagnosis; run;
Some items to address:
For future questions if you have SAS data sets use the names of the variables. "Days from diagnosis" by default is not going to be a valid variable name.
Second: are you sure that you want values that are not in the original data to be 0 or 1 and not missing? We have no context for what the values of Treatment or Hospitalized actually mean in real world terms and I do not like "guessing" what the underlying rules for assigning such values might be.
There could very well be another step after the Proc Sort of the combined data sets.
Note: Merge in SAS terms means a combination from side by side and has some restrictions for most usage, such as sorted by one or more common variables. We do not know that is the real case.
Please provide data in the form of working data sets so we at least know variable types and actual names. If you can't do that provide text pasted into a text box opened with the </> icon above the main message window. We can't code against values buried in pictures.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.