SUBJECT ID | NEWID | ADVERSE EVENT | AEID | LINKID |
100 | Dyspnoea | 1 | ||
100 | 1 | Dyspnoea | 2 | 2 |
100 | Dyspnoea | 7 | 7 |
Struggled to think of a subject topic to correctly describe what I want to occur; so above is a small sample of a data set I am currently working with.
I would like to know it it's possible that when NEWID is equal to AEID (so the first and second observation) that the missing LINKID is populated with the LINKID where the NEWID is populated. So would be expecting that the missing LINKID would be populated with '2'.
While the above just show's one ADVERSE EVENT, I need this to be correctly be applied to different ADVERSE EVENTS where the numbers might be the same.
No code to provide as I am currently racking my brains as to how to get this to work!
Thanks in advance.
This short example is hard to generalize into something that can be coded, and that correctly answers the problem.
Is it always the case that NEWID is populated in the record immediately after AEID has the same value, or can the NEWID of 1 be in any record in the data set?
Can there be more than one record for which NEWID matches an AEID?
SUBJECT ID | NEWID | ADVERSE EVENT | AEID | LINKID |
101 | 7 | Rash maculo-papular | 5 | 5 |
101 | 12 | Rash maculo-papular | 7 | |
101 | 15 | Rash maculo-papular | 12 | |
101 | Rash maculo-papular | 15 | ||
101 | 19 | Rash maculo-papular | 18 | 18 |
101 | Rash maculo-papular | 19 |
So I've added a further example - within a certain ADVERSE EVENT, when the ADVERSE EVENT is linked the NEWID will always match up to AEID, generally it can fall on any record.
As you can see above the LINKID which I want to display on the linked ADVERSE EVENT is the first value of AEID within the linked ADVERSE EVENT, so I was just using the retain function to populate the missing LINKID's
However, as per the example in the first post, the retain function would be retaining a value from a previous (different ADVERSE EVENT) which is giving me incorrect results.
So I've added a further example - within a certain ADVERSE EVENT, when the ADVERSE EVENT is linked the NEWID will always match up to AEID, generally it can fall on any record.
Now you have given us two examples where this is not true, can we have an example where NEWID and AEID are not in consecutive records? Can you please provide the data following these instructions so that we may write a program that works on your actual sample data?
I will have to attach an excel file, due to working off a company laptop the code that has been provided to convert to the dataset to SAS code will not work.
The attached excel file, presents all the records in the dataset. Which you should be able to import into SAS.
So... when NEWID is present (for the very first ADVERSE EVENT of a set of linked adverse events) the LINKID will equal the AEID.
The above has already been created in the dataset.
Then within the same linked adverse events, when NEWID is equal to the AEID, the LINKID needs to equal the LINKID of the first ADVERSE EVENT.
Some of us will not (or cannot) download Microsoft Office files, as they are a security threat.
So, to repeat my earlier request:
Can you please provide the data following these instructions so that we may write a program that works on your actual sample data?
It can be done in SAS/SQL with a correlated subquery :
data have;
input ID NEWID EVENT :$32. AEID;
datalines;
101 7 Rash_maculo-papular 5
101 12 Rash_maculo-papular 7
101 15 Rash_maculo-papular 12
101 . Rash_maculo-papular 15
101 19 Rash_maculo-papular 18
101 . Rash_maculo-papular 19
;
proc sql;
select
*,
case when NEWID in (select AEID from have where ID=a.ID and EVENT=a.EVENT) then AEID
else . end as LINKID
from have as a;
quit;
ID NEWID EVENT AEID LINKID ------------------------------------------------------------------------ 101 7 Rash_maculo-papular 5 5 101 12 Rash_maculo-papular 7 7 101 15 Rash_maculo-papular 12 12 101 . Rash_maculo-papular 15 . 101 19 Rash_maculo-papular 18 18 101 . Rash_maculo-papular 19 .
Thank you for the above response 🙂
However, I would expect it to look like follows (I have also added how it would look like the for EVENT mentioned at the beginning of this thread):
ID NEWID EVENT AEID LINKID ------------------------------------------------------------------------ 101 7 Rash_maculo-papular 5 5 101 12 Rash_maculo-papular 7 5 101 15 Rash_maculo-papular 12 5 101 . Rash_maculo-papular 15 5 101 19 Rash_maculo-papular 18 18 101 . Rash_maculo-papular 19 18
102 . Dyspnoea 1 2
102 1 Dyspnoea 2 2
So for ID 101 when NEWID = 7 and EVENT = Rash_maculo-papular this is classed as the first occurence of the EVENT within the linked EVENTS - so LINKID is populated with 5 already.
And then for ID 101 when NEWID = 19 and EVENT = Rash_maculo-papular this is classed as the first occurence of the EVENT within the linked EVENTS - so LINKID is populated with 18 already.
So then when NEWID = AEID I'd like LINKID to be populated with the LINKID from the first occurence (as shown above)
This was done fine in the code I was using, when using a retain function. However, the second example I have given causes problems as I cannot use a retain function on that.
For ID 102 when NEWID = 1 and EVENT = DYSPNOEA this is classed as the first occurence of the EVENT, so LINKID is populated with 2 already for that observation. So I'm wanting the record above it, where the NEWID = AEID to be populated with the LINKID from the first occurence (which would be 2).
So I'm hopefully looking for a solution with the outcome of the table above.
I really appreciate the help so far!, it's just a really awkward one to try to explain, so hoping the above can clear it up abit 🙂
It is one to one linked or one to many linked?
The following could give you a start. Note : the code only suited for ONE id and ONE Event.
data have(keep=newid aeid rename=(newid=_end aeid=_start));
input ID NEWID EVENT :$32. AEID;
datalines;
101 7 Rash_maculo-papular 5
101 12 Rash_maculo-papular 7
101 15 Rash_maculo-papular 12
101 . Rash_maculo-papular 15
101 19 Rash_maculo-papular 18
101 . Rash_maculo-papular 19
;
proc sql;
create table ancestor as
select * from have
where _start not in (select _end from have);
quit;
data want(keep=path);
if _n_ eq 1 then do;
length path _path $ 700 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();
declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();
end;
set ancestor;
count=1;n=1;_n=1;
path=catx('|',_start,_end);
*putlog 'WARNING:Found ' _end;
pa.add();
do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,'|');
rc=ha.find(); if rc ne 0 then output;
do while(rc=0);
if not findw(path,strip(_end),'|') then do;
if length(path)+length(_end)+1 gt lengthc(path) then do;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;
*putlog 'WARNING:Found ' _end;
count+1;n=count;
path=catx('|',path,_end);
pa.add(); *output;
path=_path;
end;
else output; /* It is a circle.*/
rc=ha.find_next();
end;
end;
pa.clear();
run;
Many thanks for all your responses, been a bit busy so only had chance to respond.
So, I managed to solve my issue with the following set statements:
data want;
set have;
set have(firstobs=2 keep=newid aeid linkid rename=(newid=x1 aeid=y1 linkid=z1))
set have(obs=1 drop=_all_);
if x1=aeid then linkid=z1;
run;
So for the rash maculo papular events I kept my retain, and then used the above to correct the population of the LINKID when NEWID doesn't run chronologically.
Happy to consider this closed now 🙂
Thanks alot.
You didn't answer my question.
It is one to one matched or one to many matched?
After running my code ,get WANT dataset ,there is variable PATH like:
5|7|12 ......
The first part 5 is what aeid you want. 7 12 is the linked id with 5 .
Merge it back to your dataset and get you final need.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.