hi, I have a situation where ADSL variables are selectively added to a list. In this example CORE1 should be replaced by a list of variables present in ADSL that have a coregroup of 1. CORE2 should be replaced by a list of variables in ADSL with coregroup of 2 and so on. I would like to automate this addition of variables.
HAVE:
dataset variable label coregroup
ADAE USUBJID Subject ID
ADAE :CORE1
ADAE :CORE2
ADAE TRT Treatment
ADMH USUBJID Subject ID
ADMH MHTERM Medical History Term
ADMH TRT Treatment Group
ADSL USUBJID Subject ID
ADSL RAND Rand flag 1
ADSL ETHNEW New Ethnicity 2
ADSL SITEID Site ID
WANT:
dataset variable label coregroup
ADAE USUBJID Subject ID
ADAE RAND Rand flag 1
ADAE ETHNEW New Ethnicity 2
ADAE TRT Treatment
ADMH USUBJID Subject ID
ADMH MHTERM Medical History Term
ADMH TRT Treatment Group
ADSL USUBJID Subject ID
ADSL RAND Rand flag 1
ADSL ETHNEW New Ethnicity 2
ADSL SITEID Site ID
Please post data in usable form, making it easier to understand the problem and giving us something to work with.
The output statement can be used to create additional obs.
data have;
infile cards dsd;
input dataset $ variable : $10. label : $20. coregroup $;
cards;
ADAE,USUBJID,Subject ID,
ADAE,:CORE1,,
ADAE,:CORE2,,
ADAE,TRT,Treatment,
ADMH,USUBJID,Subject ID,
ADMH,MHTERM,Medical History Term,
ADMH,TRT,Treatment Group,
ADSL,USUBJID,Subject ID,
ADSL,RAND,Rand flag,1,
ADSL,ETHNEW,New Ethnicity,2,
ADSL,SITEID,Site ID,1,
;
For example to process CORE1, we need to find all records in ADSL with coregroup=1. Then add these 2 records in to replace CORE1.
ADAE RAND Rand flag 1
ADAE SITEID Site ID 1
Then move to CORE2 and add in the following records to replace CORE2:
ADAE ETHNEW New Ethnicity 2
The ADSL records can be added to any data set in the list which will contain all data sets. The list isnt so big.
WANT:
dataset variable label coregroup
ADAE USUBJID Subject ID
ADAE RAND Rand flag 1
ADAE SITEID Site ID 1
ADAE ETHNEW New Ethnicity 2
ADAE TRT Treatment
ADMH USUBJID Subject ID
ADMH MHTERM Medical History Term
ADMH TRT Treatment Group
ADSL USUBJID Subject ID
ADSL RAND Rand flag 1
ADSL ETHNEW New Ethnicity 2
ADSL SITEID Site ID 1
In addition to the request from @andreas_lds , please tell us if this is the entire data set, or if this is just a small part of a much larger data set. (If it is much larger, please tell us how many rows). Are there different subjects or different IDs in this larger data set? Would the code need to be smart enough to make different replacements for different subjects/different IDs?
The example I gave is a part of a data set (which is the metadata of adam data sets) which is about 1000 rows. I would need to replace any CORE value with observations from the ADSL. Can be up to 4 core groupings. So whenever CORE1, 2, 3 etc appears it should also work for those.
I asked "Are there different subjects or different IDs in this larger data set? Would the code need to be smart enough to make different replacements for different subjects/different IDs?" I don't see an answer.
hi, yes there would be other data sets listed. The id is the coregroup of 1,2,3 or 4 (for ADSL) which corresponds to CORE1, 2, 4, 5 for all other data sets.
yes it should be smart enough to handle that.
@kalbo Your descriptions are somewhat confusing. May be that's because of formulations like:
>CORE1 should be replaced by a list of variables
That's not a list of variables. You want to replace records with specific values for variables dataset and variable with values from some other records in the same table. What increases the confusion are the variable names "variable" and "dataset".
>The ADSL records can be added to any data set in the list
What's confusing here: In the SAS world the term "data set" normally means "table" and not row/observation/record.
Based on your WANT data I believe what you're after is something like below:
data have;
infile cards dsd truncover;
input dataset $ variable : $10. label : $20. coregroup $;
cards;
ADAE,USUBJID,Subject_ID,
ADAE,:CORE1,,
ADAE,:CORE2,,
ADAE,TRT,Treatment,
ADMH,USUBJID,Subject_ID,
ADMH,MHTERM,Medical_History_Term,
ADMH,TRT,Treatment_Group,
ADSL,USUBJID,Subject_ID,
ADSL,RAND,Rand_flag,1,
ADSL,ETHNEW,New_Ethnicity,2,
ADSL,SITEID,Site_ID,1,
;
data want;
input dataset $ variable : $10. label : $20. coregroup $;
datalines;
ADAE USUBJID Subject_ID .
ADAE RAND Rand_flag 1
ADAE SITEID Site_ID 1
ADAE ETHNEW New_Ethnicity 2
ADAE TRT Treatment .
ADMH USUBJID Subject_ID .
ADMH MHTERM Medical_History_Term .
ADMH TRT Treatment_Group .
ADSL USUBJID Subject_ID .
ADSL RAND Rand_flag 1
ADSL ETHNEW New_Ethnicity 2
ADSL SITEID Site_ID 1
;
run;
data want_derived;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(dataset="ADSL" and not missing(coregroup))', multidata:'y');
h1.defineKey('coregroup');
h1.defineData('variable','label','coregroup');
h1.defineDone();
end;
set have;
if dataset='ADAE' and find(variable,'core','i')>0 then
do;
do while(h1.do_over(key:strip(scan(variable,-1,,'kd'))) eq 0);
output;
end;
end;
else
output;
run;
proc compare data=want compare=want_derived;
run;
proc print data=want;run;
proc print data=want_derived;run;
@kalbo wrote:
hi, yes there would be other data sets listed. The id is the coregroup of 1,2,3 or 4 (for ADSL) which corresponds to CORE1, 2, 4, 5 for all other data sets.
Not sure I understand this. It would help to see other data sets. Could you please provide sample data which covers at least two data sets?
Lots of words have been written, I have to scroll up and down to get a complete picture. Can you rewrite the question to address all of these concerns in one post?
You are not explaining your problem very clearly.
Let's see if we can guess what you are asking to do.
I suspect that you are trying to automate some part of the process of generating metadata about clinical trial data from your use of dataset names like ADAE and ADSL
So you seem to have a dataset named HAVE which has incomplete metadata about the variables in the datasets used (generated for?) your clinical trial.
data have;
infile cards dsd truncover ;
input dataset :$32. variable :$32. label :$256. coregroup $;
cards;
ADAE,USUBJID,Subject ID,
ADAE,:CORE1,,
ADAE,:CORE2,,
ADAE,TRT,Treatment,
ADMH,USUBJID,Subject ID,
ADMH,MHTERM,Medical History Term,
ADMH,TRT,Treatment Group,
ADSL,USUBJID,Subject ID,
ADSL,RAND,Rand flag,1,
ADSL,ETHNEW,New Ethnicity,2,
ADSL,SITEID,Site ID,1,
;
The source of this incompleteness is the use of special values like ':CORE1' and 'CORE2' as PLACEHOLDERS for an actual set of variable names that should be inserted into the metadata table.
So you should be able to do some simple joins to get what you want.
proc sql;
create table want as
select dataset,variable,label
from have
where not (variable eqt ':CORE')
union
select a.dataset,b.variable,b.label
from have a
inner join have b
on a.variable eqt ':CORE'
and b.dataset='ADSL' and b.coregroup=substr(a.variable,6)
order by dataset,variable
;
quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.