adding observations according to variable values

kalbo · Posted 02-26-2025 07:25 AM

hi, I have a situation where ADSL variables are selectively added to a list. In this example CORE1 should be replaced by a list of variables present in ADSL that have a coregroup of 1. CORE2 should be replaced by a list of variables in ADSL with coregroup of 2 and so on. I would like to automate this addition of variables.

HAVE:

dataset variable label coregroup

ADAE USUBJID Subject ID

ADAE :CORE1

ADAE :CORE2

ADAE TRT Treatment

ADMH USUBJID Subject ID

ADMH MHTERM Medical History Term

ADMH TRT Treatment Group

ADSL USUBJID Subject ID

ADSL RAND Rand flag 1

ADSL ETHNEW New Ethnicity 2

ADSL SITEID Site ID

WANT:

dataset variable label coregroup

ADAE USUBJID Subject ID

ADAE RAND Rand flag 1

ADAE ETHNEW New Ethnicity 2

ADAE TRT Treatment

ADMH USUBJID Subject ID

ADMH MHTERM Medical History Term

ADMH TRT Treatment Group

ADSL USUBJID Subject ID

ADSL RAND Rand flag 1

ADSL ETHNEW New Ethnicity 2

ADSL SITEID Site ID

andreas_lds · Posted 02-26-2025 07:31 AM

Please post data in usable form, making it easier to understand the problem and giving us something to work with.

The output statement can be used to create additional obs.

kalbo · Posted 02-26-2025 08:33 AM

data have;
	infile cards dsd;
	input dataset $ variable : $10. label : $20. coregroup $;
	cards;
ADAE,USUBJID,Subject ID,
ADAE,:CORE1,,
ADAE,:CORE2,,
ADAE,TRT,Treatment,
ADMH,USUBJID,Subject ID,
ADMH,MHTERM,Medical History Term,
ADMH,TRT,Treatment Group,
ADSL,USUBJID,Subject ID,
ADSL,RAND,Rand flag,1,
ADSL,ETHNEW,New Ethnicity,2,
ADSL,SITEID,Site ID,1,
;

kalbo · Posted 02-26-2025 08:40 AM

For example to process CORE1, we need to find all records in ADSL with coregroup=1. Then add these 2 records in to replace CORE1.

ADAE RAND Rand flag 1

ADAE SITEID Site ID 1

Then move to CORE2 and add in the following records to replace CORE2:

ADAE ETHNEW New Ethnicity 2

The ADSL records can be added to any data set in the list which will contain all data sets. The list isnt so big.

WANT:

dataset variable label coregroup

ADAE USUBJID Subject ID

ADAE RAND Rand flag 1

ADAE SITEID Site ID 1

ADAE ETHNEW New Ethnicity 2

ADAE TRT Treatment

ADMH USUBJID Subject ID

ADMH MHTERM Medical History Term

ADMH TRT Treatment Group

ADSL USUBJID Subject ID

ADSL RAND Rand flag 1

ADSL ETHNEW New Ethnicity 2

ADSL SITEID Site ID 1

PaigeMiller · Posted 02-26-2025 07:59 AM

In addition to the request from @andreas_lds , please tell us if this is the entire data set, or if this is just a small part of a much larger data set. (If it is much larger, please tell us how many rows). Are there different subjects or different IDs in this larger data set? Would the code need to be smart enough to make different replacements for different subjects/different IDs?

--
Paige Miller

kalbo · Posted 02-26-2025 08:42 AM

The example I gave is a part of a data set (which is the metadata of adam data sets) which is about 1000 rows. I would need to replace any CORE value with observations from the ADSL. Can be up to 4 core groupings. So whenever CORE1, 2, 3 etc appears it should also work for those.

PaigeMiller · Posted 02-26-2025 08:51 AM

I asked "Are there different subjects or different IDs in this larger data set? Would the code need to be smart enough to make different replacements for different subjects/different IDs?" I don't see an answer.

--
Paige Miller

kalbo · Posted 02-26-2025 08:54 AM

hi, yes there would be other data sets listed. The id is the coregroup of 1,2,3 or 4 (for ADSL) which corresponds to CORE1, 2, 4, 5 for all other data sets.

kalbo · Posted 02-26-2025 08:59 AM

yes it should be smart enough to handle that.

Patrick · Posted 02-26-2025 06:44 PM

@kalbo Your descriptions are somewhat confusing. May be that's because of formulations like:

>CORE1 should be replaced by a list of variables
That's not a list of variables. You want to replace records with specific values for variables dataset and variable with values from some other records in the same table. What increases the confusion are the variable names "variable" and "dataset".

>The ADSL records can be added to any data set in the list

What's confusing here: In the SAS world the term "data set" normally means "table" and not row/observation/record.

Based on your WANT data I believe what you're after is something like below:

data have;
  infile cards dsd truncover;
  input dataset $ variable : $10. label : $20. coregroup $;
  cards;
ADAE,USUBJID,Subject_ID,
ADAE,:CORE1,,
ADAE,:CORE2,,
ADAE,TRT,Treatment,
ADMH,USUBJID,Subject_ID,
ADMH,MHTERM,Medical_History_Term,
ADMH,TRT,Treatment_Group,
ADSL,USUBJID,Subject_ID,
ADSL,RAND,Rand_flag,1,
ADSL,ETHNEW,New_Ethnicity,2,
ADSL,SITEID,Site_ID,1,
;

data want;
   input dataset $ variable : $10. label : $20. coregroup $;
   datalines;
ADAE USUBJID Subject_ID .
ADAE RAND Rand_flag 1
ADAE SITEID Site_ID 1
ADAE ETHNEW New_Ethnicity 2
ADAE TRT Treatment .
ADMH USUBJID Subject_ID .
ADMH MHTERM Medical_History_Term .
ADMH TRT Treatment_Group .
ADSL USUBJID Subject_ID .
ADSL RAND Rand_flag 1
ADSL ETHNEW New_Ethnicity 2
ADSL SITEID Site_ID 1
;
run;

data want_derived;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(dataset="ADSL" and not missing(coregroup))', multidata:'y');
      h1.defineKey('coregroup');
      h1.defineData('variable','label','coregroup');
      h1.defineDone();
    end;
  set have;
  if dataset='ADAE' and find(variable,'core','i')>0 then
    do;
      do while(h1.do_over(key:strip(scan(variable,-1,,'kd'))) eq 0);
        output;
      end;
    end;
  else
    output;
run;

proc compare data=want compare=want_derived;
run;
proc print data=want;run;
proc print data=want_derived;run;

PaigeMiller · Posted 02-26-2025 09:27 AM

@kalbo wrote:

hi, yes there would be other data sets listed. The id is the coregroup of 1,2,3 or 4 (for ADSL) which corresponds to CORE1, 2, 4, 5 for all other data sets.

Not sure I understand this. It would help to see other data sets. Could you please provide sample data which covers at least two data sets?

Lots of words have been written, I have to scroll up and down to get a complete picture. Can you rewrite the question to address all of these concerns in one post?

--
Paige Miller

Tom · Posted 02-26-2025 10:47 PM

You are not explaining your problem very clearly.

Let's see if we can guess what you are asking to do.

I suspect that you are trying to automate some part of the process of generating metadata about clinical trial data from your use of dataset names like ADAE and ADSL

So you seem to have a dataset named HAVE which has incomplete metadata about the variables in the datasets used (generated for?) your clinical trial.

Spoiler

data have;
	infile cards dsd truncover ;
	input dataset :$32. variable :$32. label :$256. coregroup $;
cards;
ADAE,USUBJID,Subject ID,
ADAE,:CORE1,,
ADAE,:CORE2,,
ADAE,TRT,Treatment,
ADMH,USUBJID,Subject ID,
ADMH,MHTERM,Medical History Term,
ADMH,TRT,Treatment Group,
ADSL,USUBJID,Subject ID,
ADSL,RAND,Rand flag,1,
ADSL,ETHNEW,New Ethnicity,2,
ADSL,SITEID,Site ID,1,
;

data have; infile cards dsd truncover ; input dataset :$32. variable :$32. label :$256. coregroup $; cards; ADAE,USUBJID,Subject ID, ADAE,:CORE1,, ADAE,:CORE2,, ADAE,TRT,Treatment, ADMH,USUBJID,Subject ID, ADMH,MHTERM,Medical History Term, ADMH,TRT,Treatment Group, ADSL,USUBJID,Subject ID, ADSL,RAND,Rand flag,1, ADSL,ETHNEW,New Ethnicity,2, ADSL,SITEID,Site ID,1, ;

The source of this incompleteness is the use of special values like ':CORE1' and 'CORE2' as PLACEHOLDERS for an actual set of variable names that should be inserted into the metadata table.

So you should be able to do some simple joins to get what you want.

proc sql;
create table want as
select  dataset,variable,label
  from have
  where not (variable eqt ':CORE')
union
select a.dataset,b.variable,b.label
  from have a
  inner join have b
  on a.variable eqt ':CORE'
    and b.dataset='ADSL' and b.coregroup=substr(a.variable,6)
order by dataset,variable
;
quit;

adding observations according to variable values

Re: adding observations according to variable values

Re: adding observations according to variable values

Re: adding observations according to variable values

Re: adding observations according to variable values

Re: adding observations according to variable values

Re: adding observations according to variable values

Re: adding observations according to variable values

Re: adding observations according to variable values

Re: adding observations according to variable values

Re: adding observations according to variable values

Re: adding observations according to variable values

Registration is open

SAS Training: Just a Click Away