Solved: Modify master dataset with transaction data conditional on the master

JennieWalker · Posted 11-03-2023 09:50 PM

Hello

I have a question on how to duplicate data for subjects where data is missing on certain variables. I have a dataset where some subjects have 2 rows. Most information is unique and populated but some fields such as sex and weight may have missing data for the second record (see table 1). There is no pattern to the missingness with some subjects having sex and/or weight already populated. I am trying to replace the missing values with available values for these subjects. However, if a value exists, I would like to keep the populated values (see table 2).

Table 1. Data with missingness

SUBJID	AGE	SEX	WEIGHT	COHORT
1	37	F	65.5	1
1				2
2	40	M	70	1
2	41	M		2
3		M	80.5	1
4	32	M	75.3	1
4	33		75.2	2
5	50	F	68	2
6	45	F	75.5	1
7	30	M	68.5	1
7				2

Table 2. Final table with data updated in red.

SUBJID	AGE	SEX	WEIGHT	COHORT
1	37	F	65.5	1
1	37	F	65.5	2
2	40	M	70	1
2	41	M	70	2
3		M	80.5	1
4	32	M	75.3	1
4	32	M	75.2	2
5	50	F	68	2
6	45	F	75.5	1
7	30	M	68.5	1
7	30	M	68.5	2

I have tried to use modify (see below) – which is fine where the duplicate data is the same in the transaction dataset as the Master, but for subjects 2 (age) and 4 (weight) where there is a different value, the values in the transaction dataset update the Master. I have done some reading and it appears a condition is easy when related to the transaction dataset, but I would like the condition to apply to the Master - if data is missing in Master then replace with transaction otherwise keep value in master.

Other options are welcome.
Thanks in advance for your time!

data master;
input SUBJID $ AGE SEX $ WEIGHT COHORT;
datalines;
1 37 F 65.5 1
1 . . . 2
2 40 M 70 1
2 41 M . 2
3 . M 80.5 1
4 32 M 75.3 1
4 33 . 75.2 2
5 50 F 68 2
6 45 F 75.5 1
7 30 M 68.5 1
7 . . . 2
;
run;

data trans;
input SUBJID $ AGE SEX $ WEIGHT COHORT;
datalines;
1 37 F 65.5 1
2 40 M 70 1
3 . M 80.5 1
4 32 M 75.3 1
6 45 F 75.5 1
7 30 M 68.5 1
;
run;

data trans;
  set master;
  keep subjid age sex weight;
  where COHORT=1;
run;

data master;
  modify master trans;
  by subjid;
run;

Tom · Posted 11-04-2023 08:27 AM

The UPDATE statement is for apply transactions to an ORIGINAL dataset. When the value in the transaction is MISSING the original value is not replaced.

It is designed to have one observation per BY group in the ORIGINAL dataset.

It is designed to output one observation per BY group

But you can use your single dataset as both the ORIGINAL and the TRANSACTIONS. Use the OBS=0 dataset option to start with an empty ORIGINAL dataset and use the whole dataset as transactions.

To prevent the collapse to a single observation per BY group add an explicit OUTPUT statement.

data want;
  update original(obs=0) original;
  by subjid;
  output;
run;

If there are some variables that did not want to have the values carried forward then you can add a SET statement to re-read those variables.

If there are some cases where the missing values are on the FIRST observation for the group you might need to do more work.

For your example it looks like you want to collapse all of the variables expect COHORT to be the same on every observation for that subject. In which case you might want to first collapse to unique observations and then merge back to get the muliple observations with different values of COHORT.

data want;
  update original(obs=0) original;
  by subjid;
run;

data want;
   merge want(drop=cohort) orgiinal(keep=subjid cohort);
   by subjid;
run;

View solution in original post

Ksharp · Posted 11-04-2023 05:31 AM

data master;
input SUBJID $ AGE SEX $ WEIGHT COHORT;
datalines;
1 37 F 65.5 1
1 . . . 2
2 40 M 70 1
2 41 M . 2
3 . M 80.5 1
4 32 M 75.3 1
4 33 . 75.2 2
5 50 F 68 2
6 45 F 75.5 1
7 30 M 68.5 1
7 . . . 2
;
run;

data want;
update master(obs=0) master;
by subjid;
output;
run;

JennieWalker · Posted 11-04-2023 06:27 PM

Thanks heaps for the suggestion. This is a perfect solution to the question I posted, but I have other variables in the dataset so I do require the additional merge step shown below.

Thanks again

Tom · Posted 11-04-2023 08:27 AM

The UPDATE statement is for apply transactions to an ORIGINAL dataset. When the value in the transaction is MISSING the original value is not replaced.

It is designed to have one observation per BY group in the ORIGINAL dataset.

It is designed to output one observation per BY group

But you can use your single dataset as both the ORIGINAL and the TRANSACTIONS. Use the OBS=0 dataset option to start with an empty ORIGINAL dataset and use the whole dataset as transactions.

To prevent the collapse to a single observation per BY group add an explicit OUTPUT statement.

data want;
  update original(obs=0) original;
  by subjid;
  output;
run;

If there are some variables that did not want to have the values carried forward then you can add a SET statement to re-read those variables.

If there are some cases where the missing values are on the FIRST observation for the group you might need to do more work.

For your example it looks like you want to collapse all of the variables expect COHORT to be the same on every observation for that subject. In which case you might want to first collapse to unique observations and then merge back to get the muliple observations with different values of COHORT.

data want;
  update original(obs=0) original;
  by subjid;
run;

data want;
   merge want(drop=cohort) orgiinal(keep=subjid cohort);
   by subjid;
run;

Modify master dataset with transaction data conditional on the master

Re: Modify master dataset with transaction data conditional on the master

Re: Modify master dataset with transaction data conditional on the master

Re: Modify master dataset with transaction data conditional on the master

Re: Modify master dataset with transaction data conditional on the master

Modify master dataset with transaction data conditional on the master

Re: Modify master dataset with transaction data conditional on the master

Re: Modify master dataset with transaction data conditional on the master

Re: Modify master dataset with transaction data conditional on the master

Re: Modify master dataset with transaction data conditional on the master

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away