- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi guys,
suppose to have the following dataset:
data DB;
input ID :$20. DATE_OF_BIRTH DATE_OF_DEATH MONTH REASON :$20.;
cards;
0001 1998 2022 . .
0001 . 2022 1 ILLNESS
0002 1945 2011 . .
0002 . 2011 12 ACCIDENT
0003 1938 1990 . .
0003 . 1990 10 ILLNESS
run;
DB results from DATASET1 and DATASET2 that have been "merged" with set function.
data DATASET1;
input ID :$20. DATE_OF_BIRTH DATE_OF_DEATH ;
cards;
0001 1998 2022
0002 1945 2011
0003 1938 1990
run;
data DATASET2;
input ID :$20. DATE_OF_DEATH MONTH REASON :$20.;
cards;
0001 2022 ILLNESS
0002 2011 ACCIDENT
0003 1990 ILLNESS
run;
Is there a way to get the following?
data DB1;
input ID :$20. DATE_OF_BIRTH DATE_OF_DEATH Month Reason :$20.;
cards;
0001 1998 2022 1 ILLNESS
0001 . 2022 . .
0002 1945 2011 12 ACCIDENT
0002 . 2011 . .
0003 1938 1990 10 ILLNESS
0003 . 1990 . .
run;
In other words I would like to move data of variables Month and Reason at the first row of the ID as shown in DB1. I tried using "first." but without success.
Really thank you in advance
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why not just MERGE to begin with? I am assuming each of the two datasets have only one observation per ID.
data DATASET1;
input ID :$20. DATE_OF_BIRTH DATE_OF_DEATH ;
cards;
0001 1998 2022
0002 1945 2011
0003 1938 1990
;
data DATASET2;
input ID :$20. DATE_OF_DEATH MONTH REASON :$20.;
cards;
0001 2022 1 ILLNESS
0002 2011 12 ACCIDENT
0003 1990 10 ILLNESS
;
data want;
merge dataset1 dataset2;
by id ;
run;
Result:
DATE_OF_ DATE_OF_ Obs ID BIRTH DEATH MONTH REASON 1 0001 1998 2022 1 ILLNESS 2 0002 1945 2011 12 ACCIDENT 3 0003 1938 1990 10 ILLNESS
If you really do not have access to the original datasets so that all you have is:
data bad;
set dataset1 dataset2;
by id;
run;
Then to convert that dataset you can use the UPDATE statement. Treating an empty version of the dataset as the starting point and the whole dataset as the transactions to be applied.
data want;
update bad(obs=0) bad;
by id;
run;
Which generates the same result as we got when merging.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sound like a look-ahead problem, for which I would normally look for one of Mark Keintz's papers on lags and leads, e.g.: https://support.sas.com/resources/papers/proceedings17/1277-2017.pdf.
Do you know you will always have exactly two rows per ID?
Do you know Date_Of_Birth will always be missing for the row that has the value for Date_Of_Death?
Is there a reason you want to keep two records per ID in the output dataset?
How did you end up with this weird data? It looks like someone tried to merge data with the date of birth and date with the date of death, but accidentally concatenated them. So it might be better to fix the code that created this data, if that's an option.
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much for reply. The dataset is too big and I don't have the chance to check to answer to your questions. What I know is that two datasets have been merged using "set". Both have in common variables "ID" and "DATE_of_DEATH". I will edit the post showing the single datasets that have been "merged".
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you have dataset1 and dataset2? The DATA step has a MERGE statement. It looks like instead of using the SET statement to combine them (which 'stacks' them vertically), you should use the MERGE statement (which merges them, horizontally). With MERGE your output dataset would have three rows, and more variables.
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why not just MERGE to begin with? I am assuming each of the two datasets have only one observation per ID.
data DATASET1;
input ID :$20. DATE_OF_BIRTH DATE_OF_DEATH ;
cards;
0001 1998 2022
0002 1945 2011
0003 1938 1990
;
data DATASET2;
input ID :$20. DATE_OF_DEATH MONTH REASON :$20.;
cards;
0001 2022 1 ILLNESS
0002 2011 12 ACCIDENT
0003 1990 10 ILLNESS
;
data want;
merge dataset1 dataset2;
by id ;
run;
Result:
DATE_OF_ DATE_OF_ Obs ID BIRTH DEATH MONTH REASON 1 0001 1998 2022 1 ILLNESS 2 0002 1945 2011 12 ACCIDENT 3 0003 1938 1990 10 ILLNESS
If you really do not have access to the original datasets so that all you have is:
data bad;
set dataset1 dataset2;
by id;
run;
Then to convert that dataset you can use the UPDATE statement. Treating an empty version of the dataset as the starting point and the whole dataset as the transactions to be applied.
data want;
update bad(obs=0) bad;
by id;
run;
Which generates the same result as we got when merging.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content