Solved: Re: Move data to the desired position in the dataset

NewUsrStat · Posted 07-15-2024 12:11 PM

Hi guys,

suppose to have the following dataset:

data DB;
  input ID :$20. DATE_OF_BIRTH  DATE_OF_DEATH  MONTH  REASON :$20.;
cards;
0001  1998 2022   .     .
0001    .  2022   1  ILLNESS
0002  1945 2011   .     .
0002    .  2011   12 ACCIDENT
0003  1938 1990   .     .
0003    .  1990   10 ILLNESS
run;

DB results from DATASET1 and DATASET2 that have been "merged" with set function. 

data DATASET1; 
input ID :$20. DATE_OF_BIRTH  DATE_OF_DEATH ;
cards;
0001  1998 2022  
0002  1945 2011 
0003  1938 1990 
run;

data DATASET2; 
input ID :$20. DATE_OF_DEATH MONTH REASON :$20.;
cards;
0001  2022  ILLNESS
0002  2011 ACCIDENT
0003  1990  ILLNESS
run;

Is there a way to get the following?


data DB1;
  input ID :$20. DATE_OF_BIRTH  DATE_OF_DEATH  Month Reason :$20.;
cards;
0001  1998 2022   1  ILLNESS
0001    .  2022   .     .
0002  1945 2011   12 ACCIDENT
0002    .  2011   .     .
0003  1938 1990   10 ILLNESS
0003    .  1990   .     .
run;

In other words I would like to move data of variables Month and Reason at the first row of the ID as shown in DB1. I tried using "first." but without success.

Really thank you in advance

Tom · Posted 07-15-2024 02:57 PM

Why not just MERGE to begin with? I am assuming each of the two datasets have only one observation per ID.

data DATASET1; 
  input ID :$20. DATE_OF_BIRTH  DATE_OF_DEATH ;
cards;
0001  1998 2022  
0002  1945 2011 
0003  1938 1990 
;

data DATASET2; 
  input ID :$20. DATE_OF_DEATH MONTH REASON :$20.;
cards;
0001  2022 1  ILLNESS
0002  2011 12 ACCIDENT
0003  1990 10 ILLNESS
;

data want;
  merge dataset1 dataset2;
  by id ;
run;

Result:

               DATE_OF_    DATE_OF_
Obs     ID       BIRTH       DEATH     MONTH     REASON

 1     0001      1998        2022         1     ILLNESS
 2     0002      1945        2011        12     ACCIDENT
 3     0003      1938        1990        10     ILLNESS

If you really do not have access to the original datasets so that all you have is:

data bad;
  set dataset1 dataset2;
  by id;
run;

Then to convert that dataset you can use the UPDATE statement. Treating an empty version of the dataset as the starting point and the whole dataset as the transactions to be applied.

data want;
  update bad(obs=0) bad;
  by id;
run;

Which generates the same result as we got when merging.

View solution in original post

Quentin · Posted 07-15-2024 12:30 PM

Sound like a look-ahead problem, for which I would normally look for one of Mark Keintz's papers on lags and leads, e.g.: https://support.sas.com/resources/papers/proceedings17/1277-2017.pdf.

Do you know you will always have exactly two rows per ID?

Do you know Date_Of_Birth will always be missing for the row that has the value for Date_Of_Death?

Is there a reason you want to keep two records per ID in the output dataset?

How did you end up with this weird data? It looks like someone tried to merge data with the date of birth and date with the date of death, but accidentally concatenated them. So it might be better to fix the code that created this data, if that's an option.

NewUsrStat · Posted 07-15-2024 12:36 PM

Thank you very much for reply. The dataset is too big and I don't have the chance to check to answer to your questions. What I know is that two datasets have been merged using "set". Both have in common variables "ID" and "DATE_of_DEATH". I will edit the post showing the single datasets that have been "merged".

Quentin · Posted 07-15-2024 01:10 PM

Do you have dataset1 and dataset2? The DATA step has a MERGE statement. It looks like instead of using the SET statement to combine them (which 'stacks' them vertically), you should use the MERGE statement (which merges them, horizontally). With MERGE your output dataset would have three rows, and more variables.

Tom · Posted 07-15-2024 02:57 PM

Why not just MERGE to begin with? I am assuming each of the two datasets have only one observation per ID.

data DATASET1; 
  input ID :$20. DATE_OF_BIRTH  DATE_OF_DEATH ;
cards;
0001  1998 2022  
0002  1945 2011 
0003  1938 1990 
;

data DATASET2; 
  input ID :$20. DATE_OF_DEATH MONTH REASON :$20.;
cards;
0001  2022 1  ILLNESS
0002  2011 12 ACCIDENT
0003  1990 10 ILLNESS
;

data want;
  merge dataset1 dataset2;
  by id ;
run;

Result:

               DATE_OF_    DATE_OF_
Obs     ID       BIRTH       DEATH     MONTH     REASON

 1     0001      1998        2022         1     ILLNESS
 2     0002      1945        2011        12     ACCIDENT
 3     0003      1938        1990        10     ILLNESS

If you really do not have access to the original datasets so that all you have is:

data bad;
  set dataset1 dataset2;
  by id;
run;

Then to convert that dataset you can use the UPDATE statement. Treating an empty version of the dataset as the starting point and the whole dataset as the transactions to be applied.

data want;
  update bad(obs=0) bad;
  by id;
run;

Which generates the same result as we got when merging.

NewUsrStat · Posted 07-16-2024 12:17 PM

Hi Tom, unfortunately I have no access to single datasets, but your help unblocked the situation. Thank you very much!

Move data to the desired position in the dataset

Re: Move data to the desired position in the dataset

Re: Move data to the desired position in the dataset

Re: Move data to the desired position in the dataset

Re: Move data to the desired position in the dataset

Re: Move data to the desired position in the dataset

Re: Move data to the desired position in the dataset

The 2025 SAS Hackathon has begun!