Solved: Make a variable consistent over the entire dataset

NewUsrStat · Posted 10-11-2024 12:15 PM

Hi guys,

suppose to have the following dataset:

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Sex;
  format Admission date9. Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  M
0001  22FEB2018 03MAR2018  
0001  22FEB2018 03MAR2018 
0002  01DEC2016 14DEC2016  F
0002  01DEC2016 14DEC2016  
0002  25DEC2017 02JAN2018 
0002  06JAN2018 09JAN2018  
;run;

is there a way to make consistent the variable sex, in other words to get the following dataset?

data DB1;
  input ID :$20. Admission :date09. Discharge :date09. Sex;
  format Admission date9. Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  M
0001  22FEB2018 03MAR2018  M 
0001  22FEB2018 03MAR2018  M
0002  01DEC2016 14DEC2016  F
0002  01DEC2016 14DEC2016  F
0002  25DEC2017 02JAN2018  F
0002  06JAN2018 09JAN2018  F
;run;

Note that replicated dates are there for a reason.

Thank you in advance!

antonbcristina · Posted 10-11-2024 12:31 PM

Hi @NewUsrStat, assuming Sex is always non-missing for the first instance of ID, you'll want to create a new variable (New_Sex) that retains the Sex value for all observations for an ID.

data db1;
	retain New_Sex;
	set db;
	by id;

	if first.id then New_Sex=Sex;
run;

View solution in original post

antonbcristina · Posted 10-11-2024 12:31 PM

Hi @NewUsrStat, assuming Sex is always non-missing for the first instance of ID, you'll want to create a new variable (New_Sex) that retains the Sex value for all observations for an ID.

data db1;
	retain New_Sex;
	set db;
	by id;

	if first.id then New_Sex=Sex;
run;

ballardw · Posted 10-11-2024 12:34 PM

Your first data set does not create a usable data set as 1) Sex is read as numeric, hence all missing and 2) read past the end of the line because of the missing values in the SEX column.

The $ after Sex on the input statement fixes the first problem, TRUNCOVER on the INFILE the other by stopping reading when running out of values.

data work.DB;
  infile datalines truncover;
  input ID :$20. Admission :date09. Discharge :date09. Sex $;
  format Admission date9. Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  M
0001  22FEB2018 03MAR2018  
0001  22FEB2018 03MAR2018 
0002  01DEC2016 14DEC2016  F
0002  01DEC2016 14DEC2016  
0002  25DEC2017 02JAN2018 
0002  06JAN2018 09JAN2018  
;run;

One way:

Proc sql;
   create table work.db1 as
   select a.id, a.admission, a.discharge
          ,b.sex
   from   work.db as a
          left join
          (select distinct id,sex from work.db
           where not missing(sex) ) as b
           on a.id=b.id
   ;
quit;

If you data is not in sorted order on input this might change that.

antonbcristina · Posted 10-11-2024 12:35 PM

@NewUsrStat, I should also mention that running your DATA step to create DB didn't work. By default, when you reach the end of the line and there are missing values for the remainder of the variables, SAS will continue reading from the next line. Hence the message below: "SAS went to a new line when INPUT statement reached past the end of the line".

You'll want to add in the option TRUNCOVER like so:

data DB;
  infile cards truncover;
  input ID :$20. Admission :date09. Discharge :date09. Sex $;
  format Admission date9. Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  M
0001  22FEB2018 03MAR2018  
0001  22FEB2018 03MAR2018 
0002  01DEC2016 14DEC2016  F
0002  01DEC2016 14DEC2016  
0002  25DEC2017 02JAN2018 
0002  06JAN2018 09JAN2018  
;
run;

Tom · Posted 10-11-2024 02:02 PM

Not sure what you want to do when there are two different values.

But a simple merge of dataset with itself should work fine.

data DB;
  input ID :$20. Admission :date. Discharge :date. Sex $;
  format Admission Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  M
0001  22FEB2018 03MAR2018  .
0001  22FEB2018 03MAR2018  .
0002  01DEC2016 14DEC2016  F
0002  01DEC2016 14DEC2016  .
0002  25DEC2017 02JAN2018  .
0002  06JAN2018 09JAN2018  .
;

data want ;
  merge db(where=(not missing(sex))) db(drop=sex);
  by id;
run;

Result

Obs     ID     Admission    Discharge    Sex

 1     0001    13JAN2017    25JAN2017     M
 2     0001    22FEB2018    03MAR2018     M
 3     0001    22FEB2018    03MAR2018     M
 4     0002    01DEC2016    14DEC2016     F
 5     0002    01DEC2016    14DEC2016     F
 6     0002    25DEC2017    02JAN2018     F
 7     0002    06JAN2018    09JAN2018     F

Ksharp · Posted 10-12-2024 03:07 AM

data DB;
  input ID :$20. Admission :date. Discharge :date. Sex $;
  format Admission Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  M
0001  22FEB2018 03MAR2018  .
0001  22FEB2018 03MAR2018  .
0002  01DEC2016 14DEC2016  F
0002  01DEC2016 14DEC2016  .
0002  25DEC2017 02JAN2018  .
0002  06JAN2018 09JAN2018  .
;

data want ;
  update db(obs=0) db;
  by id;
  output;
run;

Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Re: Make a variable consistent over the entire dataset

Registration is open