SAS Procedures

Merging several observations with different dates

Thomas_mp · Posted 11-12-2023 11:06 AM

Hello,

I have hundreds of subjects (variables) that have observations for some dates but not for others. I need every variable to have an observation for every date, with missing values for the dates that have no information.

The best way to explain this is with an example. Below are the CARDS for the data set "have". The first column has all the dates that need to appear in the final desired outcome. The second column has only the dates with observations for variable C1, and the next column (C1) has the values for that variable. The remaining columns hold the same information for the variables C2 and C3.

Here is the information:

data have;
input date_all :mmddyy10. date_c1 :mmddyy10. C1 date_c2 :mmddyy10. C2 date_c3 :mmddyy10. C3 ;
format date_all mmddyy10. date_c1 mmddyy10. date_c2 mmddyy10. date_c3 mmddyy10. ;
cards;
1/3/2000 1/3/2000 0.5 1/8/2000 0.04 1/5/2000 .
1/4/2000 1/4/2000 0.6 1/9/2000 0.07 1/6/2000 .
1/5/2000 1/5/2000 0.7 . . 1/7/2000 .
1/6/2000 1/6/2000 . . . 1/8/2000 .
1/7/2000 1/7/2000 . . . 1/9/2000 .
1/8/2000 1/8/2000 . . . 1/10/2000 0.28
1/9/2000 1/9/2000 . . . 1/11/2000 0.15
1/10/2000 1/10/2000 . . . 1/12/2000 .
1/11/2000 1/11/2000 . . . 1/13/2000 .
1/12/2000 1/12/2000 . . . 1/14/2000 0.28
1/13/2000 1/13/2000 . . . . .
1/14/2000 1/14/2000 . . . . .
;
run;

proc print; run;

As you can see, I only have observations for the variables C1, C2 and C3 on some dates. To conduct the analysis I need observations of the 3 variables for all the dates, with missing values where the observation is missing.

The attached file shows the final desired outcome (along with the initial observations from this message).

I would appreciate your help with this.

Tomas

PaigeMiller · Posted 11-12-2023 03:27 PM

Most of us refuse to download Microsoft Office files as they can be a security threat. The proper way to show us the desired output is either as a screen capture of your Excel file, or as SAS data step code, similar to what you provided for the input data. If you are going to show us a screen capture, use the "Insert Photos" icon; do not attach files.

--
Paige Miller

Patrick · Posted 11-12-2023 05:45 PM

I believe the code below returns what you're after.

data have;
  input date_all :ddmmyy10. date_c1 :ddmmyy10. C1 date_c2 :ddmmyy10. C2 date_c3 :ddmmyy10. C3;
  format date_all ddmmyy10. date_c1 ddmmyy10. date_c2 ddmmyy10. date_c3 ddmmyy10.;
  subj_id=1;
  cards;
1/01/2000 1/01/2000 0.2 6/01/2000 0.02 3/01/2000 0.25
2/01/2000 2/01/2000 0.3 7/01/2000 0.06 4/01/2000 0.26
3/01/2000 3/01/2000 0.5 8/01/2000 0.04 5/01/2000 .
4/01/2000 4/01/2000 0.6 9/01/2000 0.07 6/01/2000 .
5/01/2000 5/01/2000 0.7 . . 7/01/2000 .
6/01/2000 6/01/2000 . . . 8/01/2000 .
7/01/2000 7/01/2000 . . . 9/01/2000 .
8/01/2000 8/01/2000 . . . 10/01/2000 0.28
9/01/2000 9/01/2000 . . . 11/01/2000 0.15
10/01/2000 10/01/2000 . . . 12/01/2000 .
11/01/2000 11/01/2000 . . . 13/01/2000 .
12/01/2000 12/01/2000 . . . 14/01/2000 0.28
13/01/2000 13/01/2000 . . . . .
14/01/2000 14/01/2000 . . . . .
;

data desired;
  input date_all :ddmmyy10. date_c1 :ddmmyy10. C1 date_c2 :ddmmyy10. C2 date_c3 :ddmmyy10. C3;
  format date_all ddmmyy10. date_c1 ddmmyy10. date_c2 ddmmyy10. date_c3 ddmmyy10.;
  subj_id=1;
  cards;
1/01/2000 1/01/2000 0.2 1/01/2000 . 1/01/2000 .
2/01/2000 2/01/2000 0.3 2/01/2000 . 2/01/2000 .
3/01/2000 3/01/2000 0.5 3/01/2000 . 3/01/2000 0.25
4/01/2000 4/01/2000 0.6 4/01/2000 . 4/01/2000 0.26
5/01/2000 5/01/2000 0.7 5/01/2000 . 5/01/2000 .
6/01/2000 6/01/2000 . 6/01/2000 0.02 6/01/2000 .
7/01/2000 7/01/2000 . 7/01/2000 0.06 7/01/2000 .
8/01/2000 8/01/2000 . 8/01/2000 0.04 8/01/2000 .
9/01/2000 9/01/2000 . 9/01/2000 0.07 9/01/2000 .
10/01/2000 10/01/2000 . 10/01/2000 . 10/01/2000 0.28
11/01/2000 11/01/2000 . 11/01/2000 . 11/01/2000 0.15
12/01/2000 12/01/2000 . 12/01/2000 . 12/01/2000 .
13/01/2000 13/01/2000 . 13/01/2000 . 13/01/2000 .
14/01/2000 14/01/2000 . 14/01/2000 . 14/01/2000 0.28
;

data have_long;
  set have;
  array vals{2,3} date_c1-date_c3 c1-c3;
  do index=1 to 3;
    date_c=vals[1,index];
    c=vals[2,index];
    if not missing(date_c) then output;
  end;
  format date_c ddmmyy10.;
  keep subj_id index date_c c;
run;

proc sort data=have_long;
  by subj_id date_c index;
run;

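data want;
  /* define date_c1-date_c3 and c1-c3 (with their formats) without reading HAVE */
  if 0 then set have(keep=subj_id date_c1-date_c3 c1-c3);
  set have_long;
  by subj_id date_c index;
  array vals{2,3} date_c1-date_c3 c1-c3;
  /* variables from a SET statement are retained, so clear them for each new date */
  if first.date_c then call missing(of date_c1-date_c3 c1-c3);
  vals[1,index]=date_c;
  vals[2,index]=c;
  if last.date_c then 
    do;
      /* give the date columns that had no entry for this date the common date */
      do i=1 to 3;
        if missing(vals[1,i]) then vals[1,i]=coalesce(vals[1,1],vals[1,2],vals[1,3]);
      end;
      output;
    end;
  keep subj_id date_c1-date_c3 c1-c3;
run;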

Here is an SQL version of the same:

proc sql;
  create table want2 as
  select 
    subj_id,
    coalesce(date_c1,date_c2,date_c3) as date_c1 format=ddmmyy10.,
    c1,
    coalesce(date_c1,date_c2,date_c3) as date_c2 format=ddmmyy10.,
    c2,
    coalesce(date_c1,date_c2,date_c3) as date_c3 format=ddmmyy10.,
    c3
  from 
    have(keep=subj_id date_c1 c1 where=(not missing(date_c1)))
    full join 
    have(keep=date_c2 c2 where=(not missing(date_c2)))
    on date_c1=date_c2
    full join
    have(keep=date_c3 c3 where=(not missing(date_c3)))
    on date_c3=coalesce(date_c1,date_c2)
  order by
    subj_id, date_c1
  ;
quit;

Tom · Posted 11-12-2023 08:31 PM

I don't understand: where do you expect to find data if it was not collected?

Looking at the data you posted, it looks like you have this data:

Obs    subjid          date     c1     c2      c3

  1       1      2000-01-01    0.2     .       .
  2       1      2000-01-02    0.3     .       .
  3       1      2000-01-03    0.5     .      0.25
  4       1      2000-01-04    0.6     .      0.26
  5       1      2000-01-05    0.7     .       .
  6       1      2000-01-06     .     0.02     .
  7       1      2000-01-07     .     0.06     .
  8       1      2000-01-08     .     0.04     .
  9       1      2000-01-09     .     0.07     .
 10       1      2000-01-10     .      .      0.28
 11       1      2000-01-11     .      .      0.15
 12       1      2000-01-12     .      .       .
 13       1      2000-01-13     .      .       .
 14       1      2000-01-14     .      .      0.28

So which of those missing values would you like to replace?

What values do you want to replace them with?

Do you want to do some type of last observation carried forward?

Or did you have some other type of method to provide values for the missing data points?
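For reference, a minimal sketch of what last observation carried forward could look like in a DATA step. It assumes a data set WANT with one row per SUBJID and DATE, sorted by both, with C1-C3 as in the table above. The LOCF_C1-LOCF_C3 names are hypothetical; the original C1-C3 values are left unchanged so the filled and observed values can be compared.

```sas
data want_locf;
  set want;                            /* assumed: sorted by SUBJID DATE, with C1-C3 */
  by subjid date;
  array vals {3} c1-c3;
  array locf {3} locf_c1-locf_c3;      /* hypothetical names for the filled copies */
  retain locf_c1-locf_c3;
  /* start each subject with nothing to carry forward */
  if first.subjid then call missing(of locf_c1-locf_c3);
  do i = 1 to dim(vals);
    /* take the current value when present; otherwise the retained one stays */
    if not missing(vals[i]) then locf[i] = vals[i];
  end;
  drop i;
run;
```

Values before a subject's first non-missing observation stay missing, since there is nothing to carry forward yet.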

Thomas_mp · Posted 11-13-2023 08:28 AM

Hello Tom,

Thank you for trying to help.

You wrote: "So looking at the data you posted it looks like you have this data"

However, this is not what I have, but what I need to have.

Attached here is what I have. You can get it by running the short program in the post with my original question.

Thank you again

Best,

Tomas

Tom · Posted 11-13-2023 08:52 AM

I doubt you have data in a WORD document. Please share data as a SAS data step.

So you don't want to make-up data? You just want to fix the structure so it is rational, like in the version I posted?

All I did was convert the multiple columns of DATE/VALUE pairs into multiple observations of DATE/VARNAME/VALUE triplets and use PROC TRANSPOSE to convert that into observations with DATE and variables C1 to C3. To fill in the missing dates I merged that with your first date column, which you said had all of the dates.

data all_dates(keep=subjid date c1-c3) have(drop=c1-c3);
  length subjid 8 date c1-c3 8 varname $32 value 8;
  subjid=1;
  informat date ddmmyy.;
  format date yymmdd10.;
  input date @;
  output all_dates;
  do varname='C1','C2','C3';
    input date value @;
    if not missing(value) then output have;
  end;
cards;
1/01/2000 1/01/2000 0.2 6/01/2000 0.02 3/01/2000 0.25
2/01/2000 2/01/2000 0.3 7/01/2000 0.06 4/01/2000 0.26
3/01/2000 3/01/2000 0.5 8/01/2000 0.04 5/01/2000 .
4/01/2000 4/01/2000 0.6 9/01/2000 0.07 6/01/2000 .
5/01/2000 5/01/2000 0.7 . . 7/01/2000 .
6/01/2000 6/01/2000 . . . 8/01/2000 .
7/01/2000 7/01/2000 . . . 9/01/2000 .
8/01/2000 8/01/2000 . . . 10/01/2000 0.28
9/01/2000 9/01/2000 . . . 11/01/2000 0.15
10/01/2000 10/01/2000 . . . 12/01/2000 .
11/01/2000 11/01/2000 . . . 13/01/2000 .
12/01/2000 12/01/2000 . . . 14/01/2000 0.28
13/01/2000 13/01/2000 . . . . .
14/01/2000 14/01/2000 . . . . .
;

proc sort data=have;
  by subjid date varname;
run;

proc transpose data=have out=normal(drop=_name_);
  by subjid date;
  id varname;
  var value;
run;

data want;
  merge all_dates normal;
  by subjid date;
run;

proc print;
run;

If instead of a source text file you already have a SAS dataset like:

data have;
  subjid=1;
  input date_all :mmddyy. (date_c1 C1 date_c2 C2 date_c3 C3) (:mmddyy10. :32.);
  format date_all date_c1-date_c3 yymmdd10.;
cards;
1/3/2000 1/3/2000 0.5 1/8/2000 0.04 1/5/2000 .
1/4/2000 1/4/2000 0.6 1/9/2000 0.07 1/6/2000 .
1/5/2000 1/5/2000 0.7 . . 1/7/2000 .
1/6/2000 1/6/2000 . . . 1/8/2000 .
1/7/2000 1/7/2000 . . . 1/9/2000 .
1/8/2000 1/8/2000 . . . 1/10/2000 0.28
1/9/2000 1/9/2000 . . . 1/11/2000 0.15
1/10/2000 1/10/2000 . . . 1/12/2000 .
1/11/2000 1/11/2000 . . . 1/13/2000 .
1/12/2000 1/12/2000 . . . 1/14/2000 0.28
1/13/2000 1/13/2000 . . . . .
1/14/2000 1/14/2000 . . . . .
;

Then you can use two ARRAYs to convert that wide structure into a tall structure:

data all_dates(keep=subjid date_all c1-c3 rename=(date_all=date))
     tall(keep=subjid date varname value)
;
  set have;
  array d date_c1-date_c3;
  array v c1-c3;
  length date 8 varname $32 value 8;
  format date yymmdd10.;
  do index=1 to dim(d);
     date=d[index];
     varname=vname(v[index]);
     value=v[index];
     if not missing(date) then output tall;
  end;
  call missing(of c1-c3);
  output all_dates;
run;

proc sort data=tall;
  by subjid date varname;
run;

proc transpose data=tall out=normal(drop=_name_);
  by subjid date;
  id varname;
  var value;
run;

data want;
  merge all_dates normal;
  by subjid date;
run;

proc print;
run;

Result

Obs    subjid          date     C1     C2      C3

  1       1      2000-01-03    0.5     .       .
  2       1      2000-01-04    0.6     .       .
  3       1      2000-01-05    0.7     .       .
  4       1      2000-01-06     .      .       .
  5       1      2000-01-07     .      .       .
  6       1      2000-01-08     .     0.04     .
  7       1      2000-01-09     .     0.07     .
  8       1      2000-01-10     .      .      0.28
  9       1      2000-01-11     .      .      0.15
 10       1      2000-01-12     .      .       .
 11       1      2000-01-13     .      .       .
 12       1      2000-01-14     .      .      0.28

Tom · Posted 11-13-2023 09:01 AM

With a little bit of "wallpaper code" you could go from your original layout to the rational layout in one step (assuming the non-missing dates in each column are sorted).

data want;
  merge have(keep=subjid date_all rename=(date_all=date))
        have(keep=subjid date_c1 c1 rename=(date_c1=date) where=(not missing(date)))
        have(keep=subjid date_c2 c2 rename=(date_c2=date) where=(not missing(date)))
        have(keep=subjid date_c3 c3 rename=(date_c3=date) where=(not missing(date)))
  ;
  by subjid date;
run;
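With 470 date/value pairs the wallpaper grows long. A minimal sketch of generating the MERGE list with a macro %DO loop, assuming HAVE contains SUBJID, DATE_ALL, DATE_C1-DATE_Cn and C1-Cn, and that the non-missing dates in each column are sorted with no duplicates within SUBJID. The macro name MERGE_PAIRS and its N= parameter are hypothetical. Each generated reference reads HAVE again, so with hundreds of pairs it may be worth timing this on a subset first.

```sas
%macro merge_pairs(n=);
  %local i;
  data want;
    merge have(keep=subjid date_all rename=(date_all=date))
    %do i=1 %to &n;
          /* one reference per pair: DATE_Ci renamed to DATE, rows without a date dropped */
          have(keep=subjid date_c&i c&i rename=(date_c&i=date) where=(not missing(date)))
    %end;
    ;
    by subjid date;
  run;
%mend merge_pairs;

%merge_pairs(n=3)
```

Calling `%merge_pairs(n=470)` would generate the same statement for all 470 pairs.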

Thomas_mp · Posted 11-14-2023 05:30 AM

Thank you Paul, this works.

One more question that you very likely know how to answer.

Your input code to read the SAS dataset is:

input date_all :mmddyy. (date_c1 C1 date_c2 C2 date_c3 C3 date_c4 C4 ) (:mmddyy10. :32.);

You had 3 variables (C1, C2, C3). I added one more to see if I could run your code with an extra variable, and it worked. But I need to read 470 variables, C1, C2, ..., C470, and the corresponding dates date_c1 ... date_c470.

Do you know how to write a simple input statement to read the 470 variables and dates?

Thank you again.

Tomas
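Because the pairs alternate (date_c1 C1 date_c2 C2 ...), a single numbered range list will not give that order. A minimal sketch, assuming each line holds DATE_ALL followed by N date/value pairs: let two arrays drive the INPUT statement inside a DO loop, with the trailing `@` holding the line between INPUT statements. The macro variable N is a hypothetical parameter, set to 3 here so the step runs against the first sample rows from the thread.

```sas
%let n = 3;   /* hypothetical: set to 470 for the real file */

data have;
  subjid = 1;
  /* the arrays create DATE_C1-DATE_C&n and C1-C&n as numeric variables */
  array d {&n} date_c1-date_c&n;
  array v {&n} c1-c&n;
  input date_all :mmddyy10. @;          /* read the first date and hold the line */
  do i = 1 to &n;
    input d{i} :mmddyy10. v{i} :32. @;  /* read one date/value pair per pass */
  end;
  format date_all date_c1-date_c&n yymmdd10.;
  drop i;
cards;
1/3/2000 1/3/2000 0.5 1/8/2000 0.04 1/5/2000 .
1/4/2000 1/4/2000 0.6 1/9/2000 0.07 1/6/2000 .
1/5/2000 1/5/2000 0.7 . . 1/7/2000 .
;
```

With N set to 470 the same step reads DATE_C1 to DATE_C470 and C1 to C470 without listing any of them by hand. The variables end up grouped (all dates, then all values) rather than interleaved, which does not affect the merge.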

Patrick · Posted 11-13-2023 08:52 PM

@Thomas_mp What you shared as a screenshot of your HAVE data differs from what you shared in your Excel file as HAVE data.

The two SAS data steps I posted earlier for HAVE and Desired create the data you shared in your Excel file. Are you now telling us this is not the right data? If so, please share sample data as working SAS data step code with a DATALINES statement, so that we all have the same, consistent data to work with.
