BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Thomas_mp
Obsidian | Level 7

Hello, 

I have hundreds of subjects (variables) that have observations for some dates but not for other dates.  I need to have  the variables with observations for all the dates, with missing values for the dates with missing information. 

The best a way to explain this is with one example.  Below you will see cards for the data set "have" . The first column  has all the dates that need to appear in the final desired outcome.  The second column only has the dates with observations for variable C1; The next column (C1) has the values for this variable.  The other columns have similar information for the variables C2 and C3. 

Here is the information:

 

data have;
input date_all :mmddyy10. date_c1 :mmddyy10. C1 date_c2 :mmddyy10. C2 date_c3 :mmddyy10. C3 ;
format date_all mmddyy10. date_c1 mmddyy10. date_c2 mmddyy10. date_c3 mmddyy10. ;
cards;

1/3/2000 1/3/2000 0.5 1/8/2000 0.04 1/5/2000 .
1/4/2000 1/4/2000 0.6 1/9/2000 0.07 1/6/2000 .
1/5/2000 1/5/2000 0.7 . . 1/7/2000 .
1/6/2000 1/6/2000 . . . 1/8/2000 .
1/7/2000 1/7/2000 . . . 1/9/2000 .
1/8/2000 1/8/2000 . . . 1/10/2000 0.28
1/9/2000 1/9/2000 . . . 1/11/2000 0.15
1/10/2000 1/10/2000 . . . 1/12/2000 .
1/11/2000 1/11/2000 . . . 1/13/2000 .
1/12/2000 1/12/2000 . . . 1/14/2000 0.28
1/13/2000 1/13/2000 . . . . .
1/14/2000 1/14/2000 . . . . .
;run;

proc print; run ;

 

As you can see, I only have missing observations for the variables C1,C2 and C3 in some dates. I need observations for the 3 variables  for all the dates, with missing values where the observation is missing to conduct the analysis.

Please, in the attached message you can see the final desired outcome (and the initial observations in this message) .

I would appreciate your help. with this.

Tomas

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

With a little bit of "wallpaper code" you could go from your original to the rational layout in one step (assuming the non missing dates in each column are sorted).

data want;
  merge have(keep=subjid date_all rename=(date_all=date))
        have(keep=subjid date_c1 c1 rename=(date_c1=date) where=(not missing(date)))
        have(keep=subjid date_c2 c2 rename=(date_c2=date) where=(not missing(date)))
        have(keep=subjid date_c3 c3 rename=(date_c3=date) where=(not missing(date)))
  ;
  by subjid date;
run;

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Most of us refuse to download Microsoft Office files as they can be a security threat. The proper way to show us the desired output is either as a screen capture of your Excel file, or as SAS data step code, similar to what you provided for the input data. If you are going to show us a screen capture, use the "Insert Photos" icon; do not attach files.

--
Paige Miller
Patrick
Opal | Level 21

I believe below returns what you're after.

data have;
  input date_all :ddmmyy10. date_c1 :ddmmyy10. C1 date_c2 :ddmmyy10. C2 date_c3 :ddmmyy10. C3;
  format date_all ddmmyy10. date_c1 ddmmyy10. date_c2 ddmmyy10. date_c3 ddmmyy10.;
  subj_id=1;
  cards;
1/01/2000 1/01/2000 0.2 6/01/2000 0.02 3/01/2000 0.25
2/01/2000 2/01/2000 0.3 7/01/2000 0.06 4/01/2000 0.26
3/01/2000 3/01/2000 0.5 8/01/2000 0.04 5/01/2000 .
4/01/2000 4/01/2000 0.6 9/01/2000 0.07 6/01/2000 .
5/01/2000 5/01/2000 0.7 . . 7/01/2000 .
6/01/2000 6/01/2000 . . . 8/01/2000 .
7/01/2000 7/01/2000 . . . 9/01/2000 .
8/01/2000 8/01/2000 . . . 10/01/2000 0.28
9/01/2000 9/01/2000 . . . 11/01/2000 0.15
10/01/2000 10/01/2000 . . . 12/01/2000 .
11/01/2000 11/01/2000 . . . 13/01/2000 .
12/01/2000 12/01/2000 . . . 14/01/2000 0.28
13/01/2000 13/01/2000 . . . . .
14/01/2000 14/01/2000 . . . . .
;

data desired;
  input date_all :ddmmyy10. date_c1 :ddmmyy10. C1 date_c2 :ddmmyy10. C2 date_c3 :ddmmyy10. C3;
  format date_all ddmmyy10. date_c1 ddmmyy10. date_c2 ddmmyy10. date_c3 ddmmyy10.;
  subj_id=1;
  cards;
1/01/2000 1/01/2000 0.2 1/01/2000 . 1/01/2000 .
2/01/2000 2/01/2000 0.3 2/01/2000 . 2/01/2000 .
3/01/2000 3/01/2000 0.5 3/01/2000 . 3/01/2000 0.25
4/01/2000 4/01/2000 0.6 4/01/2000 . 4/01/2000 0.26
5/01/2000 5/01/2000 0.7 5/01/2000 . 5/01/2000 .
6/01/2000 6/01/2000 . 6/01/2000 0.02 6/01/2000 .
7/01/2000 7/01/2000 . 7/01/2000 0.06 7/01/2000 .
8/01/2000 8/01/2000 . 8/01/2000 0.04 8/01/2000 .
9/01/2000 9/01/2000 . 9/01/2000 0.07 9/01/2000 .
10/01/2000 10/01/2000 . 10/01/2000 . 10/01/2000 0.28
11/01/2000 11/01/2000 . 11/01/2000 . 11/01/2000 0.15
12/01/2000 12/01/2000 . 12/01/2000 . 12/01/2000 .
13/01/2000 13/01/2000 . 13/01/2000 . 13/01/2000 .
14/01/2000 14/01/2000 . 14/01/2000 . 14/01/2000 0.28
;

data have_long;
  set have;
  array vals{2,3} date_c1-date_c3 c1-c3;
  do index=1 to 3;
    date_c=vals[1,index];
    c=vals[2,index];
    if not missing(date_c) then output;
  end;
  format date_c ddmmyy10.;
  keep subj_id index date_c c;
run;

proc sort data=have_long;
  by subj_id date_c index;
run;

data want;
  if 0 then set have(keep=subj_id date_c1-date_c3 c1-c3);
  set have_long;
  by subj_id date_c index;
  array vals{2,3} date_c1-date_c3 c1-c3;
  vals[1,index]=date_c;
  vals[2,index]=c;
  if last.date_c then 
    do;
      do i=1 to 3;
        if missing(vals[1,i]) then vals[1,i]=coalesce(vals[1,1],vals[1,2],vals[1,3]);
      end;
      output;
    end;
  keep subj_id date_c1-date_c3 c1-c3;
run;

 Here a SQL version for the same:

proc sql;
  create table want2 as
  select 
    subj_id,
    coalesce(date_c1,date_c2,date_c3) as date_c1 format=ddmmyy10.,
    c1,
    coalesce(date_c1,date_c2,date_c3) as date_c2 format=ddmmyy10.,
    c2,
    coalesce(date_c1,date_c2,date_c3) as date_c3 format=ddmmyy10.,
    c3
  from 
    have(keep=subj_id date_c1 c1 where=(not missing(date_c1)))
    full join 
    have(keep=date_c2 c2 where=(not missing(date_c2)))
    on date_c1=date_c2
    full join
    have(keep=date_c3 c3 where=(not missing(date_c3)))
    on date_c3=coalesce(date_c1,date_c2)
  order by
    subj_id, date_c1
  ;
quit;

 

Tom
Super User Tom
Super User

I don't understand where you expect to find data if it was not collected?

 

So looking at the data you posted it looks like you have this data:

Obs    subjid          date     c1     c2      c3

  1       1      2000-01-01    0.2     .       .
  2       1      2000-01-02    0.3     .       .
  3       1      2000-01-03    0.5     .      0.25
  4       1      2000-01-04    0.6     .      0.26
  5       1      2000-01-05    0.7     .       .
  6       1      2000-01-06     .     0.02     .
  7       1      2000-01-07     .     0.06     .
  8       1      2000-01-08     .     0.04     .
  9       1      2000-01-09     .     0.07     .
 10       1      2000-01-10     .      .      0.28
 11       1      2000-01-11     .      .      0.15
 12       1      2000-01-12     .      .       .
 13       1      2000-01-13     .      .       .
 14       1      2000-01-14     .      .      0.28

So which of those missing values would you like to replace?

What values do you want to replace them with?

 

Do you want to do some type of last observation carried forward?

Or did you have some other type of method to provide values for the missing data points?

 

Thomas_mp
Obsidian | Level 7

Hello Tom,

Thank you for trying to help.

You write : "So looking at the data you posted it looks like you have this data"

However, this is not what I have, but what I would need to have.

Attached here is what I have.  You can get this running the sort program attached to the post with my original question.

Thank you again

Best,

Tomas

Tom
Super User Tom
Super User

I doubt you have data in in a WORD document.  Please share data as SAS data step.

 

So you don't want to make-up data?  You just want to fix the structure so it is rational, like in the version I posted?

 

All I did was convert the multiple columns of DATE/VALUE pairs into multiple observations of DATE/VARNAME/VALUE triplets and use PROC TRANSPOSE to convert that into observations with DATE and variables C1 to C3.  To fill in the missing dates I merged that with you first date column which you said had all of the dates.

data all_dates(keep=subjid date c1-c3) have(drop=c1-c3);
  length subjid 8 date c1-c3 8 varname $32 value 8;
  subjid=1;
  informat date ddmmyy.;
  format date yymmdd10.;
  input date @;
  output all_dates;
  do varname='C1','C2','C3';
    input date value @;
    if not missing(value) then output have;
  end;
cards;
1/01/2000 1/01/2000 0.2 6/01/2000 0.02 3/01/2000 0.25
2/01/2000 2/01/2000 0.3 7/01/2000 0.06 4/01/2000 0.26
3/01/2000 3/01/2000 0.5 8/01/2000 0.04 5/01/2000 .
4/01/2000 4/01/2000 0.6 9/01/2000 0.07 6/01/2000 .
5/01/2000 5/01/2000 0.7 . . 7/01/2000 .
6/01/2000 6/01/2000 . . . 8/01/2000 .
7/01/2000 7/01/2000 . . . 9/01/2000 .
8/01/2000 8/01/2000 . . . 10/01/2000 0.28
9/01/2000 9/01/2000 . . . 11/01/2000 0.15
10/01/2000 10/01/2000 . . . 12/01/2000 .
11/01/2000 11/01/2000 . . . 13/01/2000 .
12/01/2000 12/01/2000 . . . 14/01/2000 0.28
13/01/2000 13/01/2000 . . . . .
14/01/2000 14/01/2000 . . . . .
;

proc sort data=have;
  by subjid date varname;
run;

proc transpose data=have out=normal(drop=_name_);
  by subjid date;
  id varname;
  var value;
run;

data want;
  merge all_dates normal;
  by subjid date;
run;

proc print;
run;

If instead of a source text file you already have a SAS dataset like:

data have;
  subjid=1;
  input date_all :mmddyy. (date_c1 C1 date_c2 C2 date_c3 C3) (:mmddyy10. :32.);
  format date_all date_c1-date_c3 yymmdd10.;
cards;
1/3/2000 1/3/2000 0.5 1/8/2000 0.04 1/5/2000 .
1/4/2000 1/4/2000 0.6 1/9/2000 0.07 1/6/2000 .
1/5/2000 1/5/2000 0.7 . . 1/7/2000 .
1/6/2000 1/6/2000 . . . 1/8/2000 .
1/7/2000 1/7/2000 . . . 1/9/2000 .
1/8/2000 1/8/2000 . . . 1/10/2000 0.28
1/9/2000 1/9/2000 . . . 1/11/2000 0.15
1/10/2000 1/10/2000 . . . 1/12/2000 .
1/11/2000 1/11/2000 . . . 1/13/2000 .
1/12/2000 1/12/2000 . . . 1/14/2000 0.28
1/13/2000 1/13/2000 . . . . .
1/14/2000 1/14/2000 . . . . .
;

Then you can use two ARRAYs to convert that wide structure into a tall structure:

data all_dates(keep=subjid date_all c1-c3 rename=(date_all=date))
     tall(keep=subjid date varname value)
;
  set have;
  array d date_c1-date_c3;
  array v c1-c3;
  length date 8 varname $32 value 8;
  format date yymmdd10.;
  do index=1 to dim(d);
     date=d[index];
     varname=vname(v[index]);
     value=v[index];
     if not missing(date) then output tall;
  end;
  call missing(of c1-c3);
  output all_dates;
run;

proc sort data=tall;
  by subjid date varname;
run;

proc transpose data=tall out=normal(drop=_name_);
  by subjid date;
  id varname;
  var value;
run;

data want;
  merge all_dates normal;
  by subjid date;
run;

proc print;
run;

Result

Obs    subjid          date     C1     C2      C3

  1       1      2000-01-03    0.5     .       .
  2       1      2000-01-04    0.6     .       .
  3       1      2000-01-05    0.7     .       .
  4       1      2000-01-06     .      .       .
  5       1      2000-01-07     .      .       .
  6       1      2000-01-08     .     0.04     .
  7       1      2000-01-09     .     0.07     .
  8       1      2000-01-10     .      .      0.28
  9       1      2000-01-11     .      .      0.15
 10       1      2000-01-12     .      .       .
 11       1      2000-01-13     .      .       .
 12       1      2000-01-14     .      .      0.28

 

Tom
Super User Tom
Super User

With a little bit of "wallpaper code" you could go from your original to the rational layout in one step (assuming the non missing dates in each column are sorted).

data want;
  merge have(keep=subjid date_all rename=(date_all=date))
        have(keep=subjid date_c1 c1 rename=(date_c1=date) where=(not missing(date)))
        have(keep=subjid date_c2 c2 rename=(date_c2=date) where=(not missing(date)))
        have(keep=subjid date_c3 c3 rename=(date_c3=date) where=(not missing(date)))
  ;
  by subjid date;
run;
Thomas_mp
Obsidian | Level 7

Thank you Paul, This works.

One more question taht you very likely know how to answer.

Your input code to read the SAS dataset is:

 

input date_all :mmddyy. (date_c1 C1 date_c2 C2 date_c3 C3 date_c4 C4 ) (:mmddyy10. :32.);

You had 3 variables (C1,C2, C3), I just added one more to see if I could run your code with one more variable, and I did.  But I need to add 470 more variables C1., C2,......C470 and the corresponding dates     date_c1 ... date_c470

Do you know how to write a simple  input statement to read the 470 variables and dates ?

Thank you again.

Tomas

 

Patrick
Opal | Level 21

@Thomas_mp What you share as a screenshot as your HAVE data differs from what you shared in your Excel as HAVE data. 

The two SAS data steps I've posted earlier for HAVE and Desired create the data you shared in your Excel. Are you now telling us this is not the right data? If so then please share sample data via working SAS data step code with datalines statement so that we've all got the same and a consistent set to work with.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1251 views
  • 0 likes
  • 4 in conversation