BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mscarboncopy
Pyrite | Level 9

I need to create 2 date variables depending on other variables. I will use variables M and P for the example. 1 is the value when they select M or P. If they select M1 they have to give date1  and so on. They will never give 2 types of date so if they choose M1 date1 will only be for that. Choosing M1 will not allow them to choose P1 and so on.

Because each id can have more than 1 date coded and I need to transpose the file, I was renaming them first using if then statements then transposing and renaming the new 2 date variables on the merge.  What is the best way to do this? What am I missing? 

 

The data I have looks like this:

ID date1  date2  date3 date4 date5 date6           M1 M2 M3 M4 M5 M6        P1 P2 P3   P4   P5 P6

1    .       date         .         date      .   date              .      1     .      .      .  1           .    .    .     1   .      .

2    date    .         date         .        .    date           1     .     1       .      .       .         .     .   .        .      .      1

etc

ID 1 for example: since date 2 and 6 are linked to M2 and M6 they would be renamed Mdate2 Mdate6  and since date4 is related to P4 it  would be rename Pdate4 so the transpose can assign them to the correct date variable.

 

I wanted to have this:

ID Mdate         Pdate

1   Mdate2       Pdate4

1   Mdate6      .

2   Mdate1      Pdate6

2   Mdate3          .

 

 

I was trying if then statements to prepare for the transpose

if M1 = 1 then Mdate1 = date1;

else if     M2    =     1     then  Mdate2      =      date2 ;

else if     M3    =     1 then      Mdate3      =      date3 ;

else if     M4    =     1 then      Mdate4      =      date4 ;  etc

 

else if P1 = 1 then Pdate1 = date1;

else if     P2    =     1     then  Pdate2      =      date2 ;

else if     P3    =     1 then      Pdate3      =      date3 ;

else if     P4    =     1 then      Pdate4      =      date4 ;  etc

This is not working 

but my next step would be:

proc transpose data=Main

out=out1;

var Mdate1 – Mdate6;

by  id;

Run;

 

 proc transpose data=Main

out=out2;

var Pdate1 – Pdate6;

by  id;

Run;

 

Data want;

Merge

Out1 (Rename = (COL1 = Mdate) where=(not missing(Mdate)))

Out2(Rename = (COL1 =Pdate) where=(not missing(Pdate)));

By id;

Run;

How can I do this? 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Mscarboncopy,

 

I think you can create the WANT dataset in a single DATA step:

 

/* Create sample data for demonstration */

data have(drop=j);
call streaminit(3141592);
ID=.;
array date[6];
array MP[2,6] M1-M6 P1-P6;
do ID=1 to 10;
  call missing(of date[*], of MP[*]);
  do j=1 to dim(date);
    if rand('bern',.5) then do;
      date[j]=rand('integer',20000,22400);
      MP[rand('table',.5),j]=1;
    end;
  end;
  output;
end;
format date: date9.;
run;

/* Write dates to variables Mdate and Pdate */

data want(keep=ID Mdate Pdate);
set have;
array d[0:6] date0-date6;
array M[6];
array P[6];
*if ~n(of d[*]) then output; /* optional */
do i=1 to 6;
  j=whichn(1, of M[*]);
  k=whichn(1, of P[*]);
  if j then do;
    Mdate=d[j];
    M[j]=.;
  end;
  if k then do;
    Pdate=d[k];
    P[k]=.;
  end;
  if n(Mdate,Pdate) then output;
  call missing(Mdate,Pdate);
end;
format Mdate Pdate date9.;
run;

The "optional" OUTPUT statement would write an observation with missing Mdate and Pdate for an ID with only missing dates.

 

View solution in original post

14 REPLIES 14
FreelanceReinh
Jade | Level 19

Hello @Mscarboncopy,

 

I think you can create the WANT dataset in a single DATA step:

 

/* Create sample data for demonstration */

data have(drop=j);
call streaminit(3141592);
ID=.;
array date[6];
array MP[2,6] M1-M6 P1-P6;
do ID=1 to 10;
  call missing(of date[*], of MP[*]);
  do j=1 to dim(date);
    if rand('bern',.5) then do;
      date[j]=rand('integer',20000,22400);
      MP[rand('table',.5),j]=1;
    end;
  end;
  output;
end;
format date: date9.;
run;

/* Write dates to variables Mdate and Pdate */

data want(keep=ID Mdate Pdate);
set have;
array d[0:6] date0-date6;
array M[6];
array P[6];
*if ~n(of d[*]) then output; /* optional */
do i=1 to 6;
  j=whichn(1, of M[*]);
  k=whichn(1, of P[*]);
  if j then do;
    Mdate=d[j];
    M[j]=.;
  end;
  if k then do;
    Pdate=d[k];
    P[k]=.;
  end;
  if n(Mdate,Pdate) then output;
  call missing(Mdate,Pdate);
end;
format Mdate Pdate date9.;
run;

The "optional" OUTPUT statement would write an observation with missing Mdate and Pdate for an ID with only missing dates.

 

Mscarboncopy
Pyrite | Level 9

Thank you! This works perfectly.

 

Mscarboncopy
Pyrite | Level 9

Hi @FreelanceReinh  I have an update (question) about this.

I realized that when I create the files if an ID has some missing dates Vars Mdate and Pdate (in the code below) are misplaced.

For example -  I am using only Pdate here for a random ID - date3 is missing and date12 is supposed to be 1/1/03. What can I do to prevent this? The code you suggested works perfectly otherwise. No issues with Ids with all dates missing but the partially missing is the issue. It happens to all ids that have some dates missing. Thank you!

Date     PDATE

31-Jan-03
61-Jan-03
9.
12date3 should be here and 3 should be missing
14.
15.
FreelanceReinh
Jade | Level 19

Hi @Mscarboncopy,

 

Are you talking about inconsistencies between the date variables date1, date2, ... and the flag variables M1, M2, ..., P1, P2, ... in your real data? Or do you mean a different numbering, with gaps, of those variables (which would necessarily affect all observations, not only some IDs)? I mean we had lots of missing values in the date and flag variables of dataset HAVE and in the date variables (Mdate, Pdate) of dataset WANT, not causing any issues. Please show a single observation (ideally including only variables we had previously, i.e., IDdate1, ..., date6M1, ..., M6P1, ..., P6) for which you need a different result and what this result should look like in terms of ID, Mdate and Pdate. Then I'm sure we'll find a solution.

Mscarboncopy
Pyrite | Level 9

Yes of course. Thank you for that reminder. The issue is when I merge my files.

So when I run the code to create the dates I get this for this one ID:

Pdate
1/1/2003
1/1/2003

 

For the condition VAR I am using an array and using the same timing variable past. As I used for the date past.

array _pas {*} pas: ;
do PSYcondPast=1 to dim(_pas);
if _pas{PSYcondPast}>0 then output;
end;
run;

 

in this case my output for that ID is

PSYcondPast
3
6
9
12
14
15

Conditions 3 - 9 - 14 and 15  had no date - but I need them in the file. It is common in this data set to have conditions with no dates.

The 2 dates you see above  would have to go to keys 6 and 12 here. And I am not sure how. Thank you again.

 

 
FreelanceReinh
Jade | Level 19

@Mscarboncopy wrote:

The issue is when I merge my files.

What files do you merge? Can you show a small example of these and the code you use for "merging" them? Unlike the outlined solution in your initial post, my solution did not include a merge step.

 


@Mscarboncopy wrote:

For the condition VAR I am using an array and using the same timing variable past. As I used for the date past.

array _pas {*} pas: ;


Neither a "condition VAR" nor a "timing variable past" occurred anywhere so far. In fact, there were no variables whose names would match the pattern pas:.  So, it appears that there's more going on than what we had in the beginning. But this is no problem. As a start, please modify the data lines  ("1 2017/06/21 ...") in the DATA steps below so that datasets HAVE and WANT illustrate your issue with the current solution. Also feel free to add new variables to them and/or define additional input datasets (e.g., HAVE2, HAVE3, ...) in the same way which introduce the new variables (like VAR, pasxy, pasyz, etc.) you mentioned in your most recent post.

data have;
input ID (date1-date6)(:yymmdd.) M1-M6 P1-P6;
format date1-date6 yymmdd10.;
cards;
1 2017/06/21 2020/06/08 2019/08/05 2017/06/11 . . 1 1 . . . . . . 1 1 . .
;

data want;
input ID (Mdate Pdate)(:yymmdd.);
format Mdate Pdate yymmdd10.;
cards;
1 2017/06/21 2019/08/05
1 2020/06/08 2017/06/11
;

 

Mscarboncopy
Pyrite | Level 9

Here is my entire code:

Data ConDates (keep = ID  Dtepast Dte);
Set have;
array d[0:16] date0-date16;
array pas[16];
array dur[16];
*if ~n(of d[*]) then output; /* optional */
do i=1 to 16;
j=whichn(1, of pas[*]);
k=whichn(1, of dur[*]);
if j then do;
Dtepast=d[j];
pas[j]=.;
end;
if k then do;
Dte=d[k];
dur[k]=.;
end;
if n(Dtepast, Dte) then output;
call missing(Dtepast,Dte);
end;
format Dtepast Dte mmddyy10.;
run;

 

 I am using the same data file have here and I am giving the example for only Past condition I do the same for present (dur). note how pas is the same used above, pas is a timing variable they answer so if the condition is in the past they select it and then give dates or leave dates blank:

 This is the file I create with the keys for the conditions;

 

Data pastcondition (keep= ID condPast);
Set have;
array _pas {*} pas: ;
do condPast=1 to dim(_pas);
if _pas{condPast}>0 then output;
end;
run;

 

*  merging the two (date file and condition file) after proc sorting both by Id;

 

Data all;
Merge ConDates pastcondition ;
by ID;
output;
call missing(of _all_);
Run;

 

This is when I get all the conditions and the 2 dates (example I gave) but the dates are not lining up with the condition they belong to, due to the missing dates.

Thank you for your patience.

FreelanceReinh
Jade | Level 19

Thanks for the clarification. So, the issue is that some of the dates date1-date16 are missing, yet they are of interest because non-missing flags (like pas1, pas2, ... or dur1, dur2, ...) "point" to them. Your current approach is:

  1. retrieve the dates from dataset HAVE
  2. retrieve the conditions (i.e., the indices i for which the flags pas[i] or dur[i] are set) from dataset HAVE
  3. merge the results from steps 1 and 2.

However, the missing dates hamper step 3.

 

I would rather integrate steps 1 and 2 into the previous solution so that step 3 is no longer needed.

 

Let me first provide sample data including missing dates with corresponding non-missing flags.

data have;
input ID (date1-date16)(:yymmdd.) pas1-pas16 dur1-dur16;
format date1-date16 yymmdd10.;
cards;
1 . . . 2003/01/01 2003/02/02 . . 2004/11/11 . . . 2005/05/05 . . . . . 1 . . 1 . 1 . . . . 1 . . . 1 . . . 1 . . . 1 . 1 . . . . 1 .
;

Modification of my previous solution to accommodate the new missing dates (and using the new variable names and array dimensions):

data ConDates (keep = ID  Dtepast Dte j k rename=(j=condPast k=condPres));
set have;
array d[0:16] date0-date16;
array pas[16];
array dur[16];
do i=1 to 16;
  j=whichn(1, of pas[*]);
  k=whichn(1, of dur[*]);
  if j then do;
    Dtepast=d[j];
    pas[j]=.;
  end;
  else j=.;
  if k then do;
    Dte=d[k];
    dur[k]=.;
  end;
  else k=.;
  if j | k then output;
  call missing(Dtepast,Dte);
end;
format Dtepast Dte mmddyy10.;
run;

Note that the OUTPUT statement has a new IF condition, now relying on the flags being set rather than the dates not being missing.

 

Result:

             cond    cond
Obs    ID    Past    Pres       Dtepast           Dte

 1      1      2       4              .    01/01/2003
 2      1      5       8     02/02/2003    11/11/2004
 3      1      7      10              .             .
 4      1     12      15     05/05/2005             .
 5      1     16       .              .             .

Now we can see from the first observation of dataset ConDates that

  1. flag pas[2] was set, but date[2] was missing -- one of the "new" missing dates
  2. flag dur[4] was set and date[4] was 01/01/2003.

Observations 2, 3 and 4 have analogous interpretations. Observation 5 is special (and different from obs. 3) in that the missing value of Dte does not reflect some missing date[i], but the fact that only four dur flags were set for ID=1. We can see the difference in the cond... variables: condPres is missing, whereas condPast=16 corresponds to pas[16]=1, while date[16] happened to be missing, as shown by the missing value of variable Dtepast.

 

Mscarboncopy
Pyrite | Level 9

Thank you so much! This has been a learning experience for sure. It is the solution. Thank you again.

Ksharp
Super User

Using Freelance's dataset.

 

data have(drop=j);
call streaminit(3141592);
ID=.;
array date[6];
array MP[2,6] M1-M6 P1-P6;
do ID=1 to 10;
  call missing(of date[*], of MP[*]);
  do j=1 to dim(date);
    if rand('bern',.5) then do;
      date[j]=rand('integer',20000,22400);
      MP[rand('table',.5),j]=1;
    end;
  end;
  output;
end;
format date: date9.;
run;


proc transpose data=have out=temp;
by id;
var M: P:;
run;
data temp1;
 set temp(where=(col1 is not missing));
 length _id _name $ 40;
 _name=prxchange('s/(M|P)/\1date/i',1,_name_);
 _id=substr(_name,1,5);
run;
data want;
 merge temp1(where=(upcase(_id)='MDATE') rename=(_name=mdate))
       temp1(where=(upcase(_id)='PDATE') rename=(_name=pdate));
 by id;
 output;
 call missing(of _all_);
 drop _name_ _id col1;
run;
Tom
Super User Tom
Super User

I don't know how to interpret your desired output.

ID Mdate         Pdate
1   Mdate2       Pdate4
1   Mdate6      .
2   Mdate1      Pdate6
2   Mdate3          .

Why are the value MDATE2 and PDATE4 on the same observation?  How are they related?

Wouldn't it make more sense to have:

ID DateNo Mdate  Pdate
1    2   Mdate2    .
1    4      .    Pdate4
1    6   Mdate6    .
2    1   Mdate1    .
2    3   Mdate3    .
2    6     .     Pdate6
Mscarboncopy
Pyrite | Level 9

Yes, that is exactly how I need to have it.... But I actually have 2 variables instead of only one that you called DateNo. I have a Pastcondition and a presentcondition and they match the pastdate and the dte, respectively. The first time I asked about it, I was only thinking about the dates (which I called Pdate and Mdate) and then realized it was giving me issues when I was merging with the condition variables that I had to create. I will try what was suggested above as far as my code today and see. Thank you.

Tom
Super User Tom
Super User

Does the index into the array have any meaning?  Perhaps the first set of variables are related to a specific condition and the second set a different type of condition? 

 

If so then perhaps the PAST/PRESENT CONDITION values should be one of the numbers from 1 to N.

do index=1 to dim(dates);
   if p[index] then do;
       pastcondition=index;
...

If not then just make them have to same binary true/false values you have the arrays.

do index=1 to dim(dates);
    pastcondition=p[index];
...
Mscarboncopy
Pyrite | Level 9

The conditions are the same, I just need 2 specific variables to make analysis easier, separating them into past and present. I used FreelanceReinhard's suggestion and it worked really well. Thank you.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1108 views
  • 5 likes
  • 4 in conversation