I need to create 2 date variables depending on other variables. I will use variables M and P for the example. 1 is the value when they select M or P. If they select M1 they have to give date1 and so on. They will never give 2 types of date so if they choose M1 date1 will only be for that. Choosing M1 will not allow them to choose P1 and so on.
Because each id can have more than 1 date coded and I need to transpose the file, I was renaming them first using if then statements then transposing and renaming the new 2 date variables on the merge. What is the best way to do this? What am I missing?
The data I have looks like this:
ID date1 date2 date3 date4 date5 date6 M1 M2 M3 M4 M5 M6 P1 P2 P3 P4 P5 P6
1 . date . date . date . 1 . . . 1 . . . 1 . .
2 date . date . . date 1 . 1 . . . . . . . . 1
etc
ID 1 for example: since date 2 and 6 are linked to M2 and M6 they would be renamed Mdate2 Mdate6 and since date4 is related to P4 it would be rename Pdate4 so the transpose can assign them to the correct date variable.
I wanted to have this:
ID Mdate Pdate
1 Mdate2 Pdate4
1 Mdate6 .
2 Mdate1 Pdate6
2 Mdate3 .
I was trying if then statements to prepare for the transpose
if M1 = 1 then Mdate1 = date1;
else if M2 = 1 then Mdate2 = date2 ;
else if M3 = 1 then Mdate3 = date3 ;
else if M4 = 1 then Mdate4 = date4 ; etc
else if P1 = 1 then Pdate1 = date1;
else if P2 = 1 then Pdate2 = date2 ;
else if P3 = 1 then Pdate3 = date3 ;
else if P4 = 1 then Pdate4 = date4 ; etc
This is not working
but my next step would be:
proc transpose data=Main
out=out1;
var Mdate1 – Mdate6;
by id;
Run;
proc transpose data=Main
out=out2;
var Pdate1 – Pdate6;
by id;
Run;
Data want;
Merge
Out1 (Rename = (COL1 = Mdate) where=(not missing(Mdate)))
Out2(Rename = (COL1 =Pdate) where=(not missing(Pdate)));
By id;
Run;
How can I do this?
Thank you.
Hello @Mscarboncopy,
I think you can create the WANT dataset in a single DATA step:
/* Create sample data for demonstration */
data have(drop=j);
call streaminit(3141592);
ID=.;
array date[6];
array MP[2,6] M1-M6 P1-P6;
do ID=1 to 10;
call missing(of date[*], of MP[*]);
do j=1 to dim(date);
if rand('bern',.5) then do;
date[j]=rand('integer',20000,22400);
MP[rand('table',.5),j]=1;
end;
end;
output;
end;
format date: date9.;
run;
/* Write dates to variables Mdate and Pdate */
data want(keep=ID Mdate Pdate);
set have;
array d[0:6] date0-date6;
array M[6];
array P[6];
*if ~n(of d[*]) then output; /* optional */
do i=1 to 6;
j=whichn(1, of M[*]);
k=whichn(1, of P[*]);
if j then do;
Mdate=d[j];
M[j]=.;
end;
if k then do;
Pdate=d[k];
P[k]=.;
end;
if n(Mdate,Pdate) then output;
call missing(Mdate,Pdate);
end;
format Mdate Pdate date9.;
run;
The "optional" OUTPUT statement would write an observation with missing Mdate and Pdate for an ID with only missing dates.
Hello @Mscarboncopy,
I think you can create the WANT dataset in a single DATA step:
/* Create sample data for demonstration */
data have(drop=j);
call streaminit(3141592);
ID=.;
array date[6];
array MP[2,6] M1-M6 P1-P6;
do ID=1 to 10;
call missing(of date[*], of MP[*]);
do j=1 to dim(date);
if rand('bern',.5) then do;
date[j]=rand('integer',20000,22400);
MP[rand('table',.5),j]=1;
end;
end;
output;
end;
format date: date9.;
run;
/* Write dates to variables Mdate and Pdate */
data want(keep=ID Mdate Pdate);
set have;
array d[0:6] date0-date6;
array M[6];
array P[6];
*if ~n(of d[*]) then output; /* optional */
do i=1 to 6;
j=whichn(1, of M[*]);
k=whichn(1, of P[*]);
if j then do;
Mdate=d[j];
M[j]=.;
end;
if k then do;
Pdate=d[k];
P[k]=.;
end;
if n(Mdate,Pdate) then output;
call missing(Mdate,Pdate);
end;
format Mdate Pdate date9.;
run;
The "optional" OUTPUT statement would write an observation with missing Mdate and Pdate for an ID with only missing dates.
Thank you! This works perfectly.
Hi @FreelanceReinh I have an update (question) about this.
I realized that when I create the files if an ID has some missing dates Vars Mdate and Pdate (in the code below) are misplaced.
For example - I am using only Pdate here for a random ID - date3 is missing and date12 is supposed to be 1/1/03. What can I do to prevent this? The code you suggested works perfectly otherwise. No issues with Ids with all dates missing but the partially missing is the issue. It happens to all ids that have some dates missing. Thank you!
Date PDATE
3 | 1-Jan-03 |
6 | 1-Jan-03 |
9 | . |
12 | date3 should be here and 3 should be missing |
14 | . |
15 | . |
Hi @Mscarboncopy,
Are you talking about inconsistencies between the date variables date1, date2, ... and the flag variables M1, M2, ..., P1, P2, ... in your real data? Or do you mean a different numbering, with gaps, of those variables (which would necessarily affect all observations, not only some IDs)? I mean we had lots of missing values in the date and flag variables of dataset HAVE and in the date variables (Mdate, Pdate) of dataset WANT, not causing any issues. Please show a single observation (ideally including only variables we had previously, i.e., ID, date1, ..., date6, M1, ..., M6, P1, ..., P6) for which you need a different result and what this result should look like in terms of ID, Mdate and Pdate. Then I'm sure we'll find a solution.
Yes of course. Thank you for that reminder. The issue is when I merge my files.
So when I run the code to create the dates I get this for this one ID:
Pdate |
1/1/2003 |
1/1/2003 |
For the condition VAR I am using an array and using the same timing variable past. As I used for the date past.
array _pas {*} pas: ;
do PSYcondPast=1 to dim(_pas);
if _pas{PSYcondPast}>0 then output;
end;
run;
in this case my output for that ID is
PSYcondPast |
3 |
6 |
9 |
12 |
14 |
15 |
Conditions 3 - 9 - 14 and 15 had no date - but I need them in the file. It is common in this data set to have conditions with no dates.
The 2 dates you see above would have to go to keys 6 and 12 here. And I am not sure how. Thank you again.
@Mscarboncopy wrote:
The issue is when I merge my files.
What files do you merge? Can you show a small example of these and the code you use for "merging" them? Unlike the outlined solution in your initial post, my solution did not include a merge step.
@Mscarboncopy wrote:
For the condition VAR I am using an array and using the same timing variable past. As I used for the date past.
array _pas {*} pas: ;
Neither a "condition VAR" nor a "timing variable past" occurred anywhere so far. In fact, there were no variables whose names would match the pattern pas:. So, it appears that there's more going on than what we had in the beginning. But this is no problem. As a start, please modify the data lines ("1 2017/06/21 ...") in the DATA steps below so that datasets HAVE and WANT illustrate your issue with the current solution. Also feel free to add new variables to them and/or define additional input datasets (e.g., HAVE2, HAVE3, ...) in the same way which introduce the new variables (like VAR, pasxy, pasyz, etc.) you mentioned in your most recent post.
data have;
input ID (date1-date6)(:yymmdd.) M1-M6 P1-P6;
format date1-date6 yymmdd10.;
cards;
1 2017/06/21 2020/06/08 2019/08/05 2017/06/11 . . 1 1 . . . . . . 1 1 . .
;
data want;
input ID (Mdate Pdate)(:yymmdd.);
format Mdate Pdate yymmdd10.;
cards;
1 2017/06/21 2019/08/05
1 2020/06/08 2017/06/11
;
Here is my entire code:
Data ConDates (keep = ID Dtepast Dte);
Set have;
array d[0:16] date0-date16;
array pas[16];
array dur[16];
*if ~n(of d[*]) then output; /* optional */
do i=1 to 16;
j=whichn(1, of pas[*]);
k=whichn(1, of dur[*]);
if j then do;
Dtepast=d[j];
pas[j]=.;
end;
if k then do;
Dte=d[k];
dur[k]=.;
end;
if n(Dtepast, Dte) then output;
call missing(Dtepast,Dte);
end;
format Dtepast Dte mmddyy10.;
run;
I am using the same data file have here and I am giving the example for only Past condition I do the same for present (dur). note how pas is the same used above, pas is a timing variable they answer so if the condition is in the past they select it and then give dates or leave dates blank:
This is the file I create with the keys for the conditions;
Data pastcondition (keep= ID condPast);
Set have;
array _pas {*} pas: ;
do condPast=1 to dim(_pas);
if _pas{condPast}>0 then output;
end;
run;
* merging the two (date file and condition file) after proc sorting both by Id;
Data all;
Merge ConDates pastcondition ;
by ID;
output;
call missing(of _all_);
Run;
This is when I get all the conditions and the 2 dates (example I gave) but the dates are not lining up with the condition they belong to, due to the missing dates.
Thank you for your patience.
Thanks for the clarification. So, the issue is that some of the dates date1-date16 are missing, yet they are of interest because non-missing flags (like pas1, pas2, ... or dur1, dur2, ...) "point" to them. Your current approach is:
However, the missing dates hamper step 3.
I would rather integrate steps 1 and 2 into the previous solution so that step 3 is no longer needed.
Let me first provide sample data including missing dates with corresponding non-missing flags.
data have;
input ID (date1-date16)(:yymmdd.) pas1-pas16 dur1-dur16;
format date1-date16 yymmdd10.;
cards;
1 . . . 2003/01/01 2003/02/02 . . 2004/11/11 . . . 2005/05/05 . . . . . 1 . . 1 . 1 . . . . 1 . . . 1 . . . 1 . . . 1 . 1 . . . . 1 .
;
Modification of my previous solution to accommodate the new missing dates (and using the new variable names and array dimensions):
data ConDates (keep = ID Dtepast Dte j k rename=(j=condPast k=condPres));
set have;
array d[0:16] date0-date16;
array pas[16];
array dur[16];
do i=1 to 16;
j=whichn(1, of pas[*]);
k=whichn(1, of dur[*]);
if j then do;
Dtepast=d[j];
pas[j]=.;
end;
else j=.;
if k then do;
Dte=d[k];
dur[k]=.;
end;
else k=.;
if j | k then output;
call missing(Dtepast,Dte);
end;
format Dtepast Dte mmddyy10.;
run;
Note that the OUTPUT statement has a new IF condition, now relying on the flags being set rather than the dates not being missing.
Result:
cond cond Obs ID Past Pres Dtepast Dte 1 1 2 4 . 01/01/2003 2 1 5 8 02/02/2003 11/11/2004 3 1 7 10 . . 4 1 12 15 05/05/2005 . 5 1 16 . . .
Now we can see from the first observation of dataset ConDates that
Observations 2, 3 and 4 have analogous interpretations. Observation 5 is special (and different from obs. 3) in that the missing value of Dte does not reflect some missing date[i], but the fact that only four dur flags were set for ID=1. We can see the difference in the cond... variables: condPres is missing, whereas condPast=16 corresponds to pas[16]=1, while date[16] happened to be missing, as shown by the missing value of variable Dtepast.
Thank you so much! This has been a learning experience for sure. It is the solution. Thank you again.
Using Freelance's dataset.
data have(drop=j); call streaminit(3141592); ID=.; array date[6]; array MP[2,6] M1-M6 P1-P6; do ID=1 to 10; call missing(of date[*], of MP[*]); do j=1 to dim(date); if rand('bern',.5) then do; date[j]=rand('integer',20000,22400); MP[rand('table',.5),j]=1; end; end; output; end; format date: date9.; run; proc transpose data=have out=temp; by id; var M: P:; run; data temp1; set temp(where=(col1 is not missing)); length _id _name $ 40; _name=prxchange('s/(M|P)/\1date/i',1,_name_); _id=substr(_name,1,5); run; data want; merge temp1(where=(upcase(_id)='MDATE') rename=(_name=mdate)) temp1(where=(upcase(_id)='PDATE') rename=(_name=pdate)); by id; output; call missing(of _all_); drop _name_ _id col1; run;
I don't know how to interpret your desired output.
ID Mdate Pdate 1 Mdate2 Pdate4 1 Mdate6 . 2 Mdate1 Pdate6 2 Mdate3 .
Why are the value MDATE2 and PDATE4 on the same observation? How are they related?
Wouldn't it make more sense to have:
ID DateNo Mdate Pdate
1 2 Mdate2 .
1 4 . Pdate4
1 6 Mdate6 .
2 1 Mdate1 .
2 3 Mdate3 .
2 6 . Pdate6
Yes, that is exactly how I need to have it.... But I actually have 2 variables instead of only one that you called DateNo. I have a Pastcondition and a presentcondition and they match the pastdate and the dte, respectively. The first time I asked about it, I was only thinking about the dates (which I called Pdate and Mdate) and then realized it was giving me issues when I was merging with the condition variables that I had to create. I will try what was suggested above as far as my code today and see. Thank you.
Does the index into the array have any meaning? Perhaps the first set of variables are related to a specific condition and the second set a different type of condition?
If so then perhaps the PAST/PRESENT CONDITION values should be one of the numbers from 1 to N.
do index=1 to dim(dates);
if p[index] then do;
pastcondition=index;
...
If not then just make them have to same binary true/false values you have the arrays.
do index=1 to dim(dates);
pastcondition=p[index];
...
The conditions are the same, I just need 2 specific variables to make analysis easier, separating them into past and present. I used FreelanceReinhard's suggestion and it worked really well. Thank you.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.