Hello,
I have a database where subjects have measurements taken over time. Each time they change environments (SCHOOL vs. HOME) on a certain unique date (StartDate), these next set of measurements are placed on the next row. So for subject STUDYID 0001, he has four rows because he changes environments three times. The next subject 0002 also have four measurements, but other subjects might have only two, or six, or eight measurements; there is always an even number of measurements since the first observation is always a SCHOOL period (row #1) that is always followed by a HOME period (row #2). Sample data below:
data tempfile;
INPUT
studyID$4.@+1 environment$8. count 1.@+1 StartDate yymmdd10.@+1 EndDate yymmdd10.@+1 MvmtDate mmddyy10.@+1
Movementcode 1.@+1 Movecode 1.@+1 daysbwn 5. Movementcode2 1.@+1 Movecode2 1.@+1;
format Startdate EndDate yymmdd10.;
datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8 . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721 3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121 1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205 . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211 1 1
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839 3 2
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;
proc print data=tempfile;
run;
Since the SCHOOL data and the HOME data that follows it are paired in real life, I would like to couple up these rows to facilitate analysis of a single row. Specifically, I would like to pair the preceding SCHOOL period with the HOME period that follows. So I would like rows with count 1 and count 2 to be combined into a single first row, and rows with count 3 and 4 to be combined into a single, second row. Optimally, it would look something like this:
STUDYID count environment_S StartDate_S EndDate_S MvmtDate_S Movementcode_S Movecode_S daysbwn_S Movementcode2_S Movecode2_S environment_H StartDate_H EndDate_H MvmtDate_H Movementcode_H Movecode_H daysbwn_H Movementcode2_H Movecode2_H
0001 1 SCHOOL 2007-01-01 2007-01-09 01/09/2007 3 2 8 . . HOME 2007-01-10 2009-01-02 01/02/2009 1 1 721 3 2
0001 2 SCHOOL 2009-01-03 2009-05-04 05/04/2009 3 2 121 1 1 HOME 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 1 SCHOOL 2007-01-01 2007-07-25 07/25/2007 1 1 205 . . HOME 2007-07-26 2008-02-22 02/22/2008 3 2 211 1 1
0002 2 SCHOOL 2008-02-23 2010-06-11 06/11/2010 1 1 839 3 2 HOME 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
Is there a way to do this? Any assistance or example code would be VERY much appreciated. Thank you very much in advance.
Using some of the code by collinelliot, if I understand what you require, then try this:
data tempfile;
INPUT
studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
Movementcode Movecode daysbwn Movementcode2 Movecode2;
format Startdate EndDate yymmdd10.;
datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8 . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721 3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121 1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205 . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211 1 1
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839 3 2
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;
proc sql;
create table tempfile2 as
select a.*,
b.environment as environment_b, b.StartDate as StartDate_b, b.EndDate as EndDate_b,
b.MvmtDate as MvmtDate_b, b.Movementcode as Movementcode_b, b.Movecode as Movecode_b,
b.daysbwn as daysbwn_b, b.Movementcode2 as Movementcode2_b, b.Movecode2 as Movecode2_b
from tempfile as a, tempfile as b
where a.studyID=b.studyID and ((a.count=1 and b.count=2) or (a.count=3 and b.count=4)); quit;
A double transpose will get what you want, assuming I understood the issue (which I rarely do... 😞 😞
data tempfile;
INPUT
studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
Movementcode Movecode daysbwn Movementcode2 Movecode2;
format Startdate EndDate yymmdd10.;
pair = ceil(count / 2);
datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8 . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721 3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121 1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205 . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211 1 1
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839 3 2
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;
proc sort data = tempfile;
by studyID environment pair count;
run;
proc transpose data = tempfile out = tempfile2;
by studyID environment pair count;
var startDate pair endDate mvmtDate;
run;
data tempfile3;
length _name_ $14;
set tempfile2;
_name_ = catx('_', _name_, substr(environment, 1, 1));
run;
proc sort data = tempfile3;
by studyID pair;
run;
proc transpose data = tempfile3 out = tempfile4;
by studyID pair;
var col1;
run;
Using some of the code by collinelliot, if I understand what you require, then try this:
data tempfile;
INPUT
studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
Movementcode Movecode daysbwn Movementcode2 Movecode2;
format Startdate EndDate yymmdd10.;
datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8 . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721 3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121 1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205 . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211 1 1
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839 3 2
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;
proc sql;
create table tempfile2 as
select a.*,
b.environment as environment_b, b.StartDate as StartDate_b, b.EndDate as EndDate_b,
b.MvmtDate as MvmtDate_b, b.Movementcode as Movementcode_b, b.Movecode as Movecode_b,
b.daysbwn as daysbwn_b, b.Movementcode2 as Movementcode2_b, b.Movecode2 as Movecode2_b
from tempfile as a, tempfile as b
where a.studyID=b.studyID and ((a.count=1 and b.count=2) or (a.count=3 and b.count=4)); quit;
Try something like:
data tempfile;
INPUT
studyID:$4. environment:$8. count StartDate :yymmdd10. EndDate :yymmdd10.
MvmtDate :mmddyy10.
Movementcode Movecode daysbwn Movementcode2 Movecode2;
format Startdate EndDate yymmdd10.;
datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8 . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721 3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121 1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205 . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211 1 1
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839 3 2
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
data t2;
environment_S = "SCHOOL";
environment_H = "HOME";
length var $32;
line + 1;
do suffix = "_S", "_H";
set tempfile;
array v{*} startdate -- movecode2;
do i = 1 to dim(v);
var = cats(vname(v{i}), suffix);
value = v{i};
output;
end;
end;
keep line studyid environment_: var value;
run;
proc transpose data=t2 out=t3(drop=_name_ rename=line=count);
by studyid line environment_: notsorted;
id var;
var value;
run;
data want;
retain STUDYID count environment_S StartDate_S EndDate_S MvmtDate_S Movementcode_S
Movecode_S daysbwn_S Movementcode2_S Movecode2_S environment_H StartDate_H
EndDate_H MvmtDate_H Movementcode_H Movecode_H daysbwn_H Movementcode2_H Movecode2_H;
set t3;
format startdate_: enddate_: mvmtdate_: yymmdd10.;
run;
Hi PGStats,
It looks like your code will do the trick, but when I apply it to my actual dataset, I am getting the error message:
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
How would you modify your code to address this? Do I need to create two separate arrays for my numeric and character variables? What would that look like?
Thank you very much,
Kelsey
My code cannot be easily adapted for a mix of character and numeric variables. I assumed all numbers (except for studyId and environment) as per your example data.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.