BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kels123
Quartz | Level 8

Hello,

 

I have a database where subjects have measurements taken over time. Each time they change environments (SCHOOL vs. HOME) on a certain unique date (StartDate), these next set of measurements are placed on the next row. So for subject STUDYID 0001, he has four rows because he changes environments three times.  The next subject 0002 also have four measurements, but other subjects might have only two, or six, or eight measurements; there is always an even number of measurements since the first observation is always a SCHOOL period (row #1) that is always followed by a HOME period (row #2).  Sample data below:

 

data tempfile;
	INPUT
		studyID$4.@+1 environment$8. count 1.@+1 StartDate yymmdd10.@+1 EndDate yymmdd10.@+1 MvmtDate mmddyy10.@+1
		Movementcode 1.@+1 Movecode 1.@+1 daysbwn 5. Movementcode2 1.@+1 Movecode2 1.@+1;
	format Startdate EndDate yymmdd10.;
	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;

proc print data=tempfile;
run;

 

 

Since the SCHOOL data and the HOME data that follows it are paired in real life, I would like to couple up these rows to facilitate analysis of a single row. Specifically, I would like to pair the preceding SCHOOL period with the HOME period that follows. So I would like rows with count 1 and count 2 to be combined into a single first row, and rows with count 3 and 4 to be combined into a single, second row. Optimally, it would look something like this:

 

STUDYID count environment_S StartDate_S EndDate_S MvmtDate_S Movementcode_S Movecode_S daysbwn_S Movementcode2_S Movecode2_S environment_H StartDate_H EndDate_H MvmtDate_H Movementcode_H Movecode_H daysbwn_H Movementcode2_H Movecode2_H

0001  1 SCHOOL  2007-01-01 2007-01-09  01/09/2007 3 2 8 . .   HOME  2007-01-10 2009-01-02 01/02/2009 1 1 721 3 2

0001  2 SCHOOL  2009-01-03 2009-05-04 05/04/2009 3 2 121 1 1   HOME  2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1

0002  1 SCHOOL  2007-01-01 2007-07-25  07/25/2007 1 1 205 . .   HOME  2007-07-26 2008-02-22 02/22/2008 3 2 211 1 1 

0002  2 SCHOOL  2008-02-23 2010-06-11 06/11/2010 1 1 839 3 2   HOME  2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2

 

Is there a way to do this? Any assistance or example code would be VERY much appreciated. Thank you very much in advance.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
iank131
Quartz | Level 8

Using some of the code by collinelliot, if I understand what you require, then try this:

 

data tempfile;
	INPUT
		studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;

	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME   2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME   4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME   2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME   4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;


proc sql;
	create 		table tempfile2 as 
	select 		a.*, 
				b.environment as environment_b, b.StartDate as StartDate_b, b.EndDate as EndDate_b,
				b.MvmtDate as MvmtDate_b, b.Movementcode as Movementcode_b, b.Movecode as Movecode_b,
				b.daysbwn as daysbwn_b, b.Movementcode2 as Movementcode2_b, b.Movecode2 as Movecode2_b
	from 		tempfile as a, tempfile as b
	where 		a.studyID=b.studyID and ((a.count=1 and b.count=2) or (a.count=3 and b.count=4)); quit; 

 

View solution in original post

5 REPLIES 5
collinelliot
Barite | Level 11

A double transpose will get what you want, assuming I understood the issue (which I rarely do... 😞 😞

 

 

data tempfile;
	INPUT
		studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;

    pair = ceil(count / 2);

	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME   2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME   4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME   2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME   4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;

    
    
proc sort data = tempfile;
    by studyID environment pair count;
run;

proc transpose data = tempfile out = tempfile2;
    by studyID environment pair count;
    var startDate pair endDate mvmtDate;
run;

data  tempfile3;
    length _name_ $14;
    set tempfile2;

    _name_ = catx('_', _name_, substr(environment, 1, 1));
run;

proc sort data = tempfile3;
    by studyID pair;
run;

proc transpose data = tempfile3 out = tempfile4;
    by studyID pair;
    var col1;
run;
iank131
Quartz | Level 8

Using some of the code by collinelliot, if I understand what you require, then try this:

 

data tempfile;
	INPUT
		studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;

	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME   2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME   4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME   2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME   4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;


proc sql;
	create 		table tempfile2 as 
	select 		a.*, 
				b.environment as environment_b, b.StartDate as StartDate_b, b.EndDate as EndDate_b,
				b.MvmtDate as MvmtDate_b, b.Movementcode as Movementcode_b, b.Movecode as Movecode_b,
				b.daysbwn as daysbwn_b, b.Movementcode2 as Movementcode2_b, b.Movecode2 as Movecode2_b
	from 		tempfile as a, tempfile as b
	where 		a.studyID=b.studyID and ((a.count=1 and b.count=2) or (a.count=3 and b.count=4)); quit; 

 
PGStats
Opal | Level 21

Try something like:

 

data tempfile;
	INPUT
		studyID:$4. environment:$8. count StartDate :yymmdd10. EndDate :yymmdd10. 
        MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;
	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;

data t2;
environment_S = "SCHOOL";
environment_H = "HOME";
length var $32;
line + 1;
do suffix = "_S", "_H";
    set tempfile;
    array v{*} startdate -- movecode2;
    do i = 1 to dim(v);
        var = cats(vname(v{i}), suffix);
        value = v{i};
        output;
        end;
    end;
keep line studyid  environment_: var value;
run;

proc transpose data=t2 out=t3(drop=_name_ rename=line=count);
by studyid line environment_: notsorted;
id var;
var value;
run;

data want;
retain STUDYID count environment_S StartDate_S EndDate_S MvmtDate_S Movementcode_S 
Movecode_S daysbwn_S Movementcode2_S Movecode2_S environment_H StartDate_H 
EndDate_H MvmtDate_H Movementcode_H Movecode_H daysbwn_H Movementcode2_H Movecode2_H;
set t3;
format startdate_: enddate_: mvmtdate_: yymmdd10.;
run;
PG
Kels123
Quartz | Level 8

Hi PGStats,

 

It looks like your code will do the trick, but when I apply it to my actual dataset, I am getting the error message: 

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

 

How would you modify your code to address this? Do I need to create two separate arrays for my numeric and character variables? What would that look like? 

 

Thank you very much,

 

Kelsey

PGStats
Opal | Level 21

My code cannot be easily adapted for a mix of character and numeric variables. I assumed all numbers (except for studyId and environment) as per your example data.

PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1341 views
  • 4 likes
  • 4 in conversation