DATA Step, Macro, Functions and more

How to combine certain select rows (but not all rows) into single rows based on StudyID and count

Accepted Solution Solved
Reply
Contributor
Posts: 48
Accepted Solution

How to combine certain select rows (but not all rows) into single rows based on StudyID and count

[ Edited ]

Hello,

 

I have a database where subjects have measurements taken over time. Each time they change environments (SCHOOL vs. HOME) on a certain unique date (StartDate), these next set of measurements are placed on the next row. So for subject STUDYID 0001, he has four rows because he changes environments three times.  The next subject 0002 also have four measurements, but other subjects might have only two, or six, or eight measurements; there is always an even number of measurements since the first observation is always a SCHOOL period (row #1) that is always followed by a HOME period (row #2).  Sample data below:

 

data tempfile;
	INPUT
		studyID$4.@+1 environment$8. count 1.@+1 StartDate yymmdd10.@+1 EndDate yymmdd10.@+1 MvmtDate mmddyy10.@+1
		Movementcode 1.@+1 Movecode 1.@+1 daysbwn 5. Movementcode2 1.@+1 Movecode2 1.@+1;
	format Startdate EndDate yymmdd10.;
	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;

proc print data=tempfile;
run;

 

 

Since the SCHOOL data and the HOME data that follows it are paired in real life, I would like to couple up these rows to facilitate analysis of a single row. Specifically, I would like to pair the preceding SCHOOL period with the HOME period that follows. So I would like rows with count 1 and count 2 to be combined into a single first row, and rows with count 3 and 4 to be combined into a single, second row. Optimally, it would look something like this:

 

STUDYID count environment_S StartDate_S EndDate_S MvmtDate_S Movementcode_S Movecode_S daysbwn_S Movementcode2_S Movecode2_S environment_H StartDate_H EndDate_H MvmtDate_H Movementcode_H Movecode_H daysbwn_H Movementcode2_H Movecode2_H

0001  1 SCHOOL  2007-01-01 2007-01-09  01/09/2007 3 2 8 . .   HOME  2007-01-10 2009-01-02 01/02/2009 1 1 721 3 2

0001  2 SCHOOL  2009-01-03 2009-05-04 05/04/2009 3 2 121 1 1   HOME  2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1

0002  1 SCHOOL  2007-01-01 2007-07-25  07/25/2007 1 1 205 . .   HOME  2007-07-26 2008-02-22 02/22/2008 3 2 211 1 1 

0002  2 SCHOOL  2008-02-23 2010-06-11 06/11/2010 1 1 839 3 2   HOME  2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2

 

Is there a way to do this? Any assistance or example code would be VERY much appreciated. Thank you very much in advance.

 

 


Accepted Solutions
Solution
‎02-07-2017 07:51 AM
Contributor
Posts: 53

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Using some of the code by collinelliot, if I understand what you require, then try this:

 

data tempfile;
	INPUT
		studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;

	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME   2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME   4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME   2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME   4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;


proc sql;
	create 		table tempfile2 as 
	select 		a.*, 
				b.environment as environment_b, b.StartDate as StartDate_b, b.EndDate as EndDate_b,
				b.MvmtDate as MvmtDate_b, b.Movementcode as Movementcode_b, b.Movecode as Movecode_b,
				b.daysbwn as daysbwn_b, b.Movementcode2 as Movementcode2_b, b.Movecode2 as Movecode2_b
	from 		tempfile as a, tempfile as b
	where 		a.studyID=b.studyID and ((a.count=1 and b.count=2) or (a.count=3 and b.count=4)); quit; 

 

View solution in original post


All Replies
PROC Star
Posts: 290

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

A double transpose will get what you want, assuming I understood the issue (which I rarely do... :-( ):

 

 

data tempfile;
	INPUT
		studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;

    pair = ceil(count / 2);

	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME   2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME   4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME   2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME   4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;

    
    
proc sort data = tempfile;
    by studyID environment pair count;
run;

proc transpose data = tempfile out = tempfile2;
    by studyID environment pair count;
    var startDate pair endDate mvmtDate;
run;

data  tempfile3;
    length _name_ $14;
    set tempfile2;

    _name_ = catx('_', _name_, substr(environment, 1, 1));
run;

proc sort data = tempfile3;
    by studyID pair;
run;

proc transpose data = tempfile3 out = tempfile4;
    by studyID pair;
    var col1;
run;
Solution
‎02-07-2017 07:51 AM
Contributor
Posts: 53

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Using some of the code by collinelliot, if I understand what you require, then try this:

 

data tempfile;
	INPUT
		studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;

	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME   2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME   4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME   2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME   4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;


proc sql;
	create 		table tempfile2 as 
	select 		a.*, 
				b.environment as environment_b, b.StartDate as StartDate_b, b.EndDate as EndDate_b,
				b.MvmtDate as MvmtDate_b, b.Movementcode as Movementcode_b, b.Movecode as Movecode_b,
				b.daysbwn as daysbwn_b, b.Movementcode2 as Movementcode2_b, b.Movecode2 as Movecode2_b
	from 		tempfile as a, tempfile as b
	where 		a.studyID=b.studyID and ((a.count=1 and b.count=2) or (a.count=3 and b.count=4)); quit; 

 
Respected Advisor
Posts: 4,659

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Try something like:

 

data tempfile;
	INPUT
		studyID:$4. environment:$8. count StartDate :yymmdd10. EndDate :yymmdd10. 
        MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;
	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;

data t2;
environment_S = "SCHOOL";
environment_H = "HOME";
length var $32;
line + 1;
do suffix = "_S", "_H";
    set tempfile;
    array v{*} startdate -- movecode2;
    do i = 1 to dim(v);
        var = cats(vname(v{i}), suffix);
        value = v{i};
        output;
        end;
    end;
keep line studyid  environment_: var value;
run;

proc transpose data=t2 out=t3(drop=_name_ rename=line=count);
by studyid line environment_: notsorted;
id var;
var value;
run;

data want;
retain STUDYID count environment_S StartDate_S EndDate_S MvmtDate_S Movementcode_S 
Movecode_S daysbwn_S Movementcode2_S Movecode2_S environment_H StartDate_H 
EndDate_H MvmtDate_H Movementcode_H Movecode_H daysbwn_H Movementcode2_H Movecode2_H;
set t3;
format startdate_: enddate_: mvmtdate_: yymmdd10.;
run;
PG
Contributor
Posts: 48

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Hi PGStats,

 

It looks like your code will do the trick, but when I apply it to my actual dataset, I am getting the error message: 

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

 

How would you modify your code to address this? Do I need to create two separate arrays for my numeric and character variables? What would that look like? 

 

Thank you very much,

 

Kelsey

Respected Advisor
Posts: 4,659

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

My code cannot be easily adapted for a mix of character and numeric variables. I assumed all numbers (except for studyId and environment) as per your example data.

PG
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 208 views
  • 4 likes
  • 4 in conversation