Solved: Re: How to combine certain select rows (but not all rows) into single ...

Kels123 · Posted 02-03-2017 03:06 PM

Hello,

I have a database where subjects have measurements taken over time. Each time they change environments (SCHOOL vs. HOME) on a certain unique date (StartDate), these next set of measurements are placed on the next row. So for subject STUDYID 0001, he has four rows because he changes environments three times. The next subject 0002 also have four measurements, but other subjects might have only two, or six, or eight measurements; there is always an even number of measurements since the first observation is always a SCHOOL period (row #1) that is always followed by a HOME period (row #2). Sample data below:

data tempfile;
	INPUT
		studyID$4.@+1 environment$8. count 1.@+1 StartDate yymmdd10.@+1 EndDate yymmdd10.@+1 MvmtDate mmddyy10.@+1
		Movementcode 1.@+1 Movecode 1.@+1 daysbwn 5. Movementcode2 1.@+1 Movecode2 1.@+1;
	format Startdate EndDate yymmdd10.;
	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;

proc print data=tempfile;
run;

Since the SCHOOL data and the HOME data that follows it are paired in real life, I would like to couple up these rows to facilitate analysis of a single row. Specifically, I would like to pair the preceding SCHOOL period with the HOME period that follows. So I would like rows with count 1 and count 2 to be combined into a single first row, and rows with count 3 and 4 to be combined into a single, second row. Optimally, it would look something like this:

STUDYID count environment_S StartDate_S EndDate_S MvmtDate_S Movementcode_S Movecode_S daysbwn_S Movementcode2_S Movecode2_S environment_H StartDate_H EndDate_H MvmtDate_H Movementcode_H Movecode_H daysbwn_H Movementcode2_H Movecode2_H

0001 1 SCHOOL 2007-01-01 2007-01-09 01/09/2007 3 2 8 . . HOME 2007-01-10 2009-01-02 01/02/2009 1 1 721 3 2

0001 2 SCHOOL 2009-01-03 2009-05-04 05/04/2009 3 2 121 1 1 HOME 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1

0002 1 SCHOOL 2007-01-01 2007-07-25 07/25/2007 1 1 205 . . HOME 2007-07-26 2008-02-22 02/22/2008 3 2 211 1 1

0002 2 SCHOOL 2008-02-23 2010-06-11 06/11/2010 1 1 839 3 2 HOME 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2

Is there a way to do this? Any assistance or example code would be VERY much appreciated. Thank you very much in advance.

collinelliot · Posted 02-04-2017 06:26 AM

Using some of the code by collinelliot, if I understand what you require, then try this:

data tempfile;
	INPUT
		studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;

	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME   2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME   4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME   2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME   4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;


proc sql;
	create 		table tempfile2 as 
	select 		a.*, 
				b.environment as environment_b, b.StartDate as StartDate_b, b.EndDate as EndDate_b,
				b.MvmtDate as MvmtDate_b, b.Movementcode as Movementcode_b, b.Movecode as Movecode_b,
				b.daysbwn as daysbwn_b, b.Movementcode2 as Movementcode2_b, b.Movecode2 as Movecode2_b
	from 		tempfile as a, tempfile as b
	where 		a.studyID=b.studyID and ((a.count=1 and b.count=2) or (a.count=3 and b.count=4)); quit;

View solution in original post

collinelliot · Posted 02-03-2017 04:19 PM

A double transpose will get what you want, assuming I understood the issue (which I rarely do... 😞 😞

data tempfile;
	INPUT
		studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;

    pair = ceil(count / 2);

	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME   2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME   4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME   2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME   4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;

    
    
proc sort data = tempfile;
    by studyID environment pair count;
run;

proc transpose data = tempfile out = tempfile2;
    by studyID environment pair count;
    var startDate pair endDate mvmtDate;
run;

data  tempfile3;
    length _name_ $14;
    set tempfile2;

    _name_ = catx('_', _name_, substr(environment, 1, 1));
run;

proc sort data = tempfile3;
    by studyID pair;
run;

proc transpose data = tempfile3 out = tempfile4;
    by studyID pair;
    var col1;
run;

collinelliot · Posted 02-04-2017 06:26 AM

Using some of the code by collinelliot, if I understand what you require, then try this:

data tempfile;
	INPUT
		studyID $ environment $ count StartDate :yymmdd10. EndDate :yymmdd10. MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;

	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME   2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME   4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME   2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME   4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;
run;


proc sql;
	create 		table tempfile2 as 
	select 		a.*, 
				b.environment as environment_b, b.StartDate as StartDate_b, b.EndDate as EndDate_b,
				b.MvmtDate as MvmtDate_b, b.Movementcode as Movementcode_b, b.Movecode as Movecode_b,
				b.daysbwn as daysbwn_b, b.Movementcode2 as Movementcode2_b, b.Movecode2 as Movecode2_b
	from 		tempfile as a, tempfile as b
	where 		a.studyID=b.studyID and ((a.count=1 and b.count=2) or (a.count=3 and b.count=4)); quit;

PGStats · Posted 02-04-2017 01:03 AM

Try something like:

data tempfile;
	INPUT
		studyID:$4. environment:$8. count StartDate :yymmdd10. EndDate :yymmdd10. 
        MvmtDate :mmddyy10.
		Movementcode Movecode daysbwn Movementcode2 Movecode2;
	format Startdate EndDate yymmdd10.;
	datalines;
0001 SCHOOL 1 2007-01-01 2007-01-09 01/09/2007 3 2 8    . .
0001 HOME 2 2007-01-10 2009-01-02 01/02/2009 1 1 721  3 2
0001 SCHOOL 3 2009-01-03 2009-05-04 05/04/2009 3 2 121  1 1
0001 HOME 4 2009-05-05 2015-01-01 05/04/2009 3 2 1917 1 1
0002 SCHOOL 1 2007-01-01 2007-07-25 07/25/2007 1 1 205  . .
0002 HOME 2 2007-07-26 2008-02-22 02/22/2008 3 2 211  1 1 
0002 SCHOOL 3 2008-02-23 2010-06-11 06/11/2010 1 1 839  3 2 
0002 HOME 4 2010-06-12 2015-01-01 06/12/2010 8 2 1663 8 2
;

data t2;
environment_S = "SCHOOL";
environment_H = "HOME";
length var $32;
line + 1;
do suffix = "_S", "_H";
    set tempfile;
    array v{*} startdate -- movecode2;
    do i = 1 to dim(v);
        var = cats(vname(v{i}), suffix);
        value = v{i};
        output;
        end;
    end;
keep line studyid  environment_: var value;
run;

proc transpose data=t2 out=t3(drop=_name_ rename=line=count);
by studyid line environment_: notsorted;
id var;
var value;
run;

data want;
retain STUDYID count environment_S StartDate_S EndDate_S MvmtDate_S Movementcode_S 
Movecode_S daysbwn_S Movementcode2_S Movecode2_S environment_H StartDate_H 
EndDate_H MvmtDate_H Movementcode_H Movecode_H daysbwn_H Movementcode2_H Movecode2_H;
set t3;
format startdate_: enddate_: mvmtdate_: yymmdd10.;
run;

PG

Kels123 · Posted 02-06-2017 03:58 PM

Hi PGStats,

It looks like your code will do the trick, but when I apply it to my actual dataset, I am getting the error message:

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

How would you modify your code to address this? Do I need to create two separate arrays for my numeric and character variables? What would that look like?

Thank you very much,

Kelsey

PGStats · Posted 02-06-2017 04:49 PM

My code cannot be easily adapted for a mix of character and numeric variables. I assumed all numbers (except for studyId and environment) as per your example data.

PG

How to combine certain select rows (but not all rows) into single rows based on StudyID and count

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

How to combine certain select rows (but not all rows) into single rows based on StudyID and count

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

Re: How to combine certain select rows (but not all rows) into single rows based on StudyID and coun

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away