Solved: Re: Proc Transpose - Page 4

Vince28_Statcan · Posted 08-22-2013 11:34 AM

A not so simple alternative to achieve desired results:

options datestyle=DMY;
data have;
input @1 id 3. @5 flag $3. @9 time anydtdtm15. @25 cnt 2.;
format time datetime18.;
datalines;
101 IN     04Sep1989:7:30    1
101 IN     04Sep1989:13:45 2
101 IN     21SEP1989:17:55 3
101 OUT 05SEP1989:7:15   1
101 OUT 22SEP1989:06:00 2
;
run;

proc sql;
create table step1 as
select t1.id, t1.time as indate format=datetime18., t1.cnt as incnt, t2.time as outdate format=datetime18., t2.cnt as outcnt
from have(where=(flag="IN")) as t1
left join
have(where=(flag="OUT")) as t2
on t1.id=t2.id
group by t1.id, t2.cnt
having (t2.time-t1.time GT 0) and (t2.time-t1.time=min(
               (case
               when t2.time-t1.time LT 0 then 99999999
               else t2.time-t1.time
               end
               )
              )
         )

;

create table step2 as
select coalesce(t1.id, t2.id) as id, coalesce(t1.indate, t2.indate) as indate format=datetime18., coalesce(t1.incnt, t2.incnt) as incnt, t1.outdate format=datetime18., t1.outcnt
from step1 as t1
full join
have(where=(flag="IN") rename=(time=indate cnt=incnt)) as t2
on t1.id=t2.id and t1.indate=t2.indate and t1.incnt=t2.incnt;
quit;

This pre-processes your data to provide the best in/out pairs and you can then use proc transpose with flag and incnt to name your new variables. The 9999999 was simply to assign a large enough date value when the difference is negative that the case would never be the result of the minimum summary function.

The only possible issue that I can see is if the very last in/out pair has a missing value for out. In this case you may need to add an additionnal step to transform back date 9999999 (whatever this resolves to) to a missing value

Vincent

robertrao · Posted 08-22-2013 11:42 AM

Thanks Vince for your help.

I have no advanced knowledge on SQl so i prefer simple datasteps eventhough sql is easy and finishes in simple steps.

Thanks

Vince28_Statcan · Posted 08-22-2013 12:23 PM

I don't exactly like the resulting code compared to the SQL approach but here is how this SQL logic could be duplicated with data step syntax only. DSTEP3 dataset is identical to STEP2 dataset in the sql. From there, you can use the appropriately obtained indate/outdate with incnt to do the proc transpose

/* Step1 do a full cartesian product of multiple into multiple - too bad SAS merge can't do that... got to use hash tables
or other syntaxically difficult methods. Difference between dates were also added with the same approach as the SQL case approach */
data dstep1;
length id 8. indate 8. incnt 8. outdate 8. outcnt 8.;
if _n_=1 then do;
declare hash myhash(dataset: 'have(where=(flag="OUT") rename=(time=outdate cnt=outcnt))', multidata: "YES");
myhash.defineKey('id');
myhash.defineData('outdate', 'outcnt');
myhash.defineDone();
end;

set have(where=(flag="IN") rename=(time=indate cnt=incnt));
if myhash.find() = 0 then do;
if outdate-indate<0 then diff=99999999;
else diff=outdate-indate;
output;
do while(myhash.find_next()=0);
if outdate-indate<0 then diff=99999999;
else diff=outdate-indate;
output;
end;
end;

format indate datetime18. outdate datetime18.;
drop flag;
run;

proc sort data=dstep1;
by id outcnt diff;
run;

/* step2 */
/* Keep only the best time difference for each outcnt */
data dstep2;
set dstep1;
by id outcnt;
if first.outcnt then output;
run;

/* Step3 */
/* Rebuild the missing incnt created by step2 */
data dstep3;
length id 8. indate 8. incnt 8. outdate 8. outcnt 8.;
if _n_=1 then do;
declare hash myhash(dataset: "dstep2");
myhash.defineKey('id', 'incnt');
myhash.defineData('outdate', 'outcnt');
myhash.defineDone();
end;

set have(where=(flag="IN") rename=(time=indate cnt=incnt));
rc=myhash.find();
if outdate=99999999 then outdate=.;
output;

drop flag rc;
run;

Vincent

robertrao · Posted 08-27-2013 07:12 PM

HI Tom,

if I use the pro transposeI will have a problem

FOr example: across the data if there is only In5 and no out5 but in6 is present then we get

Out4 in5 In6........

AS you can see aboveout5 is missing.

shd have been out4 In5 out5 in6

That's the reason I was tending towards Array method.

ANyways does the Array method after the addition of extra if condition

will it also serves out repurpose???

thanks

Tom · Posted 08-27-2013 07:28 PM

That is why you need the extra data step before the PROC TRANSPOSE to assign the right pair number to each record. Then you can tell PROC TRANSPOSE to generate the resulting variable name from the FLAG variable and the new generated pair number.

robertrao · Posted 08-27-2013 10:02 PM

Hi ,

Sorry for pressing on this question again and again.

I use the following data and code to transpose and get the result.

Unfortunately, IN4 doesnot come up in the final WANT dataset(i understand that there is no IN4 but i want it to show missing value for IN4 as

IN4 OUT4

. 23SEP89:06:00:00

The below code completely eliminates IN4 from the final OUTPUT

Thanks

data have;
input ID $   flag $ time :datetime. cnt;
format time datetime.;
datalines;
101    IN        04Sep1989:7:30        1
101    IN        04Sep1989:13:45       2
101    IN        21SEP1989:17:55       3
101   OUT        05SEP1989:7:15        1
101   OUT        22SEP1989:06:00       2
101   OUT        23SEP1989:06:00      12
101    IN        24SEP1989:06:00      15
102   OUT        02JUL2013:10:15       1
102    IN        08JUL2013:12:20       3
102   OUT        12JUL2013:13:00       2
run;

proc sort;

by id time;

run;

data middle ;

set have ;

by id;

if first.id then group=0;

if first.id or flag='IN' or (flag='OUT' and flag=lag(flag)) then group+1 ;

run;

proc transpose data=middle out=want (drop=_name_);

by id;

id flag group ;

var time;

.

Tom · Posted 08-28-2013 10:43 AM

Two ways to enforce that all variables are created.

One is to post process the output of TRANSPOSE. You can find the number of groups from the middle dataset and use that. By placing the ARRAY statements before the SET statement you can insure that variables are order IN1, IN2,... OUT1,OUT2,....

proc sql noprint ; select max(group) into :ngroups separate by ' ' from middle; quit;

data want ;

array in (&maxgroup);

array out (&maxgroup);

set want;

run;

Second is to add a record (or records) to the middle dataset so that all FLAG*GROUP combinations are present.

Here is a general purpose way that adds a set of records with missing ID value and then uses WHERE clause dataset option to delete that generated ID from the final output dataset. I set the ORDER BY clause in the SQL code so that the generated variables from PROC TRANSPOSE will be IN1,OUT1,IN2,OUT2,... . You switch group and flag in the ORDER BY clause if you want all IN variables defined before all OUT variables.

proc sql ;

create table middle2 as

select * from middle

union corresponding

select ' ' as id,. as time,a.flag,b.group

from (select distinct flag from middle) a

, (select distinct group from middle) b

order by id, group, flag

;

quit;

proc transpose data=middle2 out=want2 (drop=_name_ where=(id is not null)) ;

by id;

id flag group ;

var time;

run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!