A not so simple alternative to achieve desired results:
options datestyle=DMY;
data have;
input @1 id 3. @5 flag $3. @9 time anydtdtm15. @25 cnt 2.;
format time datetime18.;
datalines;
101 IN 04Sep1989:7:30 1
101 IN 04Sep1989:13:45 2
101 IN 21SEP1989:17:55 3
101 OUT 05SEP1989:7:15 1
101 OUT 22SEP1989:06:00 2
;
run;
proc sql;
create table step1 as
select t1.id, t1.time as indate format=datetime18., t1.cnt as incnt, t2.time as outdate format=datetime18., t2.cnt as outcnt
from have(where=(flag="IN")) as t1
left join
have(where=(flag="OUT")) as t2
on t1.id=t2.id
group by t1.id, t2.cnt
having (t2.time-t1.time GT 0) and (t2.time-t1.time=min(
(case
when t2.time-t1.time LT 0 then 99999999
else t2.time-t1.time
end
)
)
)
;
create table step2 as
select coalesce(t1.id, t2.id) as id, coalesce(t1.indate, t2.indate) as indate format=datetime18., coalesce(t1.incnt, t2.incnt) as incnt, t1.outdate format=datetime18., t1.outcnt
from step1 as t1
full join
have(where=(flag="IN") rename=(time=indate cnt=incnt)) as t2
on t1.id=t2.id and t1.indate=t2.indate and t1.incnt=t2.incnt;
quit;
This pre-processes your data to provide the best in/out pairs and you can then use proc transpose with flag and incnt to name your new variables. The 9999999 was simply to assign a large enough date value when the difference is negative that the case would never be the result of the minimum summary function.
The only possible issue that I can see is if the very last in/out pair has a missing value for out. In this case you may need to add an additionnal step to transform back date 9999999 (whatever this resolves to) to a missing value
Vincent
Thanks Vince for your help.
I have no advanced knowledge on SQl so i prefer simple datasteps eventhough sql is easy and finishes in simple steps.
Thanks
I don't exactly like the resulting code compared to the SQL approach but here is how this SQL logic could be duplicated with data step syntax only. DSTEP3 dataset is identical to STEP2 dataset in the sql. From there, you can use the appropriately obtained indate/outdate with incnt to do the proc transpose
/* Step1 do a full cartesian product of multiple into multiple - too bad SAS merge can't do that... got to use hash tables
or other syntaxically difficult methods. Difference between dates were also added with the same approach as the SQL case approach */
data dstep1;
length id 8. indate 8. incnt 8. outdate 8. outcnt 8.;
if _n_=1 then do;
declare hash myhash(dataset: 'have(where=(flag="OUT") rename=(time=outdate cnt=outcnt))', multidata: "YES");
myhash.defineKey('id');
myhash.defineData('outdate', 'outcnt');
myhash.defineDone();
end;
set have(where=(flag="IN") rename=(time=indate cnt=incnt));
if myhash.find() = 0 then do;
if outdate-indate<0 then diff=99999999;
else diff=outdate-indate;
output;
do while(myhash.find_next()=0);
if outdate-indate<0 then diff=99999999;
else diff=outdate-indate;
output;
end;
end;
format indate datetime18. outdate datetime18.;
drop flag;
run;
proc sort data=dstep1;
by id outcnt diff;
run;
/* step2 */
/* Keep only the best time difference for each outcnt */
data dstep2;
set dstep1;
by id outcnt;
if first.outcnt then output;
run;
/* Step3 */
/* Rebuild the missing incnt created by step2 */
data dstep3;
length id 8. indate 8. incnt 8. outdate 8. outcnt 8.;
if _n_=1 then do;
declare hash myhash(dataset: "dstep2");
myhash.defineKey('id', 'incnt');
myhash.defineData('outdate', 'outcnt');
myhash.defineDone();
end;
set have(where=(flag="IN") rename=(time=indate cnt=incnt));
rc=myhash.find();
if outdate=99999999 then outdate=.;
output;
drop flag rc;
run;
Vincent
HI Tom,
if I use the pro transposeI will have a problem
FOr example: across the data if there is only In5 and no out5 but in6 is present then we get
Out4 in5 In6........
AS you can see aboveout5 is missing.
shd have been out4 In5 out5 in6
That's the reason I was tending towards Array method.
ANyways does the Array method after the addition of extra if condition
will it also serves out repurpose???
thanks
That is why you need the extra data step before the PROC TRANSPOSE to assign the right pair number to each record. Then you can tell PROC TRANSPOSE to generate the resulting variable name from the FLAG variable and the new generated pair number.
Hi ,
Sorry for pressing on this question again and again.
I use the following data and code to transpose and get the result.
Unfortunately, IN4 doesnot come up in the final WANT dataset(i understand that there is no IN4 but i want it to show missing value for IN4 as
IN4 OUT4
. 23SEP89:06:00:00
The below code completely eliminates IN4 from the final OUTPUT
Thanks
data have;
input ID $ flag $ time :datetime. cnt;
format time datetime.;
datalines;
101 IN 04Sep1989:7:30 1
101 IN 04Sep1989:13:45 2
101 IN 21SEP1989:17:55 3
101 OUT 05SEP1989:7:15 1
101 OUT 22SEP1989:06:00 2
101 OUT 23SEP1989:06:00 12
101 IN 24SEP1989:06:00 15
102 OUT 02JUL2013:10:15 1
102 IN 08JUL2013:12:20 3
102 OUT 12JUL2013:13:00 2
run;
proc sort;
by id time;
run;
data middle ;
set have ;
by id;
if first.id then group=0;
if first.id or flag='IN' or (flag='OUT' and flag=lag(flag)) then group+1 ;
run;
proc transpose data=middle out=want (drop=_name_);
by id;
id flag group ;
var time;
.
Two ways to enforce that all variables are created.
One is to post process the output of TRANSPOSE. You can find the number of groups from the middle dataset and use that. By placing the ARRAY statements before the SET statement you can insure that variables are order IN1, IN2,... OUT1,OUT2,....
proc sql noprint ; select max(group) into :ngroups separate by ' ' from middle; quit;
data want ;
array in (&maxgroup);
array out (&maxgroup);
set want;
run;
Second is to add a record (or records) to the middle dataset so that all FLAG*GROUP combinations are present.
Here is a general purpose way that adds a set of records with missing ID value and then uses WHERE clause dataset option to delete that generated ID from the final output dataset. I set the ORDER BY clause in the SQL code so that the generated variables from PROC TRANSPOSE will be IN1,OUT1,IN2,OUT2,... . You switch group and flag in the ORDER BY clause if you want all IN variables defined before all OUT variables.
proc sql ;
create table middle2 as
select * from middle
union corresponding
select ' ' as id,. as time,a.flag,b.group
from (select distinct flag from middle) a
, (select distinct group from middle) b
order by id, group, flag
;
quit;
proc transpose data=middle2 out=want2 (drop=_name_ where=(id is not null)) ;
by id;
id flag group ;
var time;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.