This topic is solved and locked.
robertrao

Hi,

Could someone help me solve this transpose step?

Each IN should be paired with an OUT.

If there are more records of one kind (in this case INs), then the closest one should become the pair. In our case, 04Sep1989:13:45 has to pair with 05SEP1989:7:15 instead of 04Sep1989:7:30, and 04Sep1989:7:30 should have a blank as its pair.

HAVE

ID    flag  time             cnt
101   IN    04Sep1989:7:30   1
101   IN    04Sep1989:13:45  2
101   IN    21SEP1989:17:55  3
101   OUT   05SEP1989:7:15   1
101   OUT   22SEP1989:06:00  2

WANT

ID    IN1             OUT1  IN2              OUT2            IN3              OUT3
101   04Sep1989:7:30  .     04Sep1989:13:45  05SEP1989:7:15  21SEP1989:17:55  22SEP1989:06:00

Thanks

LinusH

You should re-sort your data by ID and time.

Then use a DATA step with LAST. logic to do an explicit OUTPUT.

Before the OUTPUT, just use IF-THEN logic to do the variable assignments.

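LinusH's outline (sort by ID and TIME, then write one row per ID at its LAST. record) can be illustrated with a small Python sketch of BY-group processing. This is not SAS code: `by_groups` and `one_row_per_id` are hypothetical helpers that imitate what PROC SORT plus FIRST.ID/LAST.ID give you for free in a DATA step, and the ID 102 record is made up so that there is more than one BY group.

```python
from datetime import datetime

# (ID, flag, time) records; ID 101 is from the HAVE table, ID 102 is hypothetical.
records = [
    ("101", "IN", datetime(1989, 9, 4, 7, 30)),
    ("101", "IN", datetime(1989, 9, 4, 13, 45)),
    ("101", "IN", datetime(1989, 9, 21, 17, 55)),
    ("101", "OUT", datetime(1989, 9, 5, 7, 15)),
    ("101", "OUT", datetime(1989, 9, 22, 6, 0)),
    ("102", "IN", datetime(1989, 9, 1, 8, 0)),
]


def by_groups(rows):
    """Sort like PROC SORT BY ID TIME and flag each row the way FIRST.ID / LAST.ID would."""
    rows = sorted(rows, key=lambda r: (r[0], r[2]))
    for k, row in enumerate(rows):
        first = k == 0 or rows[k - 1][0] != row[0]
        last = k == len(rows) - 1 or rows[k + 1][0] != row[0]
        yield row, first, last


def one_row_per_id(rows):
    """Collect each ID's events and emit one result only at its last record (explicit OUTPUT)."""
    result = []
    current = None
    for (id_, flag, time), first, last in by_groups(rows):
        if first:
            current = {"ID": id_, "events": []}
        current["events"].append((flag, time))  # IF-THEN assignments would go here
        if last:
            result.append(current)
    return result


for row in one_row_per_id(records):
    print(row["ID"], [flag for flag, _ in row["events"]])
# 101 ['IN', 'IN', 'OUT', 'IN', 'OUT']
# 102 ['IN']
```

Sorting by time first is what makes the IN/OUT sequence per ID meaningful; the pairing rules discussed further down all rely on that order.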
Tom

You could do it with a DATA step. You will need an upper bound on the number of variables, which you could calculate as the maximum value of the CNT variable.

%let max=3 ;

data want ;
  /* read all records of one ID, then write one row */
  do until (last.id);
     set have ;
     by id ;
     array in (&max);
     array out (&max);
     /* CNT picks the array slot */
     if flag='IN' then in(cnt) = time;
     if flag='OUT' then out(cnt) = time;
  end;
  keep id in: out: ;
  format in: out: datetime.;
run;
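A quick way to see why this attempt misses the WANT layout is to mimic it outside SAS. In this Python sketch (the name `pair_by_cnt` is hypothetical, and `None` stands in for a SAS missing value), each record lands in slot CNT of the IN or OUT list, as `in(cnt) = time` does, without looking at the times at all.

```python
from datetime import datetime

# The HAVE rows from the question: (flag, time, cnt) for ID 101.
have = [
    ("IN", datetime(1989, 9, 4, 7, 30), 1),
    ("IN", datetime(1989, 9, 4, 13, 45), 2),
    ("IN", datetime(1989, 9, 21, 17, 55), 3),
    ("OUT", datetime(1989, 9, 5, 7, 15), 1),
    ("OUT", datetime(1989, 9, 22, 6, 0), 2),
]

MAX = 3  # plays the role of %let max=3 (the largest CNT value)


def pair_by_cnt(rows, size=MAX):
    """Mimic IN(CNT)=TIME / OUT(CNT)=TIME: the slot comes from CNT, not from time order."""
    ins = [None] * size   # in1..in3
    outs = [None] * size  # out1..out3
    for flag, time, cnt in rows:
        if flag == "IN":
            ins[cnt - 1] = time  # SAS arrays are 1-based, Python lists are 0-based
        elif flag == "OUT":
            outs[cnt - 1] = time
    return list(zip(ins, outs))


pairs = pair_by_cnt(have)
for n, (tin, tout) in enumerate(pairs, start=1):
    print(f"IN{n}={tin}  OUT{n}={tout}")
```

With the question's data this puts 04Sep1989:7:30 and 05SEP1989:7:15 together as pair 1 and leaves OUT3 empty, which is the pairing robertrao objects to next.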

robertrao

Hi Tom,

Thanks for the reply.

The method you suggest does not give me the following output:

WANT

ID    IN1             OUT1  IN2              OUT2            IN3              OUT3
101   04Sep1989:7:30  .     04Sep1989:13:45  05SEP1989:7:15  21SEP1989:17:55  22SEP1989:06:00

OUT1 should be missing.

Basically, when there are more records of one kind, we should check which of them makes the closest pair with the other kind.

As above, 04Sep1989:13:45 is closer to 05SEP1989:7:15 than 04Sep1989:7:30 is, so OUT1 is left blank.

Thanks

Tom

So the CNT variable is useless.  Instead you want to create new IN/OUT pairs based on the relative TIME values?

proc sort data=have;
  by id time;
run;

data want ;
  do until (last.id);
    set have ;
    by id;
    array in (4);
    array out (4) ;
    if first.id then i=1;          /* the first record opens pair 1 */
    else if flag='IN' then i=i+1;  /* every later IN opens a new pair */
    if flag='IN' then in(i)=time;
    if flag='OUT' then out(i)=time;  /* an OUT joins the current pair */
  end;
  keep id in: out: ;
  format in: out: datetime.;
run;
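The same rule as a Python sketch, assuming one ID's records as `(flag, time)` tuples; `pair_by_time` is a hypothetical name, and the lists grow on demand where the SAS arrays are fixed at 4 elements. On the question's data it reproduces the WANT row: 04Sep1989:7:30 with an empty OUT1, then 04Sep1989:13:45 with 05SEP1989:7:15, then 21SEP1989:17:55 with 22SEP1989:06:00.

```python
from datetime import datetime

# (flag, time) rows for ID 101 from the HAVE table; CNT is no longer needed.
have = [
    ("IN", datetime(1989, 9, 4, 7, 30)),
    ("IN", datetime(1989, 9, 4, 13, 45)),
    ("IN", datetime(1989, 9, 21, 17, 55)),
    ("OUT", datetime(1989, 9, 5, 7, 15)),
    ("OUT", datetime(1989, 9, 22, 6, 0)),
]


def pair_by_time(rows):
    """Pair one ID's rows in time order: the first row opens pair 1, each later IN opens a new pair."""
    ins, outs = [], []  # grow on demand instead of a fixed ARRAY(4)
    i = 0
    for k, (flag, time) in enumerate(sorted(rows, key=lambda r: r[1])):
        if k == 0:
            i = 1  # IF FIRST.ID THEN I=1;
        elif flag == "IN":
            i += 1  # ELSE IF FLAG='IN' THEN I=I+1;
        while len(ins) < i:
            ins.append(None)
            outs.append(None)
        if flag == "IN":
            ins[i - 1] = time
        else:
            outs[i - 1] = time  # an OUT joins whatever pair is current
    return list(zip(ins, outs))


for pair in pair_by_time(have):
    print(pair)
```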

Tom

The above will collapse a sequence like IN/OUT/OUT into a single pair and eliminate the middle OUT time point.

To fix that, you might change the index variable increment logic to:

    if first.id then i=1;
    if flag='IN' then do; if in(i) ne . then i=i+1; in(i)=time; end;
    if flag='OUT' then do; if out(i) ne . then i=i+1; out(i)=time; end;
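To make the difference concrete, this Python sketch runs the earlier rule and the revised rule on a hypothetical IN/OUT/OUT sequence for one ID. The times and the helper names `ensure`, `pair_overwrite` and `pair_if_filled` are illustrative only.

```python
from datetime import datetime


def ensure(ins, outs, i):
    """Grow both lists so 1-based slot i exists (stands in for a large enough ARRAY)."""
    while len(ins) < i:
        ins.append(None)
        outs.append(None)


def pair_overwrite(rows):
    """Earlier rule: a new pair only on IN (after the first row); a second OUT overwrites the first."""
    ins, outs, i = [], [], 0
    for k, (flag, time) in enumerate(sorted(rows, key=lambda r: r[1])):
        if k == 0:
            i = 1
        elif flag == "IN":
            i += 1
        ensure(ins, outs, i)
        if flag == "IN":
            ins[i - 1] = time
        else:
            outs[i - 1] = time
    return list(zip(ins, outs))


def pair_if_filled(rows):
    """Revised rule: move to the next pair only when this flag's slot is already filled."""
    ins, outs, i = [], [], 1  # IF FIRST.ID THEN I=1;
    for flag, time in sorted(rows, key=lambda r: r[1]):
        ensure(ins, outs, i)
        slots = ins if flag == "IN" else outs
        if slots[i - 1] is not None:
            i += 1
            ensure(ins, outs, i)
        slots[i - 1] = time
    return list(zip(ins, outs))


# Hypothetical IN/OUT/OUT sequence for one ID.
seq = [
    ("IN", datetime(1989, 9, 4, 9, 0)),
    ("OUT", datetime(1989, 9, 4, 10, 0)),
    ("OUT", datetime(1989, 9, 4, 11, 0)),
]
print(pair_overwrite(seq))  # one pair; the 10:00 OUT has been overwritten by 11:00
print(pair_if_filled(seq))  # 10:00 stays in pair 1, 11:00 becomes an unmatched OUT
```

Note that the revised rule lets an IN reuse a slot whose OUT was filled earlier, so a later IN can still end up next to an earlier OUT.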

robertrao

Hi Tom,

Thanks a ton for your time.

But I wonder if this works for all cases.

When I run the code on the data below, the IN at 24SEP1989:06:00 forms a pair with the OUT at 23SEP1989:06:00, which cannot be right, because an IN cannot be later (the 24th) than its OUT (the 23rd).

data have;
  input ID $ flag $ time :datetime. cnt;
  format time datetime.;
  datalines;
101    IN        04Sep1989:7:30        1
101    IN        04Sep1989:13:45       2
101    IN        21SEP1989:17:55       3
101   OUT        05SEP1989:7:15        1
101   OUT        22SEP1989:06:00       2
101   OUT        23SEP1989:06:00      12
101    IN        24SEP1989:06:00      15
;
run;

I was expecting OUT4 to stand on its own without a corresponding IN,

and 24SEP1989:06:00 to be IN5 without a corresponding OUT.

Hope this helps

Thanks

Tom

You need to keep the auto increment when flag='IN'.

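    if first.id then i=0;
    if flag='IN' then do; i=i+1; in(i)=time; end;
    if flag='OUT' then do;
      if i=0 then i=1;               /* the ID starts with an OUT: avoid OUT(0) */
      else if out(i)^=. then i=i+1;  /* current pair already has an OUT */
      out(i)=time;
    end;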
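Here is this rule as a Python sketch on robertrao's extended data (the function name `pair_in_always_new` is hypothetical). One assumption goes beyond the SAS fragment as posted: the `i == 0` test lets an ID whose first record is an OUT start pair 1 instead of indexing slot 0. The data needs five pairs, which is more than `array in (4)` can hold.

```python
from datetime import datetime

# robertrao's extended test data for ID 101 as (flag, time).
have = [
    ("IN", datetime(1989, 9, 4, 7, 30)),
    ("IN", datetime(1989, 9, 4, 13, 45)),
    ("IN", datetime(1989, 9, 21, 17, 55)),
    ("OUT", datetime(1989, 9, 5, 7, 15)),
    ("OUT", datetime(1989, 9, 22, 6, 0)),
    ("OUT", datetime(1989, 9, 23, 6, 0)),
    ("IN", datetime(1989, 9, 24, 6, 0)),
]


def pair_in_always_new(rows):
    """Every IN opens a new pair; an OUT opens one only if the current pair already has an OUT
    (or, as an extra guard, if the ID starts with an OUT)."""
    ins, outs, i = [], [], 0  # IF FIRST.ID THEN I=0;
    for flag, time in sorted(rows, key=lambda r: r[1]):
        if flag == "IN":
            i += 1
        elif i == 0 or outs[i - 1] is not None:
            i += 1
        while len(ins) < i:  # grow instead of overflowing a fixed-size array
            ins.append(None)
            outs.append(None)
        if flag == "IN":
            ins[i - 1] = time
        else:
            outs[i - 1] = time
    return list(zip(ins, outs))


pairs = pair_in_always_new(have)
for n, (tin, tout) in enumerate(pairs, start=1):
    print(f"IN{n}={tin}  OUT{n}={tout}")
```

The 23SEP OUT gets its own pair 4 and the 24SEP IN becomes IN5 with no OUT, which is the result robertrao expected.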

robertrao

I am using the following code and it says:

ARRAY SUBSCRIPT OUT OF RANGE. Please correct me.

Thanks

data want ;
  do until (last.id);
    set have ;
    by id;
    array in (4);
    array out (4) ;
    if first.id then i=0;
    if flag='IN' then do; i=i+1; in(i)=time; end;
    if flag='OUT' then do; if out(i)^=. then i=i+1; out(i)=time; end;
  end;
  keep id in: out: ;
  format in: out: datetime.;
run;

ERROR: Array subscript out of range at line 907 column 34.
last.id=1 ID=101 flag=IN time=24SEP89:06:00:00 cnt=15 FIRST.ID=0 in1=04SEP89:07:30:00 in2=04SEP89:13:45:00 in3=21SEP89:17:55:00 in4=. out1=. out2=05SEP89:07:15:00 out3=22SEP89:06:00:00 out4=23SEP89:06:00:00 i=5 _ERROR_=1 _N_=1

Tom (accepted solution)

Change the 4's to something larger. It is hard to know how many elements the arrays need. To be safe, you could make them as large as the maximum number of records per ID.

Or you could just calculate the grouping variable and then let PROC TRANSPOSE build the output dataset for you.

proc sort data=have;
  by id time;
run;

data middle ;
  set have ;
  by id;
  if first.id then group=0;   /* restart the pair counter for each ID */
  if first.id or flag='IN' or (flag='OUT' and flag=lag(flag)) then group+1 ;
run;

proc transpose data=middle out=want (drop=_name_);
  by id;
  id flag group ;   /* column names IN1, OUT2, ... built from FLAG and GROUP */
  var time;
run;
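The grouping step followed by the transpose can be sketched in Python as well. `add_group` mirrors the DATA MIDDLE step and `transpose` is only a rough stand-in for PROC TRANSPOSE (both names are hypothetical); in this sketch the previous record's flag is tracked explicitly on every row. On robertrao's extended data it yields IN1, IN2/OUT2, IN3/OUT3, OUT4 and IN5, with no OUT1, IN4 or OUT5 because no record was assigned there.

```python
from datetime import datetime

# robertrao's extended test data as (ID, flag, time).
have = [
    ("101", "IN", datetime(1989, 9, 4, 7, 30)),
    ("101", "IN", datetime(1989, 9, 4, 13, 45)),
    ("101", "IN", datetime(1989, 9, 21, 17, 55)),
    ("101", "OUT", datetime(1989, 9, 5, 7, 15)),
    ("101", "OUT", datetime(1989, 9, 22, 6, 0)),
    ("101", "OUT", datetime(1989, 9, 23, 6, 0)),
    ("101", "IN", datetime(1989, 9, 24, 6, 0)),
]


def add_group(rows):
    """DATA MIDDLE step: number the IN/OUT pairs within each ID."""
    result = []
    prev_id, prev_flag, group = None, None, 0
    for id_, flag, time in sorted(rows, key=lambda r: (r[0], r[2])):
        first = id_ != prev_id
        if first:
            group = 0
        # prev_flag is refreshed on every row, so it is always the previous row's FLAG
        if first or flag == "IN" or (flag == "OUT" and flag == prev_flag):
            group += 1
        result.append((id_, flag, time, group))
        prev_id, prev_flag = id_, flag
    return result


def transpose(grouped):
    """Rough analogue of PROC TRANSPOSE with BY ID; ID FLAG GROUP; VAR TIME;"""
    wide = {}
    for id_, flag, time, group in grouped:
        wide.setdefault(id_, {})[f"{flag}{group}"] = time
    return wide


want = transpose(add_group(have))
print(sorted(want["101"]))  # ['IN1', 'IN2', 'IN3', 'IN5', 'OUT2', 'OUT3', 'OUT4']
```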

robertrao

Hi Tom,

Both the array method (after changing the array length to a larger value) and the transpose method work very well. Thanks so much.

In the array method:

Why did you use if out(i)^=. but not the same check for in(i)? What if, in another case of our data, the OUTs outnumber the INs?

Please help me understand the code as well:

if flag='IN' then do; i=i+1; in(i)=time; end;
if flag='OUT' then do; if out(i)^=. then i=i+1; out(i)=time; end;

Could you also put this part of the transpose method into words for me?

if first.id then group=0;
if first.id or flag='IN' or (flag='OUT' and flag=lag(flag)) then group+1 ;

We have already said if first.id then group=0; so why do we again use if first.id or flag='IN' or (flag='OUT' and flag=lag(flag))?

Why isn't the LAG function used for the IN?

Thanks so much

Tom

To prevent IN > OUT, you always want to start a new pair (group) when you see an IN record. But when you see an OUT record, it could be the OUT for the current pair or an unmatched OUT that needs a new pair. One way to tell is whether the current pair already has an OUT time value. So we test whether OUT(I) has a value when processing an OUT record, but we don't need that test for an IN record, because we know that an IN always marks a new pair.

So a new group starts at:

  • the first record for an ID group (FIRST.ID)
  • any IN record (FLAG='IN')
  • an OUT record that is not preceded by an IN record (FLAG='OUT' and LAG(FLAG)='OUT')

To avoid incrementing the group counter twice when more than one of these conditions applies to the same record, the program first sets it to 0 at the start of an ID group. Then it can use the same test for every record it processes to decide whether to increment the group counter.
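One way to read the "twice" remark, sketched in Python: the OR test itself adds at most 1 per record, even when FIRST.ID and FLAG='IN' are both true. A double count would come from a variant that sets GROUP=1 at FIRST.ID and then also applies the test, while dropping the reset entirely lets the counter keep running into the next ID (a SAS sum statement such as `group+1` retains its value and starts at 0). The mode names and the two-ID data are hypothetical.

```python
def number_groups(rows, start_mode):
    """rows: (ID, flag) pairs already sorted by ID and time.
    start_mode says what happens to GROUP at FIRST.ID before the single increment test:
      "zero" - GROUP=0, as in the program above
      "one"  - hypothetical variant: GROUP=1
      "none" - hypothetical variant: no reset, the retained counter keeps running
    """
    result = []
    group, prev_id, prev_flag = 0, None, None  # a SUM statement starts GROUP at 0
    for id_, flag in rows:
        first = id_ != prev_id
        if first and start_mode == "zero":
            group = 0
        elif first and start_mode == "one":
            group = 1
        if first or flag == "IN" or (flag == "OUT" and flag == prev_flag):
            group += 1  # at most one increment per record
        result.append(group)
        prev_id, prev_flag = id_, flag
    return result


# Two hypothetical IDs, each with one IN followed by one OUT.
rows = [("101", "IN"), ("101", "OUT"), ("102", "IN"), ("102", "OUT")]
for mode in ("zero", "one", "none"):
    print(mode, number_groups(rows, mode))
# zero [1, 1, 1, 1]
# one [2, 2, 2, 2]
# none [1, 1, 2, 2]
```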

robertrao

Thanks for the detailed and quick response.

I could not mark it as the solution because the options don't pop up.

I will do it as soon as I see them.

Lastly, I will post another question shortly on the same topic, but about imputing the missing values:

if IN is missing for a particular OUT, then IN should be 1 minute before the OUT instead of missing, as the current code leaves it.

Also, if OUT is missing for a particular IN, then OUT should be 1 minute after the IN.

Thanks again

Tom

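Apply the imputation to the resulting pairs, for example in the array DATA step after the DO UNTIL loop:

do i=1 to dim(in);
  /* SAS datetime values are in seconds, so one minute is 60 */
  if in(i)=. and out(i) ne . then in(i) = out(i) - 60;
  if out(i)=. and in(i) ne . then out(i) = in(i) + 60;
end;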
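On units: a SAS datetime value counts seconds, so adding or subtracting 1 moves the time by one second, and one minute is 60. The same imputation as a Python sketch, where `timedelta(minutes=1)` makes the unit explicit (`impute` is a hypothetical helper, `None` stands in for a missing value, and the pairs are the ones derived above for robertrao's extended data):

```python
from datetime import datetime, timedelta

ONE_MINUTE = timedelta(minutes=1)


def impute(pairs):
    """Fill a missing IN with OUT minus 1 minute and a missing OUT with IN plus 1 minute."""
    filled = []
    for tin, tout in pairs:
        if tin is None and tout is not None:
            tin = tout - ONE_MINUTE
        if tout is None and tin is not None:
            tout = tin + ONE_MINUTE
        filled.append((tin, tout))  # a pair with both values missing stays missing
    return filled


pairs = [
    (datetime(1989, 9, 4, 7, 30), None),
    (datetime(1989, 9, 4, 13, 45), datetime(1989, 9, 5, 7, 15)),
    (datetime(1989, 9, 21, 17, 55), datetime(1989, 9, 22, 6, 0)),
    (None, datetime(1989, 9, 23, 6, 0)),
    (datetime(1989, 9, 24, 6, 0), None),
]
for tin, tout in impute(pairs):
    print(tin, tout)
```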

robertrao

Hi Tom,

I was still wondering about setting GROUP to 0 at the beginning of each new ID.

You explained that if we don't, then when a record satisfies two conditions there is a chance of incrementing the group counter twice.

Suppose I have the following as the first record:

101   IN     Sep6:7:30

It is the first record of the ID and its flag is IN (satisfying 2 conditions). How would the group counter get incremented twice if we don't initialize GROUP to 0?

Thanks
