Solved: Proc Transpose - Page 3

Tom · Posted 08-27-2013 06:15 PM

You should probably use the version that increments the counter in only one place and uses PROC TRANSPOSE to create the variables (https://communities.sas.com/message/178337#178337). That version also allows an unlimited number of pairs per ID.
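A minimal sketch of that idea follows. It is not the code from the linked post. The dataset names HAVE, PAIRS and WANT, the variables PAIR and PREV_FLAG, and the sample rows are all assumptions made for illustration.

```sas
/* Hypothetical sample data: columns id, flag, time are assumed */
data have;
  input id $ flag $ time :datetime.;
  format time datetime.;
  datalines;
101 IN  04SEP1989:07:30
101 OUT 05SEP1989:07:15
101 OUT 06SEP1989:08:00
101 IN  07SEP1989:09:00
;
run;

proc sort data=have;
  by id time;
run;

/* PAIR (assumed name) is incremented in exactly one statement */
data pairs;
  set have;
  by id;
  prev_flag = lag(flag);   /* call LAG on every record so its queue stays in step */
  if first.id then pair = 1;
  /* a new pair starts at every IN, and at an OUT that follows another OUT */
  else if flag = 'IN' or prev_flag = 'OUT' then pair + 1;
  drop prev_flag;
run;

/* With two ID variables, PROC TRANSPOSE concatenates their values */
/* to name the new columns, so no array size has to be fixed.      */
proc transpose data=pairs out=want(drop=_name_);
  by id;
  id flag pair;
  var time;
run;
```

For the four sample rows this should give one row for ID 101 with columns named IN1, OUT1, OUT2 and IN3. Because the column names come from the data, the number of pairs per ID is not capped the way it is with ARRAY IN(7).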

If you want to fix the issue of starting with I=0 and seeing an OUT record as the first entry, then you need a test to make sure I is not zero before using it as an array index.

if flag='OUT' then do;

if i=0 then i=i+1;
else if out(i)^=. then i=i+1;
out(i)=recorded_time;
end;
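To see the guard in context, here is a sketch of the whole step run against an ID whose first record is an OUT. The sample rows and the ID value 201 are made up for illustration. The step itself is the array method from this thread with the test above dropped in.

```sas
/* Hypothetical data: the first record for this ID is an OUT */
data have;
  input id $ flag $ recorded_time :datetime.;
  format recorded_time datetime.;
  datalines;
201 OUT 01SEP1989:08:00
201 IN  02SEP1989:09:00
201 OUT 03SEP1989:10:00
;
run;

data want;
  do until (last.id);
    set have;
    by id;
    array in (7);
    array out (7);
    if first.id then i = 0;
    if flag = 'IN' then do;
      i = i + 1;
      in(i) = recorded_time;
    end;
    if flag = 'OUT' then do;
      if i = 0 then i = i + 1;              /* guard: never index OUT(0) */
      else if out(i) ^= . then i = i + 1;   /* slot already used: move on */
      out(i) = recorded_time;
    end;
  end;
  keep id in: out:;
  format in: out: datetime.;
run;
```

With these rows the leading OUT should land in OUT1, leaving IN1 missing, and the next IN/OUT pair should fill IN2 and OUT2. Without the I=0 test, the first record would reference OUT(0), which is outside the array bounds.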

robertrao · Posted 08-28-2013 11:06 AM

Hi,

At last I was able to replicate the results with both methods.

Thanks a lot for the help all of you rendered in this regard.

Also, one final question before we close this off:

I tried the code without the ELSE, i.e.

if out(i)^=. then i=i+1; INSTEAD OF

else if out(i)^=. then i=i+1; in the portion of the code below.

Do you think using ELSE or not will make any difference?

if flag='OUT' then do;

if i=0 then i=i+1;
else if out(i)^=. then i=i+1;
out(i)=recorded_time;
end;

Thanks

Tom · Posted 08-28-2013 11:46 AM

Try it both ways and see if it makes any difference. If it does see if you can figure out why.

robertrao · Posted 08-28-2013 12:08 PM

Hi Tom,

With the data I currently have, the two versions didn't make any difference to my output.

I am not able to figure out the reason.

Could you help me?

Thanks

Tom · Posted 08-28-2013 01:47 PM

So suppose the first record for a patient has FLAG='OUT'.

First it will set I=1 because I=0.

Without the ELSE it will then test if OUT(I) is NOT missing, but since it is the first record this can never be true.

robertrao · Posted 08-28-2013 01:57 PM

Hmm,

I tried the transpose method, and the array method without the ELSE in the IF condition, i.e.

if out(i)^=. then i=i+1;

in place of

else if out(i)^=. then i=i+1;

Both the transpose method and the

if out(i)^=. then i=i+1;

method gave me the same results!

I checked record by record and they are identical.

Tom · Posted 08-28-2013 03:44 PM

As they should be. The only difference is that without the else you are testing for a condition that can never be true.

robertrao · Posted 08-28-2013 05:07 PM

Having said that...

If we put the ELSE in, then the condition can sometimes be true, so

i=i+1 can sometimes execute and I gets incremented.

The same cannot happen without the ELSE (as you said, this can never be true), so I never gets incremented.

By that reasoning we should get different results from the two versions.

Thanks

Tom · Posted 08-28-2013 06:04 PM

To test something like this you need to imagine all possible pathways through the code. Then you can generate test data to see if your analysis is right.

The analysis immediately above talks about when the FIRST record is an OUT record. In that case the ELSE does not matter, as the second increment cannot happen in either case.

What about when you have two OUT records in a row? Will the ELSE make a difference? Could it cause a required increment to be skipped? Could it cause I to be incremented twice?

What about when you have an OUT following an IN?

Are there any other combinations you can think of?
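One way to run that experiment is sketched below. The test rows, the dataset names TEST, WITH_ELSE and NO_ELSE, and the macro BUILD with its parameters DS and KW are all hypothetical. The four IDs cover OUT first followed by OUT, OUT after IN followed by another OUT, OUT then IN then OUT, and IN followed by IN.

```sas
/* Hypothetical test data, already sorted by id and time */
data test;
  input id $ flag $ time :datetime.;
  format time datetime.;
  datalines;
A OUT 01JAN2013:08:00
A OUT 01JAN2013:09:00
B IN  01JAN2013:08:00
B OUT 01JAN2013:09:00
B OUT 01JAN2013:10:00
C OUT 01JAN2013:08:00
C IN  01JAN2013:09:00
C OUT 01JAN2013:10:00
D IN  01JAN2013:08:00
D IN  01JAN2013:09:00
;
run;

/* KW is either ELSE or empty, so the two runs differ only in that keyword */
%macro build(ds=, kw=);
  data &ds;
    do until (last.id);
      set test;
      by id;
      array in (7);
      array out (7);
      if first.id then i = 0;
      if flag = 'IN' then do;
        i = i + 1;
        in(i) = time;
      end;
      if flag = 'OUT' then do;
        if i = 0 then i = i + 1;
        &kw if out(i) ^= . then i = i + 1;
        out(i) = time;
      end;
    end;
    keep id in: out:;
    format in: out: datetime.;
  run;
%mend build;

%build(ds=with_else, kw=else)
%build(ds=no_else,   kw=)

/* Reports any value that differs between the two results */
proc compare base=with_else compare=no_else;
run;
```

If PROC COMPARE reports no unequal values for data that covers every pathway you can think of, that supports the analysis. If it reports a difference, the differing ID shows which pathway was missed.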

Vince28_Statcan · Posted 08-29-2013 08:12 AM

As Tom has mentioned, you have to think of the possible paths through your condition logic. I believe you are misunderstanding either because you don't backtrack far enough to realize when and how i can be 0, or because you are unaware of how the data vector (and thus the in: and out: variables) behaves behind the scenes at each iteration of the data step.

if i=0 then i=i+1;

if out(i)^=. then i=i+1;

In order for the first if condition to be true and thus for i to be incremented, you have to have i=0. Now if you look at your code, when exactly can i be equal to 0? Well, you only set i=0 once per data step iteration and it's when first.id. That is, i=0 only when you encounter the very first row of a by group. On top of that, the way your do until construct is made, the data step iterator loops each time you encounter a new by group.

So then you have to be aware that at the start of each iteration of the data step, all variables that are not retained are set to missing, and then a new record is read. Variables read by the SET statement are retained automatically, but variables created in the step are not unless they appear in a RETAIN statement. Since out: and in: are variables built in your data step and not read from a dataset, each time your data step loops, in: and out: are all set to missing. Thus, each time you enter a new by group and set i=0, in: and out: are all missing.

So the resulting conclusion is that the conditions i=0 and out(i+1)^=. are mutually exclusive. They can never both be true at the same time. So adding the else has no impact on the final result. It only improves processing slightly because then each time i=0, the other condition is not verified.

I gave it some time as I felt, like Tom, that it would be better for you to crack it by yourself. However, reading the subsequent comments, I was afraid that the main blocking point was not knowing how the data step iterator behaves, which really isn't entirely natural and not exactly easy to derive from an example alone.

I hope this helps and that I did not ruin Tom's pedagogical plans on this.

Vincent
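The reset behaviour described above can be watched directly in the log. This is a small sketch on made-up data. The dataset DEMO and the variables X and COUNT are hypothetical names.

```sas
/* Hypothetical data: two records for A, one for B */
data demo;
  input id $ x;
  datalines;
A 1
A 2
B 3
;
run;

/* Implicit loop: one data step iteration per record.          */
/* COUNT is created in the step, so it is reset to missing at  */
/* the top of every iteration and never accumulates.           */
data _null_;
  set demo;
  by id;
  putlog 'implicit loop:   ' id= x= count=;
  count = sum(count, 1);
run;

/* DO UNTIL(LAST.ID): one data step iteration per BY group.    */
/* COUNT survives from record to record inside the group and   */
/* is reset only when the step iterates for the next group.    */
data _null_;
  do until (last.id);
    set demo;
    by id;
    putlog 'inside DO UNTIL: ' id= x= count=;
    count = sum(count, 1);
  end;
run;
```

In the first step COUNT should print as missing on every record. In the second it should print as missing on the first record of each ID and as 1 on the second record of ID A. The same thing happens to in: and out: in the array method, which is why they are always missing when i=0.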

robertrao · Posted 08-25-2013 10:25 AM

Hi Tom,

Could you explain the logic in the following method one more time.

I tried to use PUTLOG to see what is happening in the code, and it only lets me put the PUTLOG between the two IF conditions.

If I put it after the two IF statements, it gives the result of only the second IF condition.

Basically, what I understood is that the OUT gets the same i value as the IN unless out(i) is already non-missing.

Could you tell me how to interpret the code, and how we end up with one record after running it?

Thanks

data want ;
  do until (last.id);
    set have ;
    by id;
    array in (7);
    array out (7) ;
    if first.id then i=0;
    if flag='IN' then do;
      i=i+1;
      in(i)=time;
    end;
    if flag='OUT' then do;
      if out(i)^=. then i=i+1;
      out(i)=time;
    end;
  end;
  keep id in: out: ;
  format in: out: datetime.;
run;

Tom · Posted 08-25-2013 02:03 PM

data have;
  input ID $ flag $ time :datetime. cnt;
  format time datetime.;
datalines;
101 IN 04Sep1989:07:30 1
101 IN 04Sep1989:13:45 2
101 IN 21SEP1989:17:55 3
101 OUT 05SEP1989:07:15 1
101 OUT 22SEP1989:06:00 2
101 OUT 23SEP1989:06:00 12
101 IN 24SEP1989:06:00 15
;
run;

proc sort data=have ;
  by id time;
run;

data want ;
  do until (last.id);
    set have ;
    by id;
    putlog (id i first.id flag time) (=);
    array in (7);
    array out (7) ;
    if first.id then i=0;
    putlog (id i ) (=) '<- After if first.id';
    if flag='IN' then do;
      i=i+1;
      in(i)=time;
    end;
    putlog (id i ) (=) '<- After if IN ';
    if flag='OUT' then do;
      if out(i)^=. then i=i+1;
      out(i)=time;
    end;
    putlog (id i ) (=) '<- After if OUT ' /;
  end;
  keep id in: out: ;
  format in: out: datetime.;
  put (_all_) (=/) ;
run;

ID=101 i=. FIRST.ID=1 flag=IN time=04SEP89:07:30:00
ID=101 i=0 <- After if first.id
ID=101 i=1 <- After if IN
ID=101 i=1 <- After if OUT

ID=101 i=1 FIRST.ID=0 flag=IN time=04SEP89:13:45:00
ID=101 i=1 <- After if first.id
ID=101 i=2 <- After if IN
ID=101 i=2 <- After if OUT

ID=101 i=2 FIRST.ID=0 flag=OUT time=05SEP89:07:15:00
ID=101 i=2 <- After if first.id
ID=101 i=2 <- After if IN
ID=101 i=2 <- After if OUT

ID=101 i=2 FIRST.ID=0 flag=IN time=21SEP89:17:55:00
ID=101 i=2 <- After if first.id
ID=101 i=3 <- After if IN
ID=101 i=3 <- After if OUT

ID=101 i=3 FIRST.ID=0 flag=OUT time=22SEP89:06:00:00
ID=101 i=3 <- After if first.id
ID=101 i=3 <- After if IN
ID=101 i=3 <- After if OUT

ID=101 i=3 FIRST.ID=0 flag=OUT time=23SEP89:06:00:00
ID=101 i=3 <- After if first.id
ID=101 i=3 <- After if IN
ID=101 i=4 <- After if OUT

ID=101 i=4 FIRST.ID=0 flag=IN time=24SEP89:06:00:00
ID=101 i=4 <- After if first.id
ID=101 i=5 <- After if IN
ID=101 i=5 <- After if OUT

ID=101
flag=IN
time=24SEP89:06:00:00
cnt=15
i=5
in1=04SEP89:07:30:00
in2=04SEP89:13:45:00
in3=21SEP89:17:55:00
in4=.
in5=24SEP89:06:00:00
in6=.
in7=.
out1=.
out2=05SEP89:07:15:00
out3=22SEP89:06:00:00
out4=23SEP89:06:00:00
out5=.
out6=.
out7=.

robertrao · Posted 08-26-2013 01:01 AM

I very much appreciate your effort in presenting it in a very meaningful format

Thanks again

robertrao · Posted 08-26-2013 09:29 AM

Hi,

One last question.

I still don't fully understand how SAS is able to present the result in a single row.

I guess it is because of the do until(last.id).

But how is it able to carry all the previous values through to the last record?

Thanks so much

Vince28_Statcan · Posted 08-26-2013 10:05 AM

It is because of the DO UNTIL. DO; END; groups allow you to manually control iterations within your data step. Doing so prevents the natural data step iterator from setting all values in the data vector to missing prior to reading the next record. As such, his ARRAY (7) variables do not get erased between records, because the manually controlled loop covers a whole BY group within a single data step iteration. Since the ARRAY statements name no variables, they generate the variables in1-in7 and out1-out7.

I believe, if you remove the keep statement and add an output; statement before the end; of the do until; control, you can see the logic of building the arrays over time. It should help you understand his logic, in my opinion, better than via the putlog statements.
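A sketch of that variation, under a few assumptions: the sample rows and the output name TRACE are made up, and the I=0 test Tom posted later in the thread is included so a leading OUT record cannot index OUT(0).

```sas
/* Hypothetical sample data, already sorted by id and time */
data have;
  input id $ flag $ time :datetime.;
  format time datetime.;
  datalines;
101 IN  04SEP1989:07:30
101 OUT 05SEP1989:07:15
101 OUT 06SEP1989:08:00
101 IN  07SEP1989:09:00
;
run;

data trace;
  do until (last.id);
    set have;
    by id;
    array in (7);
    array out (7);
    if first.id then i = 0;
    if flag = 'IN' then do;
      i = i + 1;
      in(i) = time;
    end;
    if flag = 'OUT' then do;
      if i = 0 then i = i + 1;
      else if out(i) ^= . then i = i + 1;
      out(i) = time;
    end;
    output;   /* explicit OUTPUT: one row per input record, no implicit output */
  end;
  /* no KEEP statement, so flag, time and i stay visible */
  format in: out: datetime.;
run;

proc print data=trace;
run;
```

Each printed row shows the state of in1-in7 and out1-out7 after one more input record has been processed. The last row for each ID is the row the original step writes.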

This is actually a very clever use of BY processing. Had I thought of sorting by id and time, I would probably still have considered only the current and next record, leading to a further step with a PROC TRANSPOSE. I'm glad I followed this thread. Thanks Tom

Vince
