select variables from different column

Reply
Contributor
Posts: 25

select variables from different column

[ Edited ]
data have;
infile cards expandtabs;
input group id$ date$ time$ class;
cards;
0 A 20/7/2014 10:00:00 1
0 L 20/7/2014 10:30:12 2
1 R 20/7/2014 11:12:05 3
1 A 20/7/2014 11:15:07 1
1 L 20/7/2014 11:50:30 2
2 F 20/7/2014 12:02:06 1
2 L 20/7/2014 12:04:50 1
3 A 20/7/2014 13:41:19 1
3 B 20/7/2014 13:41:19 3
3 C 20/7/2014 13:41:19 2
4 B 20/7/2014 14:02:05 3
;
run;


Hi, i would like to select the ID  that appears in two groups consequently with respect to the class as well. 

 

look at ID if groups>= 2  and the class = class then

calculate the time duration. 

 

The expected output. 

id date duration  class
A20/7/20141:15:071
L20/7/20141:20:182
B20/7/20140:20:463

 

 Thank you. 

Super User
Super User
Posts: 9,227

Re: select variables from different column

Not sure I really follow you, just sort the data:

proc sort data=have out=want;
  by id group;
run;
data want;
  set want;
  if group > 0 and id=lag(id) then do;
    duration=time-lag(time);
    output;
  end;
run;

Something like that, can't test as you have not provided test data in the form of a datastep!

Contributor
Posts: 25

Re: select variables from different column

Iam so sorry for that. I have modified the data
Super User
Posts: 9,611

Re: select variables from different column

[ Edited ]

ihlayyel wrote:
Iam so sorry for that. I have modified the data

Do so again. You should know by now what a data step is and how data step code looks. The intention is that with a simple copy/paste and submit of your code the dataset is created. No ERRORs, no WARNINGs.

 

PS in response to this:

group id date time class
0 A 20/7/2014 10:00:00 1
0 L 20/7/2014 10:30:12 2
1 R 20/7/2014 11:12:05 3
1 A 20/7/2014 11:15:07 1
1 L 20/7/2014 11:50:30 2
2 F 20/7/2014 12:02:06 1
2 L 20/7/2014 12:04:50 1
3 A 20/7/2014 13:41:19 1
3 B 20/7/2014 13:41:19 3
3 C 20/7/2014 13:41:19 2
4 B 20/7/2014 14:02:05 3

which is not SAS code at all.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 25

Re: select variables from different column

Posted in reply to KurtBremser
Thank you for your reply. I am working on it Smiley Happy
Contributor
Posts: 25

Re: select variables from different column

Thank you for your answer. but the calculation missing one point if there is one lag between two groups.
Trusted Advisor
Posts: 1,289

Re: select variables from different column

When ID='A' has more than 2 observations, do you want duration to be the time difference between the current  A and the most recent prior A?  If so, then you want what I would term a "dynamic lag" - ("dynamic" because the two A records are separated by a non-constant number of intermediate records).

 

Assuming your dataset is sorted by TIME  (it doesn't matter whether it is sorted  by ID):

 

data want (drop=_:);
  set have;
  format _last_time duration time8.0;
  if _n_=1 then do;
    declare hash h();
      h.definekey('id');
      h.definedata('_last_time');
      h.definedone();
  end;
  _rc=h.find();
  if _rc=0 then duration= time-_last_time ;
  _rc=h.replace(key:id,data:time);
run;

 

 

Now, if DATE  matters also (i.e. DURATION can cross date boundaries), then you'll have to add a variable called _LAST_DATE to the format statement, the DEFINEDATA method, and to the REPLACE method.   You'll also have to calculate

     DURATION=86400*(date-_last_date)  + time - _last_time;

 

And you'll also probably want to change the format of DURATION to, say TIME9.0 (accommodates up to 999 hours, 59 mins, 59 secs) or HHMM6.0  (up to  999 hours, 59 mins). 

 

 

This program depends  on creating  a hash object H, which  allows values to be stored keyed on ID, retrieved, and persists across observations.

Contributor
Posts: 25

Re: select variables from different column

[ Edited ]

thank you so much. That is awesome. but there is one thing. 

the code keeps considering the last time value. if the ID appears in a different group it will measure the time of new ID with the last time.

 

the output of your code. 

group id   date   time     duraion        


1 A 20/07/2014 15:07.0 1 1:15:07 1 L 20/07/2014 50:30.0 2 1:20:18 2 L 20/07/2014 04:50.0 1 0:14:20 3 A 20/07/2014 41:19.0 1 2:26:12 4 B 20/07/2014 02:05.0 3 0:20:46

 

How to say that if another ID is found start match again. and the number of the count of groups is >0

 

 

 

 

 

Ask a Question
Discussion stats
  • 7 replies
  • 144 views
  • 0 likes
  • 4 in conversation