Dear community,
In my data I want to use a counter variable that counts under special conditions. The counter should start to count everytime primary is equal to 4 until it stops being 4. In the case primary = 5 prior to primary = 4 the counter shouldn't start. The same goes (backwards looking) for a primary=5 which appears after a sequence of primary=4 (the counter should be set to zero for the whole sequence before). Think of it as the employment status where the counter counts the time in unemployment (primary=4). While this duration is just interesting if the individual was in the labor force before and stays in after, I don't want the duration to be count if primary = 5 (appears when the individual is not in the labor force). Here is an example(normally there are 15000 individuals and time is up to 1500):
data work.sample;
input id $ primary $ time ($);
datalines;
1 5 1
1 5 2
1 4 3
1 4 4
1 4 5
1 1 6
2 1 1
2 1 2
2 4 3
2 4 4
2 4 5
2 5 6
3 1 1
3 4 2
3 4 3
3 4 4
3 1 5
3 1 6;
I am able to set up the counter but so far I have no clue how to bring in my conditions on primary = 5 and the following and earlier periods.
data test1;
set test1;
if primary = 4 and lag(primary) ^= 4 then duration = 1;
if primary = 4 and lag(primary) = 4 then duration+1;
if primary ^= 4 then duration = 0;
run;
The output in the end should look like this:
input id $ primary $ time $ duration ($)
1 5 1 0
1 5 2 0
1 4 3 0 <- no count after because of the primary=5
1 4 4 0
1 4 5 0
1 1 6 0
2 1 1 0
2 1 2 0
2 4 3 0
2 4 4 0
2 4 5 0
2 5 6 0 <- no count before because of the primary=5
3 1 1 0
3 4 2 1
3 4 3 2
3 4 4 3
3 1 5 0
3 1 6 0
I would be really thankful for any help!
Though I feel this code can be further simplified, but it is my sleep time and my brain stops working 5 minutes ago,
data work.sample;
input id $ primary $ time $;
datalines;
1 5 1
1 5 2
1 4 3
1 4 4
1 4 5
1 1 6
2 1 1
2 1 2
2 4 3
2 4 4
2 4 5
2 5 6
3 1 1
3 4 2
3 4 3
3 4 4
3 1 5
3 1 6
;
data want;
declare hash h(multidata:'y');
h.definekey('id');
h.definedata('id', 'primary','time', 'duration');
h.definedone();
declare hiter hi('h');
do until (last.id);
set sample;
retain flag;
_lg=lag(primary);
if primary='4' and _lg = '5' then flag=1;
if primary ne '4' and _lg = '4' then flag=0;
if primary='4' and flag ne 1 then
do;
duration+1;
rc=h.add();
end;
else if primary='5' and _lg='4' then
do;
rc=h.add();
do rc=hi.first() by 0 while (rc=0);
duration=0;
output;
rc=hi.next();
end;
rc=h.clear();
end;
else if primary ne '5' and _lg=4 then
do;
duration=0;
rc=h.add();
do rc=hi.first() by 0 while (rc=0);
output;
rc=hi.next();
end;
rc=h.clear();
end;
else
do;
duration=0;
output;
end;
end;
drop rc flag _lg;
run;
Though I feel this code can be further simplified, but it is my sleep time and my brain stops working 5 minutes ago,
data work.sample;
input id $ primary $ time $;
datalines;
1 5 1
1 5 2
1 4 3
1 4 4
1 4 5
1 1 6
2 1 1
2 1 2
2 4 3
2 4 4
2 4 5
2 5 6
3 1 1
3 4 2
3 4 3
3 4 4
3 1 5
3 1 6
;
data want;
declare hash h(multidata:'y');
h.definekey('id');
h.definedata('id', 'primary','time', 'duration');
h.definedone();
declare hiter hi('h');
do until (last.id);
set sample;
retain flag;
_lg=lag(primary);
if primary='4' and _lg = '5' then flag=1;
if primary ne '4' and _lg = '4' then flag=0;
if primary='4' and flag ne 1 then
do;
duration+1;
rc=h.add();
end;
else if primary='5' and _lg='4' then
do;
rc=h.add();
do rc=hi.first() by 0 while (rc=0);
duration=0;
output;
rc=hi.next();
end;
rc=h.clear();
end;
else if primary ne '5' and _lg=4 then
do;
duration=0;
rc=h.add();
do rc=hi.first() by 0 while (rc=0);
output;
rc=hi.next();
end;
rc=h.clear();
end;
else
do;
duration=0;
output;
end;
end;
drop rc flag _lg;
run;
Dear Hai.kuo,
thanks for the quick help! That code worked perfectly
data work.sample;
input id $ primary $ time $;
datalines;
1 5 1
1 5 2
1 4 3
1 4 4
1 4 5
1 1 6
2 1 1
2 1 2
2 4 3
2 4 4
2 4 5
2 5 6
3 1 1
3 4 2
3 4 3
3 4 4
3 1 5
3 1 6
;
data want;
call missing(k);
count=0;
do until(last.id);
set sample;
by id;
j=lag(primary);
if primary=4 then
count+1;
if count=1 then
k=j;
if primary ne 4 and lag(primary)=4 then
l=primary;
end;
do until(last.id);
set sample;
by id;
if primary=4 and k ne 5 and l ne 5 then
duration+1;
else duration=0;
output;
end;
keep id primary time duration;
run;
Regards,
Naveen Srinivasan
L&T Infotech
Better formatted code:
data work.sample;
input id $ primary $ time $;
datalines;
1 5 1
1 5 2
1 4 3
1 4 4
1 4 5
1 1 6
2 1 1
2 1 2
2 4 3
2 4 4
2 4 5
2 5 6
3 1 1
3 4 2
3 4 3
3 4 4
3 1 5
3 1 6
;
data want;
call missing(k);
count=0;
do until(last.id);
set sample;
by id;
j=lag(primary);
if primary=4 then
do;
count+1;
if count=1 then
k=j;
end;
if primary ne 4 and lag(primary)=4 then
l=primary;
end;
do until(last.id);
set sample;
by id;
if primary=4 and k ne 5 and l ne 5 then
duration+1;
else duration=0;
output;
end;
keep id primary time duration;
run;
Regards,
Naveen Srinivasan
L&T Infotech
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.