Hello all!
I have a brainteaser here…
I have sorted data where I mark an indicator by Z or X, depending on some condition.
From the sample data, I need to count the occurrences of X within the first field. If X occurs more than one time continuously, I need to count it as 1…
Any suggestions?
Thank you in advance!
I am on SAS 9.2 using DI 4.21
sample:
| 000017861 | 0001 | 120 | Z | |
| 000017861 | 0002 | 160 | Z | |
| 000017861 | 0003 | 110 | X | 1 |
| 000017861 | 0004 | 190 | X | 1 |
| 000017861 | 0005 | 110 | X | 1 |
| 000017861 | 0006 | 140 | Z | |
| 000017869 | 0001 | 120 | Z | |
| 000017869 | 0002 | 140 | Z | |
| 000017869 | 0003 | 110 | X | 2 |
| 000017869 | 0004 | 190 | X | 2 |
| 000017869 | 0005 | 110 | X | 2 |
| 000017869 | 0006 | 140 | Z | |
| 000017869 | 0007 | 110 | X | 3 |
| 000017869 | 0008 | 190 | X | 3 |
| 000017869 | 0009 | 110 | X | 3 |
| 000017869 | 0010 | 190 | X | 3 |
| 000017869 | 0011 | 110 | X | 3 |
| 000017869 | 0012 | 140 | Z | |
| 000017869 | 0013 | 110 | X | 4 |
| 000017869 | 0014 | 140 | Z | |
| 000017869 | 0015 | 110 | X | 5 |
| 000017869 | 0016 | 140 | Z | |
| 000017869 | 0017 | 110 | X | 6 |
| 000017869 | 0018 | 140 | Z | |
| 000017869 | 0019 | 180 | Z | |
| 000017869 | 0020 | 200 | X | 7 |
| 000017877 | 0001 | 120 | Z | |
| 000017877 | 0002 | 150 | Z | |
| 000017877 | 0003 | 140 | Z | |
| 000017877 | 0004 | 110 | X | 1 |
| 000017877 | 0005 | 140 | Z | |
| 000017877 | 0006 | 110 | X | 2 |
| 000017877 | 0007 | 140 | Z | |
| 000017877 | 0008 | 110 | X | 3 |
| 000017877 | 0009 | 140 | Z | |
| 000017877 | 0010 | 110 | X | 4 |
| 000017878 | 0001 | 120 | Z | |
| 000017878 | 0002 | 140 | Z | |
| 000017878 | 0003 | 110 | X | 1 |
| 000017878 | 0005 | 110 | X | 1 |
| 000017878 | 0007 | 110 | X | 1 |
| 000017878 | 0008 | 140 | Z | |
| 000017878 | 0010 | 180 | Z |
Please post test data in the form of a datastep, its not fun for me to have to type that in!
data have;
infile datalines dlm=" " missover;
input id $ num_id $ val type $;
datalines;
000017861 0001 120 Z
000017861 0002 160 Z
000017861 0003 110 X
000017861 0004 190 X
000017861 0005 110 X
000017861 0006 140 Z
000017869 0001 120 Z
000017869 0002 140 Z
000017869 0003 110 X
000017869 0004 190 X
000017869 0005 110 X
000017869 0006 140 Z
000017869 0007 110 X
000017869 0008 190 X
000017869 0009 110 X
000017869 0010 190 X
000017869 0011 110 X
;
run;
data want;
set have;
by id type notsorted;
retain total curr;
if first.id then total=0;
if first.type and type="X" then do;
total=total+1;
curr=total;
end;
if first.type and type ne "X" then curr=.;
run;
Basically keep a running total within id, and set a retained value based on type on or off.
Why is '1' in
000017877 0004 110 X 1
I can't figure the rules when to restart counting.
For the first part, up to restarting the count
assuming variables are v1 v2 v3 vX counter you can
data test;
set have;
retain phase 0 counter;
if vx = 'X' and phase=0 then do;
phase=1;
counter +1;
end;
if vx ne 'X' then phase = 0;
run;
can do:
Thank you for you quick response. The 1 is not part of the data. I put it there to show the result that I need.
Please post test data in the form of a datastep, its not fun for me to have to type that in!
data have;
infile datalines dlm=" " missover;
input id $ num_id $ val type $;
datalines;
000017861 0001 120 Z
000017861 0002 160 Z
000017861 0003 110 X
000017861 0004 190 X
000017861 0005 110 X
000017861 0006 140 Z
000017869 0001 120 Z
000017869 0002 140 Z
000017869 0003 110 X
000017869 0004 190 X
000017869 0005 110 X
000017869 0006 140 Z
000017869 0007 110 X
000017869 0008 190 X
000017869 0009 110 X
000017869 0010 190 X
000017869 0011 110 X
;
run;
data want;
set have;
by id type notsorted;
retain total curr;
if first.id then total=0;
if first.type and type="X" then do;
total=total+1;
curr=total;
end;
if first.type and type ne "X" then curr=.;
run;
Basically keep a running total within id, and set a retained value based on type on or off.
Thank you for your quick response! Sorry about that! I attached a sample file and tried to copy data in the original message...
I will try the solution!
Runs like a champion!
Thank you!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.