Hello,
I need to count different values in a row for a longitudinal database. For example, I have
data First;
input ID type$ rev1 rev2 rev3 rev4;
cards;
101 a 0 0 . 1
101 b 1 1 0 0
101 c 0 . 1 0
102 b 1 0 1 0
102 c 1 0 . 1
102 d . 1 1 0
;
run;
I need the result 1:
ID Type count_0 count_missing count_1
101 a 2 1 1
101 b 2 0 2
101 c 2 1 1
102 b 2 0 2
102 c 1 1 2
102 d 1 1 2
Result 2:
ID Total_0 Total_missing Total_1
101 6 2 4
102 4 2 6
Thank you so much for any help!!
I will be very appreciate for any help!
Hello Yurie,
One way is to use the SQL procedure:
data First;
input ID type$ rev1 rev2 rev3 rev4;
cards;
101 a 0 0 . 1
101 b 1 1 0 0
101 c 0 . 1 0
102 b 1 0 1 0
102 c 1 0 . 1
102 d . 1 1 0
;
proc sql;
create table First_counts as
select
id
,type
,(case when rev1 = 0 then 1 else 0 end) +
(case when rev2 = 0 then 1 else 0 end) +
(case when rev3 = 0 then 1 else 0 end) +
(case when rev4 = 0 then 1 else 0 end) as count_0
,(case when missing(rev1) then 1 else 0 end) +
(case when missing(rev2) then 1 else 0 end) +
(case when missing(rev3) then 1 else 0 end) +
(case when missing(rev4) then 1 else 0 end) as count_miss
,(case when rev1 = 1 then 1 else 0 end) +
(case when rev2 = 1 then 1 else 0 end) +
(case when rev3 = 1 then 1 else 0 end) +
(case when rev4 = 1 then 1 else 0 end) as count_1
from First;
select * from First_counts;
select
id
,sum(count_0) as total_0
,sum(count_miss) as total_miss
,sum(count_1) as total_1
from First_counts
group by id
order by id;
quit;
For result 1: if the number of rev* variables increase you could write a macro to generate the lines that compute count_0, count_miss, count_1.
Otherwise you could use ARRAYs in a DATA step and compute the counts "manually" by iterating on the different rev* variables. I leave this as an exercise.
HTH
Daniel
Hello Yurie,
Use arrays for your goal.
Below a start. Things may be written shorter, but this is easy to read.
data want (drop=_:);
set first;
count_0 = 0;
count_missing = 0;
count_1 = 0;
array a_rev rev:;
do _i = 1 to dim(a_rev);
if a_rev(_i) = 0 then count_0 = count_0 + 1;
etc;
end;
run;
Hope this helps,
Eric
Well, your quite correct about using arrays. However as the poster is counting certain characters, the easiest way would be to create a string of all elements and then countc() to count each one:
data want (drop=tmp rev1-rev4);
set first;
array rev{4};
tmp=cats(of rev{*});
count_miss=countc(tmp,".");
count_0=countc(tmp,"0");
count_1=countc(tmp,"1");
run;
A transpose and report procedure approach:
data First;
input ID type$ rev1 rev2 rev3 rev4;
cards;
101 a 0 0 . 1
101 b 1 1 0 0
101 c 0 . 1 0
102 b 1 0 1 0
102 c 1 0 . 1
102 d . 1 1 0
;
run;
proc transpose data=first out=transfirst;
by id type;
run;
proc tabulate data=transfirst;
class id type;
class col1 / missing;
table id* type, col1='Value'*n=''*f=best4.;
table id, col1='Value'*n=''*f=best4.;
run;
Hello, Ballardw, RW9, Eric, and Mastropi
I am so grateful for your answers. All answers are great and provide me a big help! It's really amazing!
Best & blessings,
Yurie
Hi. If the goal is two data sets (assumes the data set FIRST is sorted by ID ...
data result1 (keep=id type count:) result2 (keep=id total:);
call missing(total_1, total_0, total_miss);
do until (last.id);
set first;
by id;
count_1 = sum(of rev:);
count_miss = nmiss(of rev:);
count_0 = n(of rev:) - count_1;
total_1 + count_1;
total_0 + count_0;
total_miss + count_miss;
output result1;
end;
output result2;
run;
RESULT1 ...
Obs id type count_0 count_miss count_1
1 101 a 2 1 1
2 101 b 2 0 2
3 101 c 2 1 1
4 102 b 2 0 2
5 102 c 1 1 2
6 102 d 1 1 2
RESULT2 ...
Obs id total_0 total_miss total_1
1 101 6 2 4
2 102 4 2 6
data _V/view=_V;
set FIRST;
COUNT_1 = sum(of REV:);
COUNT_MISS = nmiss(of REV:);
COUNT_0 = n(of REV:) - COUNT_1;
drop REV:;
run;
proc means data=_V noprint;
class ID TYPE ;
var COUNT:;
types ID ID*TYPE;
output out=SUM sum=;
run;
proc print data=SUM noobs;
where _TYPE_=2;
var ID COUNT:;
proc print data=SUM noobs;
where _TYPE_=3;
var ID TYPE COUNT:;
run;
ID | COUNT_1 | COUNT_MISS | COUNT_0 |
101 | 4 | 2 | 6 |
102 | 6 | 2 | 4 |
ID | type | COUNT_1 | COUNT_MISS | COUNT_0 |
101 | a | 1 | 1 | 2 |
101 | b | 2 | 0 | 2 |
101 | c | 1 | 1 | 2 |
102 | b | 2 | 0 | 2 |
102 | c | 2 | 1 | 1 |
102 | d | 2 | 1 | 1 |
Hi. Thirteen statements versus fifteen in just a data step ... a virtual tie (with desired variable names from the data step).
ps Could have simplified the CALL MISSING statement by placing it after the appearance of the various TOTAL variables ...
data result1 (keep=id type count:) result2 (keep=id total:);
do until (last.id);
set first;
by id;
count_1 = sum(of rev:);
count_miss = nmiss(of rev:);
count_0 = n(of rev:) - count_1;
total_1 + count_1;
total_0 + count_0;
total_miss + count_miss;
output result1;
end;
output result2;
call missing(of total:);
run;
What do you mean a tie? Two fewer! And I can rename the variables too, and put the drop statement as an option!
Seriously, this was more about separating the data preparation from the calculations.
No better or worse, just different.
data First;
input ID type$ rev1 rev2 rev3 rev4;
dat = _infile_;
var = substr(dat,index(dat," "));
cnt_0 = count(var,'0');
cnt_m = count(var,'.');
cnt_1 = count(var,'1');
drop dat var rev:;
cards;
101 a 0 0 . 1
101 b 1 1 0 0
101 c 0 . 1 0
102 b 1 0 1 0
102 c 1 0 . 1
102 d . 1 1 0
;
proc sql;
create table want as
select id, sum(cnt_0) as cnt_0,sum(cnt_m) as cnt_m,sum(cnt_1) as cnt_1 from First group by id;
quit;
data FIRST;
input ID TYPE $ REV1 REV2 REV3 REV4;
cards;
101 a 0 0 . 1
101 b 1 1 0 0
101 c 0 . 1 0
102 b 1 0 1 0
102 c 1 0 . 1
102 d . 1 1 0
run;
proc sql;
create table SUM1(drop=TOTAL_N) as
select ID
,sum(sum ( REV1, REV2, REV3, REV4)) as TOTAL_1
,sum(cmiss( REV1, REV2, REV3, REV4)) as TOTAL_M
,sum(n ( REV1, REV2, REV3, REV4)) as TOTAL_N
,calculated TOTAL_N - calculated TOTAL_1 as TOTAL_0
from FIRST
group by ID;
create table SUM2(drop=TOTAL_N) as
select ID
,TYPE
,sum(sum ( REV1, REV2, REV3, REV4)) as TOTAL_1
,sum(cmiss( REV1, REV2, REV3, REV4)) as TOTAL_M
,sum(n ( REV1, REV2, REV3, REV4)) as TOTAL_N
,calculated TOTAL_N - calculated TOTAL_1 as TOTAL_0
from FIRST
group by ID, TYPE;
quit;
Hi ... since you can use CAT functions with numeric data, how about a different data step ...
data First;
input ID :$3. type :$1. rev1-rev4;
cnt_0 = count(catt(of rev:),'0');
cnt_m = count(catt(of rev:),'.');
cnt_1 = count(catt(of rev:),'1');
drop rev:;
cards;
101 a 0 0 . 1
101 b 1 1 0 0
101 c 0 . 1 0
102 b 1 0 1 0
102 c 1 0 . 1
102 d . 1 1 0
;
Searching for Variable Values with CAT Functions: An Alternative to Arrays and Loops
http://www.albany.edu/~msz03/epi697/ho/cat_find_rev.pdf
and ...
http://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/TASS/Zdeb-CATFunctions.pdf
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.