BookmarkSubscribeRSS Feed
Yurie
Fluorite | Level 6

Hello,

I need to count different values in a row for a longitudinal database. For example, I have

data First;

input ID type$ rev1 rev2 rev3 rev4;

cards;

101 a 0 0 . 1

101 b 1 1 0 0

101 c 0 .  1 0

102 b 1 0 1 0

102 c 1 0 .  1

102 d .  1 1 0

;

run;

I need the result 1:

ID    Type  count_0  count_missing  count_1

101    a        2           1                      1

101    b        2           0                      2

101    c        2           1                      1

102    b        2           0                      2

102    c        1           1                      2

102    d        1           1                      2

Result 2:

ID     Total_0  Total_missing   Total_1

101         6            2                     4

102         4            2                     6

Thank you so much for any help!!

I will be very appreciate for any help!

12 REPLIES 12
mastropi
Fluorite | Level 6

Hello Yurie,

One way is to use the SQL procedure:

data First;

input ID type$ rev1 rev2 rev3 rev4;

cards;

101 a 0 0 . 1

101 b 1 1 0 0

101 c 0 .  1 0

102 b 1 0 1 0

102 c 1 0 .  1

102 d .  1 1 0

;

proc sql;

    create table First_counts as

    select

        id

        ,type

        ,(case when rev1 = 0 then 1 else 0 end) +

         (case when rev2 = 0 then 1 else 0 end) +

         (case when rev3 = 0 then 1 else 0 end) +

         (case when rev4 = 0 then 1 else 0 end) as count_0

        ,(case when missing(rev1) then 1 else 0 end) +

         (case when missing(rev2) then 1 else 0 end) +

         (case when missing(rev3) then 1 else 0 end) +

         (case when missing(rev4) then 1 else 0 end) as count_miss

        ,(case when rev1 = 1 then 1 else 0 end) +

         (case when rev2 = 1 then 1 else 0 end) +

         (case when rev3 = 1 then 1 else 0 end) +

         (case when rev4 = 1 then 1 else 0 end) as count_1

    from First;

    select * from First_counts;

    select

        id

        ,sum(count_0) as total_0

        ,sum(count_miss) as total_miss

        ,sum(count_1) as total_1

    from First_counts

    group by id

    order by id;

quit;

For result 1: if the number of rev* variables increase you could write a macro to generate the lines that compute count_0, count_miss, count_1.

Otherwise you could use ARRAYs in a DATA step and compute the counts "manually" by iterating on the different rev* variables. I leave this as an exercise.

HTH

Daniel

EricHoogenboom
Fluorite | Level 6

Hello Yurie,

Use arrays for your goal.

Below a start. Things may be written shorter, but this is easy to read.

data want (drop=_:);

   set first;

   count_0 = 0;

   count_missing = 0;

   count_1 = 0;

   array a_rev rev:;

   do _i = 1 to dim(a_rev);

      if a_rev(_i) = 0 then count_0 = count_0 + 1;

      etc;

   end;

run;

Hope this helps,

Eric

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, your quite correct about using arrays.  However as the poster is counting certain characters, the easiest way would be to create a string of all elements and then countc() to count each one:

data want (drop=tmp rev1-rev4);

  set first;

  array rev{4};

  tmp=cats(of rev{*});

  count_miss=countc(tmp,".");

  count_0=countc(tmp,"0");

  count_1=countc(tmp,"1");

run;

ballardw
Super User

A transpose and report procedure approach:

data First;
input ID type$ rev1 rev2 rev3 rev4;
cards;
101 a 0 0 . 1
101 b 1 1 0 0
101 c 0 .  1 0
102 b 1 0 1 0
102 c 1 0 .  1
102 d .  1 1 0
;
run;


proc transpose data=first out=transfirst;
by id type;
run;

proc tabulate data=transfirst;
class id type;
class col1 / missing;
table id* type, col1='Value'*n=''*f=best4.;
table id, col1='Value'*n=''*f=best4.;
run;

Yurie
Fluorite | Level 6

Hello, Ballardw, RW9, Eric, and Mastropi

I am so grateful for your answers. All answers are great and provide me a big help! It's really amazing!

Best & blessings,

Yurie

MikeZdeb
Rhodochrosite | Level 12

Hi.  If the goal is two data sets (assumes the data set FIRST is sorted by ID ...

data result1 (keep=id type count:) result2 (keep=id total:);

call missing(total_1, total_0, total_miss);

do until (last.id);

set first;

by id;

  count_1    = sum(of rev:);

  count_miss = nmiss(of rev:);

  count_0    = n(of rev:) - count_1;

  total_1    + count_1;

  total_0    + count_0;

  total_miss + count_miss;

  output result1;

end;

output result2;

run;

RESULT1 ...

Obs    id    type    count_0  count_miss     count_1

1     101     a         2          1           1

2     101     b         2          0           2

3     101     c         2          1           1

4     102     b         2          0           2

5     102     c         1          1           2

6     102     d         1          1           2

RESULT2 ...

Obs     id    total_0   total_miss     total_1

1     101       6          2             4

2     102       4          2             6

ChrisNZ
Tourmaline | Level 20

data _V/view=_V;

  set FIRST;

  COUNT_1 = sum(of REV:);

  COUNT_MISS = nmiss(of REV:);

  COUNT_0 = n(of REV:) - COUNT_1;

  drop REV:;

run;

proc means data=_V noprint;

  class ID TYPE ;

  var COUNT:;

  types ID   ID*TYPE;

  output out=SUM sum=;

run;

proc print data=SUM noobs;

  where _TYPE_=2;

  var ID COUNT:;

proc print data=SUM noobs;

  where _TYPE_=3;

  var ID TYPE COUNT:;

run;

ID

COUNT_1

COUNT_MISS

COUNT_0

101

4

2

6

102

6

2

4

ID

type

COUNT_1

COUNT_MISS

COUNT_0

101

a

1

1

2

101

b

2

0

2

101

c

1

1

2

102

b

2

0

2

102

c

2

1

1

102

d

2

1

1

MikeZdeb
Rhodochrosite | Level 12

Hi.  Thirteen statements versus fifteen in just a data step ... a virtual tie (with desired variable names from the data step).

ps Could have simplified the CALL MISSING statement by placing it after the appearance of the various TOTAL variables ...

data result1 (keep=id type count:) result2 (keep=id total:);

do until (last.id);

set first;

by id;

  count_1    = sum(of rev:);

  count_miss = nmiss(of rev:);

  count_0    = n(of rev:) - count_1;

  total_1    + count_1;

  total_0    + count_0;

  total_miss + count_miss;

  output result1;

end;

output result2;

call missing(of total:);

run;

ChrisNZ
Tourmaline | Level 20

What do you mean a tie? Two fewer! And I can rename the variables too, and put the drop statement as an option! Smiley Wink

Seriously, this was more about separating the data preparation from the calculations.

No better or worse, just different. Smiley Happy

MadhuKorni
Quartz | Level 8

data First;

input ID type$ rev1 rev2 rev3 rev4;

dat = _infile_;

var = substr(dat,index(dat," "));

cnt_0 = count(var,'0');

cnt_m = count(var,'.');

cnt_1 = count(var,'1');

drop dat var rev:;

cards;

101 a 0 0 . 1

101 b 1 1 0 0

101 c 0 .  1 0

102 b 1 0 1 0

102 c 1 0 .  1

102 d .  1 1 0

;

proc sql;

create table want as

select id, sum(cnt_0) as cnt_0,sum(cnt_m) as cnt_m,sum(cnt_1) as cnt_1 from First group by id;

quit;

ChrisNZ
Tourmaline | Level 20

data FIRST;

input ID TYPE $ REV1 REV2 REV3 REV4;

cards;

101 a 0 0 . 1

101 b 1 1 0 0

101 c 0 .  1 0

102 b 1 0 1 0

102 c 1 0 .  1

102 d .  1 1 0

run;

proc sql;

  create table SUM1(drop=TOTAL_N) as

  select ID

        ,sum(sum  ( REV1, REV2, REV3, REV4)) as TOTAL_1

        ,sum(cmiss( REV1, REV2, REV3, REV4)) as TOTAL_M

        ,sum(n    ( REV1, REV2, REV3, REV4)) as TOTAL_N

        ,calculated TOTAL_N - calculated TOTAL_1 as TOTAL_0

  from FIRST

  group by ID;

  create table SUM2(drop=TOTAL_N) as

  select ID

        ,TYPE

        ,sum(sum  ( REV1, REV2, REV3, REV4)) as TOTAL_1

        ,sum(cmiss( REV1, REV2, REV3, REV4)) as TOTAL_M

        ,sum(n    ( REV1, REV2, REV3, REV4)) as TOTAL_N

        ,calculated TOTAL_N - calculated TOTAL_1 as TOTAL_0

  from FIRST

  group by ID, TYPE;

quit;

MikeZdeb
Rhodochrosite | Level 12

Hi ... since you can use CAT functions with numeric data, how about a different data step ...

data First;

input ID :$3. type :$1. rev1-rev4;

cnt_0 = count(catt(of rev:),'0');

cnt_m = count(catt(of rev:),'.');

cnt_1 = count(catt(of rev:),'1');

drop rev:;

cards;

101 a 0 0 . 1

101 b 1 1 0 0

101 c 0 .  1 0

102 b 1 0 1 0

102 c 1 0 .  1

102 d .  1 1 0

;

Searching for Variable Values with CAT Functions: An Alternative to Arrays and Loops

http://www.albany.edu/~msz03/epi697/ho/cat_find_rev.pdf

and ...

http://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/TASS/Zdeb-CATFunctions.pdf

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1919 views
  • 8 likes
  • 8 in conversation