Sum Character variables

Reply
Regular Contributor
Posts: 150

Sum Character variables

I have country 42 country variables that each have country codes for where participants have traveled to, so a maximum of 42 instances of traveling. The variables are country1-42. For each participant, the first variable would include the first place they traveled, the second would include the second place they traveled, and so on. The data looks like this:

IDcountry1country2country3
1AUSFRAAUS
2CACACR
3CRAUSAUS
4CACA.

For the 42 possible trips, not everyone will have all data, for instance, some participants would have only traveled once, others all times, and everywhere in between...

Is there a way to sum the number of times a particular country code was visited. So in the example data I would have a sum of 4 for AUS, 4 for CA, 1 for FRA, and 2 for CR. Any help would be greatly appreciated.

Super User
Super User
Posts: 6,318

Re: Sum Character variables

Just transpose the data first.

proc transpose data=have out=middle (rename=(col1=country)) ;

  by id;

  var country: ;

run;

proc freq ;

tables country;

run;

Regular Contributor
Posts: 150

Re: Sum Character variables

Simple enough, thank you! Now if I want to determine how many different countries a person went to and the most frequent country visited how would I do that. So say for ID 1 they went to 2 countries, and AUS was most frequent. For participant 4 they only went to CA. Any help with this would be greatly appreciated.

Contributor
Posts: 52

Re: Sum Character variables


A solution amongst others.

the previous solution slightly modified.

data _null_;
   length cnt 8.;
   array cty(10) $4.;

   if _N_=1 then do;
      declare hash ha(multidata:'N',ordered:'A');
         ha.definekey('zCty');
         ha.definedata('zCty','zCnt');
         ha.definedone();
      declare hash hb(multidata:'N');
         hb.definekey ('zId','zCty');
         hb.definedata('zId','zCty','yCnt');
         hb.definedone();
   end;

   do until(aDone);
      set t_have end=aDone;
      do i = 1 to 10;
         zCty=cty(i);
         zId=Id;
         if (zCty='') then continue;
         rc = ha.find();
         if (rc>0) then do; zCnt=1; ha.add(); end;
                   else do; zCnt+1; rc = ha.replace(key:zCty, data:zCty, data:zCnt); end;
         rc = hb.find();
         if (rc>0) then do; yCnt=1; hb.add(); end;
                   else do; yCnt+1;
                            rc = hb.replace(key:zId, key:zCty, data:zId, data:zCty, data:yCnt); end;
      end;
   end;
   ha.output(dataset:'t_want1(rename=(zCty=Cty zCnt=Cnt))');
   hb.output(dataset:'t_want2(rename=(zId=Id zCty=Cty yCnt=Cnt))');
run;

your last request lies in dataset t_want3 (with c_cty = number of distinct countries visited for a given ID).


proc sql;
   create table t_want3 as
   select a.id, a.cty, a.cnt, c.c_cty
   from        t_want2 a
   inner join (select id, max(cnt) as mx_cnt
               from   t_want2
               group by id) b
   on (a.id=b.id) and (a.cnt=b.mx_cnt)
   inner join (select id, count(distinct cty) as c_cty
               from   t_want2
               group by id) c
   on (a.id=c.id)
   order by a.id, a.cty;
quit;

dataset t_want3 becomes:
=============================
Id    Cty    Cnt    c_cty

101    A22     4       5
102    A23     2       7
103    A24     3       5
104    A12     3       5
105    A12     2       7
105    A19     2       7
105    A25     2       7
106    A13     2       7
106    A16     2       7
106    A20     2       7
107    A25     2       8
108    A25     3       5
109    A12     2       8
109    A15     2       8
110    A24     4       6
=============================

hope this helps.

Grand Advisor
Posts: 17,325

Re: Sum Character variables

Once your data is transpose run a proc freq, by person for number of countries visited and by person destination sorted descending for the most frequent country.

Contributor
Posts: 52

Re: Sum Character variables

A solution amongst others.

/******************************/
/**** random input dataset ****/
/******************************/
data t_have(keep=id ctySmiley Happy;
   length id 8.;
   array cty(10) $4.;
   do id= 101 to 110;
      aa = int(3*ranuni(7));
      call missing(of cty(*));
      do j = 1 to 10-aa;
         ab= 10+ceil(15*ranuni(7));
         cty(j)= compress('A'||ab);
      end;
      output;
   end;
run;

dataset t_have becomes:
=======================================================================================
id    cty1    cty2    cty3    cty4    cty5    cty6    cty7    cty8    cty9    cty10

101    A22     A22     A23     A17     A22     A23     A21     A16     A17      A22
102    A25     A18     A21     A12     A24     A23     A23     A19
103    A24     A24     A11     A15     A24     A17     A12     A12
104    A12     A19     A18     A18     A12     A12     A22     A15     A22
105    A13     A12     A12     A19     A15     A25     A16     A25     A19      A11
106    A20     A23     A24     A13     A16     A11     A13     A12     A16      A20
107    A13     A25     A19     A12     A20     A25     A21     A14     A24
108    A25     A18     A18     A25     A15     A11     A24     A24     A15      A25
109    A12     A17     A21     A12     A16     A11     A15     A15     A24      A23
110    A24     A19     A25     A14     A24     A25     A22     A24     A13      A24
=======================================================================================

/*****************************/
/**** a proposed solution ****/
/*****************************/
data _null_;
   length cnt 8.;
   array cty(10) $4.;

   if _N_=1 then do;
      declare hash ha(multidata:'N',ordered:'A');
         ha.definekey('zCty');
         ha.definedata('zCty','zCnt');
         ha.definedone();
   end;

   do until(aDone);
      set t_have end=aDone;
      do i = 1 to 10;
         zCty=cty(i);
         if (zCty='') then continue;
         rc = ha.find();
         if (rc>0) then do; zCnt=1; ha.add(); end;
                   else do; zCnt+1; rc = ha.replace(key:zCty, data:zCty, data:zCnt); end;
      end;
   end;
   ha.output(dataset:'t_want(rename=(zCty=Cty zCnt=Cnt))');
run;

dataset t_want becomes:

===============
Cty    Cnt

A11      5
A12     12
A13      5
A14      2
A15      7
A16      5
A17      4
A18      5
A19      6
A20      3
A21      4
A22      7
A23      6
A24     13
A25     10
===============


hope this helps.

Ask a Question
Discussion stats
  • 5 replies
  • 242 views
  • 0 likes
  • 4 in conversation