BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
WendyT
Pyrite | Level 9

SAS folks-

I need your help again - I'm stuck.   I'm working on a report where I need to generate counts and statistics based on the value of codes within a dataset.

However, the codes may be combined in the data, so I need to find a way to first generate, then report on the unique codes.  All codes are single characters, but to make it more fun, there is no order to the codes, they may repeat in an observation, and there are special characters (< and ? that I know of).

Here are some examples of what might occur looking at 12 observations with possible codes of null, I, J, Q >  (may be up to 20 different codes depending on the dataset involved, last two years for one project is 80K obs, so it will get big):

obs   code

1     I

2     J

3     QJ

4    

5    

6     JQ

7     IJ>Q

8    

9     >

10    

11    JIJ

12    QJI

a basic start to my report might be something like this:

code     count  nobs     %

J          6     12     50%

Q          4     12     33%

>          2     12     17%

I          4     12     33%

no code    4     12     33%

I would like to use this solution for different datasets without knowing anything about the codes in advance.

Also, any tips on how to generate the report would also be very welcome - I'm playing with TABULATE.

Thanks for your help!

WendyT

1 ACCEPTED SOLUTION

Accepted Solutions
LarryWorley
Fluorite | Level 6

Wendy,

Here is another solution which uses some sql and a variation in the initial data step.

data x;

input code : $10. @@;
drop i_char code;

obs = _n_ ;
do i_char = 1 to length(code) ;
   single_char = substr(code,i_char,1) ;
   output ;
end ;
datalines;

I J QJ . . JQ IJ>Q . > . JIJ QJI

;

proc sql ;
  create table x_short as
  select distinct obs
                 ,single_char
    from x
    ;
  select max(obs) into :num_obs
    from x
    ;
  select single_char as code
        ,count(*)    as count
        ,&num_obs    as nobs
        ,(calculated count
         /&num_obs)     as percent
            format percent5.0
    from x_short
    group by single_char
    ;
quit ;


                            code      count      nobs  percent

                                               4        12     33%
                            >                 2        12     17%
                            I                  4        12     33%
                            J                 6        12     50%
                            Q                4        12     33%

View solution in original post

17 REPLIES 17
Cynthia_sas
SAS Super FREQ

Hi:

  Just curious, how are you counting observation 11 with a value of JIJ, where the J is repeated? It looks to me like you are only counting the J one time, even though it appears twice. You show a count of 6 for J, but there are actually 7 occurences of J in the data.

cynthia

WendyT
Pyrite | Level 9

HI Cynthia-

Sorry to be unclear.  Repeats of codes in a single observation only count as one. There really shouldn't be repeats, but it does happen sometimes.

Thanks!

Wendy

MikeZdeb
Rhodochrosite | Level 12

hi ... this gives you the table you posted (I'm sure that there will be lots of ideas on this one) ... a couple things make this interesting

given this "Repeats of codes in a single observation only count as one." you have to have some way of de-duplicating characters within a code

also, given your denominator is the original number of observations rather than the number of codes, not sure how you could get the correct percentages in either TABULATE or FREQ once you have a data set of single character codes to count

*your data;

data x;

input code : $10. @@;

datalines;

I J QJ . . JQ IJ>Q . > . JIJ QJI

;


data counts (keep=code count nobs pct);

array ch(32:126) _temporary_ (95*.);

set x end=last nobs=obs;

if length(code) eq 1 then ch(rank(code))+1;

else

do j=1 to length(code);

   ch(rank(char(code,j))) + (^find(code,trim(char(code,j)),j+1));

end;

if last then do j=32 to 126;

   nobs = obs;

   if ^missing(ch(j)) then do;

      count = ch(j);

      pct = ch(j) / obs;

   code = byte(j);

   output;

   end;

end;

format pct percent.;

run;

code   nobs    count    pct

        12       4      33%

>       12       2      17%

I       12       4      33%

J       12       6      50%

Q       12       4      33%

Message was edited by: Mike Zdeb ... 10 minutes more thought, less SAS code

WendyT
Pyrite | Level 9

MikeZdeb-

Thanks so much for your solution!  I was able to run it on my whole dataset, but it's going to take me quite awhile to figure out how it works.

I've been noodling around with TABULATE, as this report will have to be sliced into various categories, and it looks like I'm going to have to do this in pieces to get the percentages the way I want them.

Thanks again!

WendyT

MikeZdeb
Rhodochrosite | Level 12

Hi ... FYI ... every character has a number associated with it ...  http://www.albany.edu/~msz03/epi514/notes/appendix_B.pdf.

You can find that number with the RANK function.  The numbers for printable characters range from 32 (space) to 126 (tilde,  ~).  So, the data step  sets up an array (CH) that ranges from CH(32) to CH(126) and that array is used to keep track of how many times each character is encountered.  Since you wanted characters de-dupped within codes, there's a loop that checks for duplicates (without that consideration, the code would be simpler).

When the counting is done (last observation read), another loop computes the percentages and writes observations to a data set.  There's another function that returns the character associated with each number, the BYTE function, and it is used in the loop to add the character (CODE) to the data set.

WendyT
Pyrite | Level 9

Everyone-

Thanks so much for taking the time to help me out - I have some new avenues to explore.  It's wonderful to see different ways to solve a problem.

I have a meeting with our computer folks to discuss (and hopefully schedule) our upgrade to 9.3 later today.  Smiley Happy

With grateful thanks...

WendyT

joehinson
Calcite | Level 5

Hello Wendy, here is a hash solution.

First, as observations are read in, codes are stripped from their texts. A smaller hash object ("rep") then checks for repeats. If a code is repeated in an observation then count is suppressed. Another hash object ("cod") then accumulates unique codes and stores counts per codes: (I haven't dealt with the final output table that you want because that should be easy).

The nice thing about this hash approach is that you need not know anything in advance about the codes, the number and sizes of the texts, the repeats, etc:

data have;

infile datalines missover;

input seq codetext $;

datalines;

1     I

2     J

3     QJ

4    

5    

6     JQ

7     IJ>Q

8    

9     >

10    

11    JIJ

12    QJI

;

run;

data _null_;

length code $1 codecount coderepeat codelen 8;

if(1=2) then set have;

declare hash cod(ordered:"a");

cod.defineKey("code");

cod.defineData("code","codecount");

cod.defineDone();

declare hash rep(ordered:"a");

rep.defineKey("code");

rep.defineDone();

code=".";

codecount=0;

k=0;

coderepeat=0;

do until (done);

set have end=done;

rx=rep.clear();

codelen=length(codetext);

k=k+1;

codecount=1;

do x=1 to codelen;

code=substr(codetext,x,1);

rp=rep.add();

coderepeat=countc(codetext,code,"i");

rc=cod.find();

codecount=codecount+(rc=0)*(rp=0);

rca=cod.replace();

codecount=0;

end;

end;

cod.output(dataset:"codetable");

stop;

run;

OUTPUT:

  

  code codecount
1   4
2 > 1
3 I 4
4 J 6
5 Q 4
joehinson
Calcite | Level 5

Hmmmm, why is ">" giving a count of 1 instead of 2?  Will fix and get back!

LarryWorley
Fluorite | Level 6

Wendy,

Here is another solution which uses some sql and a variation in the initial data step.

data x;

input code : $10. @@;
drop i_char code;

obs = _n_ ;
do i_char = 1 to length(code) ;
   single_char = substr(code,i_char,1) ;
   output ;
end ;
datalines;

I J QJ . . JQ IJ>Q . > . JIJ QJI

;

proc sql ;
  create table x_short as
  select distinct obs
                 ,single_char
    from x
    ;
  select max(obs) into :num_obs
    from x
    ;
  select single_char as code
        ,count(*)    as count
        ,&num_obs    as nobs
        ,(calculated count
         /&num_obs)     as percent
            format percent5.0
    from x_short
    group by single_char
    ;
quit ;


                            code      count      nobs  percent

                                               4        12     33%
                            >                 2        12     17%
                            I                  4        12     33%
                            J                 6        12     50%
                            Q                4        12     33%

WendyT
Pyrite | Level 9

LarryWorley-

Thanks so much - this one I understand! (so you get the "Correct Answer" award)  I didn't know that one SQL step could contain multiple selects and table creations (I saved the output as a table when I applied it to my big dataset).

I will be able to apply this in so many places.  Smiley Happy

WendyT

Message was edited by: Wendy Tweedale

joehinson
Calcite | Level 5

Here is the fixed hash solution (SAS 9.2 or 9.3).

As I said, with the hash approach, one does not need to know in advance the sizes of arrays, etc.:

data have;

infile datalines missover;

input seq codetext $;

datalines;

1     I

2     J

3     QJ

4    

5    

6     JQ

7     IJ>Q

8    

9     >

10    

11    JIJ

12    QJI

;

run;

data _null_;

length code $1 codecount coderepeat codelen 8;

if(1=2) then set have;

declare hash cod(ordered:"a");

cod.defineKey("code");

cod.defineData("code","codecount");

cod.defineDone();

declare hash rep(ordered:"a");

rep.defineKey("code");

rep.defineDone();

code=".";

codecount=1;

do until (done);

set have end=done;

rx=rep.clear();

codelen=length(codetext);

codecount=1;

do x=1 to codelen;

code=substr(codetext,x,1);

rp=rep.add();

rc=cod.find();

codecount=codecount+((rc=0)*(rp=0));

rca=cod.replace();

codecount=1;

end;

end;

cod.output(dataset:"codetable");

stop;

run;

OUTPUT:

  

  code codecount
1   4
2 > 2
3 I 4
4 J 6
5 Q 4
WendyT
Pyrite | Level 9

joehinson-

Thanks so much for the hash solution- I don't understand how it works, but it runs beautifully.

I have so much to learn...

WendyT

joehinson
Calcite | Level 5

This one is just for future considerations. Proc Tabulate is not going to go away anytime soon, but new tools are in the horizon.

SAS 9.3 comes with an experimental tool: ODS Report Writing Interface, which also employs objects, to create pretty output tables.

ODS objects play very well with hash objects, as shown below: (The output is PDF, and the location must be specified in the ODS statement):

data _null_;

       length code $1 codecount datasize percents codelen 8;

       if(1=2) then set have nobs=nsize;

       declare hash cod(ordered:"a");

              cod.defineKey("code");

              cod.defineData("code","codecount", "datasize", "percents");

              cod.defineDone();

       declare hiter hicod("cod");

       declare hash rep(ordered:"a");

              rep.defineKey("code");

              rep.defineDone();

       code=".";

       codecount=1;

       datasize=0;

       percents=0;

       do until (done);

              set have end=done;

              rx=rep.clear();

              codelen=length(codetext);

              codecount=1;

              do x=1 to codelen;

                     code=substr(codetext,x,1);

                     rp=rep.add();

                     rc=cod.find();

                     datasize=nsize;

                     codecount=codecount+((rc=0)*(rp=0));

                     percents=int(100*(codecount/datasize));

                     rca=cod.replace();

                     codecount=1;

              end;

       end;

*========USING THE SAS 9.3 ODS REPORT WRITING INTERFACE===========================================;

options nonumber nodate nocenter;

ods listing close;

ods pdf file="C:\Documents and Settings\hinsonj\Desktop\SASoutputs\WendysTable.pdf" notoc; *<------(CHANGE TO YOUR OWN LOCATION);

declare odsout summ();

      

       summ.title(text:"Wendy's Desired Table of Code Frequencies  " ||" %sysfunc(today(),mmddyy10.)");

       summ.layout_gridded(columns:1);

              summ.region();

                     summ.table_start();

                           summ.row_start();

                                  summ.format_cell(inhibit:"TLR", text: "CODE", overrides:"cellwidth=4cm just=center");

                                  summ.format_cell(inhibit:"TLR", text: "COUNTS", overrides:"cellwidth=4cm just=center");

                                  summ.format_cell(inhibit:"TLR", text: "NOBS", overrides:"cellwidth=4cm just=center");

                                  summ.format_cell(inhibit:"TLR", text: "PERCENTS", overrides:"cellwidth=4cm just=center");

                           summ.row_end();

                     summ.table_end();

                    

                     summ.table_start();

                     summ.line();

                           do while(hicod.next() eq 0);

                           summ.row_start();

                                  summ.format_cell(inhibit:"BTLR", text: code, overrides:"cellwidth=4cm just=center");

                                  summ.format_cell(inhibit:"BTLR", text: codecount, overrides:"cellwidth=4cm just=center");

                                  summ.format_cell(inhibit:"BTLR", text: datasize, overrides:"cellwidth=4cm just=center");

                                  summ.format_cell(inhibit:"BTLR", text: percents, overrides:"cellwidth=4cm just=center");

                           summ.row_end();

                           end;

             

                     summ.table_end();

                     summ.line();

             summ.region();

        summ.layout_end();

stop;

run;

ods pdf close;

   

Andre
Obsidian | Level 7

Joe,

I was  surprised by your code, as it is completely new for me!

but when i tried it , i encounter difficulties even after adding a data _null_; in the head

(i use 9.3.0 under windows)

What is the correction, just to see the final production.

Andre

see further:

112  options nonumber nodate nocenter;

113  ods listing close;

114  ods pdf file="d\notes\test\WTable.pdf" notoc;

NOTE: Writing ODS PDF output to DISK destination

      "d:\My SAS Files\9.3\d\notes\test\WTable.pdf", printer "PDF".

115  data _null_;

116  declare odsout summ();

117         summ.title(text:"Wendy's Desired Table of Code Frequencies  " ||"

117! %sysfunc(today(),mmddyy10.)");

118         summ.layout_gridded(columns:1);

119                summ.region();

120                       summ.table_start();

121                             summ.row_start();

122                                    summ.format_cell(inhibit:"TLR", text: "CODE",

122! overrides:"cellwidth=4cm just=center");

123                                    summ.format_cell(inhibit:"TLR", text: "COUNTS",

123! overrides:"cellwidth=4cm just=center");

124                                    summ.format_cell(inhibit:"TLR", text: "NOBS",

124! overrides:"cellwidth=4cm just=center");

125                                    summ.format_cell(inhibit:"TLR", text: "PERCENTS",

125! overrides:"cellwidth=4cm just=center");

126                             summ.row_end();

127                       summ.table_end();

128                       summ.table_start();

129                       summ.line();

130                             do while(hicod.next() eq 0);

                                         ----------

                                         557

ERROR: DATA STEP Component Object failure.  Aborted during the COMPILATION phase.

ERROR 557-185: Variable hicod is not an object.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: L'étape DATA used (Total process time):

      real time           0.08 secondes

      cpu time            0.03 secondes

131                             summ.row_start();

132                                    summ.format_cell(inhibit:"BTLR", text: code,

132! overrides:"cellwidth=4cm just=center");

133                                    summ.format_cell(inhibit:"BTLR", text: codecount,

133! overrides:"cellwidth=4cm just=center");

134                                    summ.format_cell(inhibit:"BTLR", text: datasize,

134! overrides:"cellwidth=4cm just=center");

135                                    summ.format_cell(inhibit:"BTLR", text: percents,

135! overrides:"cellwidth=4cm just=center");

136                             summ.row_end();

137                             end;

138                       summ.table_end();

139                       summ.line();

140               summ.region();

141          summ.layout_end();

142  stop;

143  run;

144  ods pdf close;

NOTE: ODS PDF printed no output.

      (This sometimes results from failing to place a RUN statement before the ODS PDF CLOSE

      statement.)

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2134 views
  • 6 likes
  • 6 in conversation