BookmarkSubscribeRSS Feed
out_of_my_wheelhouse
Calcite | Level 5

Hi, I've stumbled on a problem that I'm having trouble figuring out how to approach.  At the title suggests, I have a variable "XCOMBINED" that contains a string of values that were concatenated from previously 15 distinct variables (X1-X15) with each numeric value being delineated by a space.  Therefore, in the "XCOMBINE" variable there could be anywhere from 1 to 15 numeric values that contain up to 5 characters each.  Is there a way to count or graph which of these values occurs the most frequently in the observations?

Alternatively, I also have all 15 variables separate.  Is there a way to determine the frequency of values in 15 different variables simultaneously?  Obviously I can plot X1 or X2 frequency individually, but I can't figure out a way to count the frequency of multiple variables in one graph.

Thanks for your help!

7 REPLIES 7
ndp
Quartz | Level 8 ndp
Quartz | Level 8

Need to have some grouping variable/s<id> try this:

proc transpose data=x out=y;
by <id>;
var x1-x15;
run;

proc freq data=y;
tables <id>;
by _NAME_;
where nmiss(col1)=0;
run;

out_of_my_wheelhouse
Calcite | Level 5

Thanks for replying, but I'm a bit confused what <id> is meant to represent.  Can you explain?   The log is telling me it's causing a syntax error. 

ndp
Quartz | Level 8 ndp
Quartz | Level 8

<id> is any variable/s you need to create counts by (grouping )

Reeza
Super User

FYI if you post sample data you're more likely to get code that works for you.

The key here is to transpose your data from wide to long format. Then you can run a proc freq on all the results in one pass. I think proc freq even generates some standard graphs these days.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Reeza has the best answer here, normalise the data then aggregate it.  I would, just for fun, show an example of using arrays on the string you have:

data have; 

  length xcombined $2000;

  infile datalines dlm="|";

  input xcombined $;

datalines;

1234 567 1324 8723 56 56

673674 1235 1235 1235 2277 4564

;

run;

data want;

  set have;

  array elements{7,2} $20;

  curr_element=1;

  do while (scan(xcombined,1," ") ne "");

    elements{curr_element,1}=scan(xcombined,1," ");

    elements{curr_element,2}=strip(put(count(xcombined,scan(xcombined,1," ")),best.));

    xcombined=strip(tranwrd(xcombined,scan(xcombined,1," "),""));

    curr_element=curr_element+1;

  end;

run;

This keeps taking the first word from the string, counting all occurences, and putting the output in a 2 dimensional array, i.e. value, count.  Then that word is removed from the string and we keep going until there is nothing left.

out_of_my_wheelhouse
Calcite | Level 5

First of all, thank you so much.  This is a surprisingly helpful community, especially for someone with zero experience with coding.  I posted a part of the example data I spoke about before in case I did a poor job explaining it.  This is obviously "dxcombine" instead of "xcombine" but I'm trying to accomplish the same thing.

https://lh3.googleusercontent.com/HlmCrfIMzdijFduxMWCRMUHAdsAbh2ZtI1ePQcl_sNM=w204-h413-no

I tried the example you posted, but I'm getting the following error:

581  data have;

582

583    length dxcombine $2000;

584

585    infile datalines dlm="|";

586

587    input dxcombine $;

588

589  datalines;

NOTE: The data set WORK.HAVE has 5 observations and 1 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.00 seconds

595  ;

596

597  run;

598

599  data want;

600

601    set have;

602

603    array elements{7,2} $20;

604

605    curr_element=1;

606

607    do while (scan(dxcombine,1," ") ne "");

608

609      elements{curr_element,1}=scan(dxcombine,1," ");

610

611      elements{curr_element,2}=strip(put(count(dxcombine,scan(dxcombine,1," ")),best.));

612

613      xcombined=strip(tranwrd(dxcombine,scan(dxcombine,1," "),""));

614

615      curr_element=curr_element+1;

616

617    end;

618

619  run;

ERROR: Array subscript out of range at line 609 column 5.

dxcombine=1234 567 1324 8723 56 56 elements1=1234 elements2=1 elements3=1234 elements4=1

elements5=1234 elements6=1 elements7=1234 elements8=1 elements9=1234 elements10=1

elements11=1234 elements12=1 elements13=1234 elements14=1 curr_element=8

xcombined=567 1324 8723 56 56 _ERROR_=1 _N_=2

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 2 observations read from the data set WORK.HAVE.

WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 1

         observations and 17 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

      real time           0.02 seconds

      cpu time            0.03 seconds

RW9
Diamond | Level 26 RW9
Diamond | Level 26

This is down to the value I hardcoded in this row:

603    array elements{7,2} $20;

Basically I am assuming there will be no more than 7 distinct elements in the string, this is just the test data I have at the time.  Set it to elements{20,2} and that should cover it.  I does however illustrate neatly why Reeza's answer is the better of the two.  If you normalise your data:

ROW          DX

1                 5849

1                 42822

...

2                41071

...

Then you can very easily run a freq/means procedure, or some simple code to get your result:

proc sq;

     create table WANT as

     select     ROW,

                   DX,

                   count(DX) as RESULT

     from       HAVE

     group by ROW,

                   DX;

quit;

Now to get the normalised data you can just do:

data inter (keep=row dx);

     set have;

     i=1;

     do until (scan(dxcombine,i,' ') = '');

         row=_n_;

         dx=scan(dxcombine,i,' ');

          output;

          i=i+1;

     end;

run;

This outputs one row per scanned word.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1478 views
  • 0 likes
  • 4 in conversation