DannyT
Calcite | Level 5

I'm stumped.

I want to reduce a really big data set to fewer observations that still contain every level of the original data set, each ideally no more than once. This is for testing purposes, so it's in our best interest to end up with a final data set that has the smallest number of obs.

Take the data set SASHELP.CLASS, for example. I would like to find the first obs (or really any obs, but I feel it might be easier with first/last obs) that together cover each individual level of the required variables, not all possible (existing or theoretical) combinations of those variables.

Output would look something like this: (SASHELP.CLASS: all levels by AGE, SEX):

  • AGE has 6 distinct levels (11-16)
  • SEX has 2 distinct levels ("F", "M")
  • Least number of obs covering those levels theoretically is 6.

So we should end up with:

  • OBS #1: Alfred, "M", 14
  • OBS #2: Alice, "F", 13

Barbara, Carol, and Henry will not be output, because both sexes (F, M) and ages 13 and 14 are already covered.

  • OBS #3: James, "M", 12
  • OBS #4: Janet, "F", 15
  • OBS #5: Joyce, "F", 11
  • OBS #6: Philip, "M", 16

end of output

In this case we went through the data sequentially and found the minimum number of obs (6) that satisfies the requirement. But if the variables have more levels (or are inter-correlated) and the data are in an arbitrary order (suppose we could pre-sort), we might end up with something close to, but not quite, the theoretical minimum, i.e. the number of levels of the required variable with the most levels.

I would imagine this requires some sort of recursive algorithm to find the smallest number of obs that covers the values, but I have no clue where to start. Any help will be greatly appreciated!

3 REPLIES
data_null__
Jade | Level 19

This seems to work. You might replace the arrays with an associative array (hash object) so you don't have to guess at the dimension.

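data class;
   set sashelp.class;
   /* levels seen so far; dimensions are guesses at the number of levels */
   array s[3] $1 _temporary_;
   array a[10] _temporary_;
   /* position of SEX among the stored levels, 0 if not seen yet */
   fs = whichc(sex, of s[*]);
   if fs eq 0 then do;
      x + 1;
      s[x] = sex;
      end;
   /* position of AGE among the stored levels, 0 if not seen yet */
   fa = whichn(age, of a[*]);
   if fa eq 0 then do;
      y + 1;
      a[y] = age;
      end;
   /* keep the obs only if it adds a new level of SEX or AGE */
   if not (fs and fa);
   drop fs fa x y;
   run;
proc print;
   run;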
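
The associative-array variant mentioned above could be sketched roughly as follows. This is a minimal, untested sketch that assumes a SAS release where the DATA step hash object is available; the names `class_cover`, `hs`, `ha`, `new_sex`, `new_age` and `rc` are made up for illustration and are not part of the step above.

```sas
data class_cover;
   set sashelp.class;
   /* Illustrative names: hs and ha hold the SEX and AGE levels seen so far */
   if _n_ = 1 then do;
      declare hash hs();
      hs.defineKey('sex');
      hs.defineData('sex');
      hs.defineDone();
      declare hash ha();
      ha.defineKey('age');
      ha.defineData('age');
      ha.defineDone();
   end;
   /* check() returns 0 when the current key value is already stored */
   new_sex = (hs.check() ne 0);
   new_age = (ha.check() ne 0);
   /* remember any level seen for the first time */
   if new_sex then rc = hs.add();
   if new_age then rc = ha.add();
   /* same rule as the array version: keep obs that add a new level */
   if new_sex or new_age;
   drop new_sex new_age rc;
run;
```

Because the hash objects grow as needed, no dimension has to be guessed up front. The selection is still a single sequential pass, so like the array version it returns a covering subset but not necessarily the smallest possible one.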
DannyT
Calcite | Level 5

Thanks for your reply, data_null_;

It may be because of v9.1.3 of SAS I'm running, but I seem to be getting the following error after "fs" and "fa" definitions:

ERROR: The ARRAYNAME[*] specification requires a variable based array.
data_null__
Jade | Level 19

Remove _TEMPORARY_ from both ARRAY statements and add a RETAIN statement:

RETAIN s a;

and maybe drop the array variables as well: drop s1-s3 a1-a10;
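
Put together, the step with those changes applied might look like the sketch below. It has not been run on 9.1.3, and it assumes the input has no existing variables named s1-s3 or a1-a10; the array dimensions are still guesses at the number of levels.

```sas
data class;
   set sashelp.class;
   /* variable-based arrays: creates s1-s3 (character) and a1-a10 (numeric) */
   array s[3] $1;
   array a[10];
   /* keep stored levels across iterations now that the arrays are not temporary */
   retain s a;
   fs = whichc(sex, of s[*]);
   if fs eq 0 then do;
      x + 1;
      s[x] = sex;
      end;
   fa = whichn(age, of a[*]);
   if fa eq 0 then do;
      y + 1;
      a[y] = age;
      end;
   /* keep the obs only if it adds a new level of SEX or AGE */
   if not (fs and fa);
   drop fs fa x y s1-s3 a1-a10;
   run;
```

Without the RETAIN statement, the array variables would be reset to missing on every iteration and every obs would look like a new level.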
