BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
left
Obsidian | Level 7

Preparations to reproduce my situation

 

 

proc format;
    value _grp
        1 = "Group 1"
        2 = "Group 2"
        3 = "Group 3";
    value _yesno
        1 = "Yes"
        0 = "No";
run;

/* Simulate an input dataset */
data source (keep = id group_var var:);
    call streaminit(123);
    attrib  id                          label = "ID"
            group_var   format = _grp.  label = "Categorical group variable";
    format  var1 - var9 _yesno.;
    label   id = "ID"
            group_var = "Categorical group variable";
    format  ;
    do i = 1 to 1000;
        id = i;
        /* Simulate data for categorical data. */
        group_var = rand("integer", 1, 3);          /* Create values "1", "2" or "3". */
        
        array yes_no var1-var9;
        do over yes_no;
            yes_no = rand("integer", 1, 2) - 1;     /* Create values "0" and "1". */
        end;
        output;
    end;
run;

This gives a dataset like this:

 

left_2-1670239129161.png

 

Based on the input I create a variable endpoint in dataset have with this code:

 

data have;
    set source (in = a keep = id group_var  where = (group_var = 1))
        source (in = b keep = id group_var  where = (group_var = 2))
        source (in = c keep = id group_var  where = (group_var = 3))
        source (in = d keep = id var1       where = (var1      = 1))
        source (in = e keep = id var2       where = (var2      = 1))
        source (in = f keep = id var3       where = (var3      = 1))
        source (in = g keep = id var4       where = (var4      = 1))
        source (in = h keep = id var5       where = (var5      = 1))
        source (in = i keep = id var6       where = (var6      = 1))
        source (in = j keep = id var7       where = (var7      = 1))
        source (in = k keep = id var8       where = (var8      = 1))
        source (in = l keep = id var9       where = (var9      = 1));
    by id;
    if a then endpoint =  1;
    if b then endpoint =  2;
    if c then endpoint =  3;
    if d then endpoint =  4;
    if e then endpoint =  5;
    if f then endpoint =  6;
    if g then endpoint =  7;
    if h then endpoint =  8;
    if i then endpoint =  9;
    if j then endpoint = 10;
    if k then endpoint = 11;
    if l then endpoint = 12;
run;

 

 

The resulting dataset looks like this:

left_1-1670239089869.png

 

Problem to solve

 

There has to be a simpler way to code the dataset have.

 

I want to replace the steps 

 

if <dataset-id> then endpoint = <number-of-dataset>;

 

A bonus would be to simplify the set statements themselves, but this is not critical to regard this issue as solved.

 

Solution formats

I am looking for solutions avoiding SQL. I love to hear about it (for general solution), but want to have a dataset solution (hash or DoW-Loop is also acceptable).

 

What I tried so far

I wanted to translate the character values a - l to numeric values 1 - 12 using the byte() function, so I can query the variables a - l created by 12 in= options of the set statements:

 

data want_fail_1;
    set source (in = a keep = id group_var  where = (group_var = 1))
        source (in = b keep = id group_var  where = (group_var = 2))
        source (in = c keep = id group_var  where = (group_var = 3))
        source (in = d keep = id var1       where = (var1      = 1))
        source (in = e keep = id var2       where = (var2      = 1))
        source (in = f keep = id var3       where = (var3      = 1))
        source (in = g keep = id var4       where = (var4      = 1))
        source (in = h keep = id var5       where = (var5      = 1))
        source (in = i keep = id var6       where = (var6      = 1))
        source (in = j keep = id var7       where = (var7      = 1))
        source (in = k keep = id var8       where = (var8      = 1))
        source (in = l keep = id var9       where = (var9      = 1));
    by id;
    length a_to_l $12;
    a_to_l = collate(rank("a"));
    put a_to_l=;
    do z = 1 to vlength(a_to_l);
        if vvaluex(byte(96 + z)) then endpoint = z;
        *if vname(input(char(a_to_l, z), $1.) then endpoint = z;
    end;
    stop;
run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

So is the goal of this programming to produce the endpoint values, or do you really need var1-var9 with values 1 or missing? For a given ID, such as ID=2, do you really need all 5 rows of output? Do you really need all of the var1-var9 if you have the endpoint values?

 

Seems to me that an array ought to work. This gets most of what you want. You can change the 0s to missing if necessary.

 

data have2;
    set source(obs=10);
    by id;
    array v var1-var9;
    if first.id then do;
        endpoint=group_var; 
        output;
    end;
    do i=1 to dim(v);
        if v(i)=1 then do;
            endpoint=i+3;
            output;
        end;
    end;
    drop i;
run;

 

 

 

 

--
Paige Miller

View solution in original post

3 REPLIES 3
PaigeMiller
Diamond | Level 26

We can't work with data in a screen capture. We can only work with data provided as working SAS data step code, which you can create yourself or follow these instructions.

 

UPDATE: Please ignore this message, I thought the screen capture was the data, but it is not.

--
Paige Miller
LinusH
Tourmaline | Level 20

@PaigeMiller I think the input data is generated in the initial data step so you could copy/paste...?

Data never sleeps
PaigeMiller
Diamond | Level 26

So is the goal of this programming to produce the endpoint values, or do you really need var1-var9 with values 1 or missing? For a given ID, such as ID=2, do you really need all 5 rows of output? Do you really need all of the var1-var9 if you have the endpoint values?

 

Seems to me that an array ought to work. This gets most of what you want. You can change the 0s to missing if necessary.

 

data have2;
    set source(obs=10);
    by id;
    array v var1-var9;
    if first.id then do;
        endpoint=group_var; 
        output;
    end;
    do i=1 to dim(v);
        if v(i)=1 then do;
            endpoint=i+3;
            output;
        end;
    end;
    drop i;
run;

 

 

 

 

--
Paige Miller

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 951 views
  • 1 like
  • 3 in conversation