SAS Programming

crisgugiu · Posted 06-12-2021 05:06 PM

Hi everyone,

I need to manipulate the values in a dataset based on the values in another datasets. Specifically, I have a raw dataset that contains probabilities for whether a subject utilized one of five response categories for 10 items. The second dataset summed the probabilities for each category by item. You will note in the dataset below, some subjects did not use all the response variables available as illustrated by the value zero. I want to set the values in the raw dataset to missing (rather than zero) for any item-response category that was not used. I am not sure how to do this.

How can I read the summary dataset above into a two dimensional array and loop through it to identify the combination of items and categories = 0 (in this case, items 1-9 and category _4) and then reading in the raw data (with variable Item _0 _1 _2 _3 _4) and set the values for subjects where item=1-9 and variable _4 to missing?

Cristian

Tom · Posted 06-14-2021 02:42 PM

This is probably doing the same thing as the HASH but might be easier to explain.

proc sql ;
create table want as 
  select 
    item
  , row
  , case when(max(cat0)=0) then . else cat0 end as cat0   
  , case when(max(cat1)=0) then . else cat1 end as cat1
  , case when(max(cat2)=0) then . else cat2 end as cat2   
  , case when(max(cat3)=0) then . else cat3 end as cat3   
  , case when(max(cat4)=0) then . else cat4 end as cat4
  from have
  group by item
  order by item, row
;
quit;

The only tricky thing is that PROC SQL will automatically re-merge the values of the aggregate function (MAX(cat0) etc) back with the original data for you. If you where doing this in some SQL implementation you would need to calculate the max(cat...) per ITEM in a subquery and join it back to the original dataset yourself.

Data used:

Spoiler

data have;
  input ITEM CAT0-CAT4 ;
  row+1;
  if item ne lag(item) then row=1;
datalines4;
1 0 0 0 1 0
1 0 0 0.5 0 0
1 0 0 0.5 0.5 0
1 0 0 0.5 1 0
1 0 0.5 0.5 0 0
1 0.5 0 0 0 0
1 0.5 0.5 0 0 0
1 0.5 0.5 0.5 0 0
1 1 0 0 0 0
2 0 0 0 1 0
2 0 0 0.5 0 0
2 0 0 0.5 0.5 0
2 0 0 0.5 1 0
2 0 0 1 0 0
2 0 0.5 0.5 0 0
2 0.5 0 0 0 0
2 0.5 0.5 0 0 0
2 0.5 0.5 0.5 0 0
2 1 0 0 0 0
3 0 0 0 1 0
3 0 0 0.5 0 0
3 0 0 0.5 0.5 0
3 0 0 0.5 1 0
3 0 0.5 0 0 0
3 0 0.5 0.5 0 0
3 0.5 0 0 0 0
3 0.5 0.5 0 0 0
3 1 0 0 0 0
4 0 0 0 0.5 0
4 0 0 0 1 0
4 0 0 0.5 0.5 0
4 0 0.5 0 0 0
4 0 0.5 0.5 0 0
4 0 0.5 0.5 0.5 0
4 0 1 0 0 0
4 0.5 0.5 0 0 0
4 1 0 0 0 0
4 1 0.5 0 0 0
5 0 0 0 0.5 0
5 0 0 0 1 0
5 0 0 0.5 0 0
5 0 0 0.5 0.5 0
5 0 0.5 0 0 0
5 0 0.5 0.5 0 0
5 0.5 0.5 0 0 0
5 1 0 0 0 0
5 1 0.5 0 0 0
6 0 0 0 0.5 0
6 0 0 0 1 0
6 0 0 0.5 0.5 0
6 0 0.5 0 0 0
6 0 0.5 0.5 0 0
6 0 0.5 0.5 0.5 0
6 0 1 0 0 0
6 0.5 0.5 0 0 0
6 1 0 0 0 0
6 1 0.5 0 0 0
7 0 0 0 1 0
7 0 0 0.5 0 0
7 0 0 0.5 0.5 0
7 0 0 0.5 1 0
7 0 0.5 0 0 0
7 0 0.5 0.5 0 0
7 0.5 0.5 0 0 0
7 1 0 0 0 0
7 1 0.5 0 0 0
8 0 0 0 0.5 0
8 0 0 0 1 0
8 0 0 0.5 0.5 0
8 0 0.5 0 0 0
8 0 0.5 0.5 0 0
8 0 0.5 0.5 0.5 0
8 0 1 0 0 0
8 0.5 0.5 0 0 0
8 0.5 1 0 0 0
8 1 0 0 0 0
8 1 0.5 0 0 0
9 0 0 0 1 0
9 0 0 0.5 0 0
9 0 0 0.5 0.5 0
9 0 0 0.5 1 0
9 0 0.5 0 0 0
9 0 0.5 0.5 0 0
9 0.5 0 0 0 0
9 0.5 0.5 0 0 0
9 1 0 0 0 0
10 0 0 0 0 0.5
10 0 0 0 0 1
10 0 0 0 0.5 0.5
10 0 0 0.5 0 0
10 0 0 0.5 0.5 0
10 0 0 0.5 0.5 0.5
10 0 0.5 0.5 0 0
10 0.5 0 0 0 0
10 0.5 0.5 0 0 0
10 0.5 0.5 0.5 0 0
10 1 0 0 0 0
;;;;

data have; input ITEM CAT0-CAT4 ; row+1; if item ne lag(item) then row=1; datalines4; 1 0 0 0 1 0 1 0 0 0.5 0 0 1 0 0 0.5 0.5 0 1 0 0 0.5 1 0 1 0 0.5 0.5 0 0 1 0.5 0 0 0 0 1 0.5 0.5 0 0 0 1 0.5 0.5 0.5 0 0 1 1 0 0 0 0 2 0 0 0 1 0 2 0 0 0.5 0 0 2 0 0 0.5 0.5 0 2 0 0 0.5 1 0 2 0 0 1 0 0 2 0 0.5 0.5 0 0 2 0.5 0 0 0 0 2 0.5 0.5 0 0 0 2 0.5 0.5 0.5 0 0 2 1 0 0 0 0 3 0 0 0 1 0 3 0 0 0.5 0 0 3 0 0 0.5 0.5 0 3 0 0 0.5 1 0 3 0 0.5 0 0 0 3 0 0.5 0.5 0 0 3 0.5 0 0 0 0 3 0.5 0.5 0 0 0 3 1 0 0 0 0 4 0 0 0 0.5 0 4 0 0 0 1 0 4 0 0 0.5 0.5 0 4 0 0.5 0 0 0 4 0 0.5 0.5 0 0 4 0 0.5 0.5 0.5 0 4 0 1 0 0 0 4 0.5 0.5 0 0 0 4 1 0 0 0 0 4 1 0.5 0 0 0 5 0 0 0 0.5 0 5 0 0 0 1 0 5 0 0 0.5 0 0 5 0 0 0.5 0.5 0 5 0 0.5 0 0 0 5 0 0.5 0.5 0 0 5 0.5 0.5 0 0 0 5 1 0 0 0 0 5 1 0.5 0 0 0 6 0 0 0 0.5 0 6 0 0 0 1 0 6 0 0 0.5 0.5 0 6 0 0.5 0 0 0 6 0 0.5 0.5 0 0 6 0 0.5 0.5 0.5 0 6 0 1 0 0 0 6 0.5 0.5 0 0 0 6 1 0 0 0 0 6 1 0.5 0 0 0 7 0 0 0 1 0 7 0 0 0.5 0 0 7 0 0 0.5 0.5 0 7 0 0 0.5 1 0 7 0 0.5 0 0 0 7 0 0.5 0.5 0 0 7 0.5 0.5 0 0 0 7 1 0 0 0 0 7 1 0.5 0 0 0 8 0 0 0 0.5 0 8 0 0 0 1 0 8 0 0 0.5 0.5 0 8 0 0.5 0 0 0 8 0 0.5 0.5 0 0 8 0 0.5 0.5 0.5 0 8 0 1 0 0 0 8 0.5 0.5 0 0 0 8 0.5 1 0 0 0 8 1 0 0 0 0 8 1 0.5 0 0 0 9 0 0 0 1 0 9 0 0 0.5 0 0 9 0 0 0.5 0.5 0 9 0 0 0.5 1 0 9 0 0.5 0 0 0 9 0 0.5 0.5 0 0 9 0.5 0 0 0 0 9 0.5 0.5 0 0 0 9 1 0 0 0 0 10 0 0 0 0 0.5 10 0 0 0 0 1 10 0 0 0 0.5 0.5 10 0 0 0.5 0 0 10 0 0 0.5 0.5 0 10 0 0 0.5 0.5 0.5 10 0 0.5 0.5 0 0 10 0.5 0 0 0 0 10 0.5 0.5 0 0 0 10 0.5 0.5 0.5 0 0 10 1 0 0 0 0 ;;;;

To see the results let's just print the first row for each ITEM value.

proc print data=want ;
  where row=1;
run;

And you can see that CAT4 is set to missing for all items except the last one.

Obs    ITEM    row    cat0    cat1    cat2    cat3    cat4

  1      1      1       0       0       0      1.0      .
 10      2      1       0       0       0      1.0      .
 20      3      1       0       0       0      1.0      .
 29      4      1       0       0       0      0.5      .
 39      5      1       0       0       0      0.5      .
 48      6      1       0       0       0      0.5      .
 58      7      1       0       0       0      1.0      .
 67      8      1       0       0       0      0.5      .
 78      9      1       0       0       0      1.0      .
 87     10      1       0       0       0      0.0     0.5

View solution in original post

PaigeMiller · Posted 06-12-2021 05:31 PM

How can I read the summary dataset above into a two dimensional array...

Are you talking about doing this in a SAS data step? Or are you using PROC IML? Or something else? As another issue, neither a two-dimensional array nor PROC IML is necessary to do this, there are much simpler methods, if that's acceptable to you.

--
Paige Miller

crisgugiu · Posted 06-13-2021 03:35 PM

Ah, IML. I miss IML. I love IML. This would be so simple with IML. Unfortunately, I only have access to SAS/BASE and SAS/STAT. I am open to easier ways that 2-dimensional arrays.

Cristian

PaigeMiller · Posted 06-13-2021 04:09 PM

@crisgugiu wrote:

Ah, IML. I miss IML. I love IML. This would be so simple with IML. Unfortunately, I only have access to SAS/BASE and SAS/STAT. I am open to easier ways that 2-dimensional arrays.

An ordinary array in a data step will work.

data want;
     set sum;
     array x _0-_4;
     do i=1 to dim(x);
         if x(i)=0 then call missing(x(i));
     end;
     drop i;
run;

In the future, @crisgugiu, DO NOT post data as screen captures. Post data as SAS data step code, using these instructions no exceptions!

--
Paige Miller

crisgugiu · Posted 06-13-2021 04:41 PM

Hi PaigeMiller,

The values I want to set to missing are in the raw data file, not the summary statistics file. See the mini dataset from my earlier post as an example of what my data file looks like. I am using the summary statistics (which I named Sum) to flag if any response categories in the raw dataset were not utilized.

C.

tarheel13 · Posted 06-12-2021 05:38 PM

Please post your data. Also don't understand your question. Do you want all 0 to be .?

crisgugiu · Posted 06-13-2021 04:17 PM

Which of these buttons can you use for posting a SAS or Excel dataset?

I created the following dataset that resembles my data. It only has 3 items (1, 9, 10). Variables _0 to _4 denote the probability that a survey respondent utilized that response category.

You will note that the last column (_4) takes on the value 0 for items 1 and 9. Basically, none of the respondents utilized option _4, except for item 10. I want to set the value 0 to . (missing) for any response categories that was never utilized.

I can determine whether a response category was utilized using the following code. What I need is a programmatic way to assign the value 0 to missing for any response category that is not utilized (ie, the sum of all the probabilities equals zero).

proc means data=mini noprint;
var _:;
by item;
output out=sum(drop=_TYPE_ _FREQ_) sum=;
run;

Mini dataset

ITEM	_0	_1	_2	_3	_4
1	0	0	0	1	0
1	0	0	0.5	0	0
1	0	0	0.5	0.5	0
1	0	0	0.5	1	0
1	0	0.5	0.5	0	0
1	0.5	0	0	0	0
1	0.5	0.5	0	0	0
1	0.5	0.5	0.5	0	0
9	0	0	0	1	0
9	0	0	0.5	0	0
9	0	0	0.5	0.5	0
9	0	0	0.5	1	0
9	0	0.5	0	0	0
9	0	0.5	0.5	0	0
9	0.5	0	0	0	0
9	0.5	0.5	0	0	0
9	1	0	0	0	0
10	0	0	0	0	0.5
10	0	0	0	0	1
10	0	0	0	0.5	0.5
10	0	0	0.5	0	0
10	0	0	0.5	0.5	0
10	0	0	0.5	0.5	0.5
10	0	0.5	0.5	0	0
10	0.5	0	0	0	0
10	0.5	0.5	0	0	0
10	0.5	0.5	0.5	0	0
10	1	0	0	0	0

crisgugiu · Posted 06-13-2021 05:04 PM

Hi everyone,

I am not sure if this is the proper way to post. The code appears to replicate the mini dataset I created. Thank you for posting the macro. So what I need is a programmatic way to identify that the values for variable _4 are all zeros for every entry for items 1-9 and then to set these values to missing.

Cristian

data MINI2;
  infile datalines dsd truncover;
  input ITEM:BEST12. _0:32. _1:32. _2:32. _3:32. _4:32.;
  format ITEM BEST12.;
datalines4;
1,0,0,0,1,0
1,0,0,0.5,0,0
1,0,0,0.5,0.5,0
1,0,0,0.5,1,0
1,0,0.5,0.5,0,0
1,0.5,0,0,0,0
1,0.5,0.5,0,0,0
1,0.5,0.5,0.5,0,0
1,1,0,0,0,0
2,0,0,0,1,0
2,0,0,0.5,0,0
2,0,0,0.5,0.5,0
2,0,0,0.5,1,0
2,0,0,1,0,0
2,0,0.5,0.5,0,0
2,0.5,0,0,0,0
2,0.5,0.5,0,0,0
2,0.5,0.5,0.5,0,0
2,1,0,0,0,0
3,0,0,0,1,0
3,0,0,0.5,0,0
3,0,0,0.5,0.5,0
3,0,0,0.5,1,0
3,0,0.5,0,0,0
3,0,0.5,0.5,0,0
3,0.5,0,0,0,0
3,0.5,0.5,0,0,0
3,1,0,0,0,0
4,0,0,0,0.5,0
4,0,0,0,1,0
4,0,0,0.5,0.5,0
4,0,0.5,0,0,0
4,0,0.5,0.5,0,0
4,0,0.5,0.5,0.5,0
4,0,1,0,0,0
4,0.5,0.5,0,0,0
4,1,0,0,0,0
4,1,0.5,0,0,0
5,0,0,0,0.5,0
5,0,0,0,1,0
5,0,0,0.5,0,0
5,0,0,0.5,0.5,0
5,0,0.5,0,0,0
5,0,0.5,0.5,0,0
5,0.5,0.5,0,0,0
5,1,0,0,0,0
5,1,0.5,0,0,0
6,0,0,0,0.5,0
6,0,0,0,1,0
6,0,0,0.5,0.5,0
6,0,0.5,0,0,0
6,0,0.5,0.5,0,0
6,0,0.5,0.5,0.5,0
6,0,1,0,0,0
6,0.5,0.5,0,0,0
6,1,0,0,0,0
6,1,0.5,0,0,0
7,0,0,0,1,0
7,0,0,0.5,0,0
7,0,0,0.5,0.5,0
7,0,0,0.5,1,0
7,0,0.5,0,0,0
7,0,0.5,0.5,0,0
7,0.5,0.5,0,0,0
7,1,0,0,0,0
7,1,0.5,0,0,0
8,0,0,0,0.5,0
8,0,0,0,1,0
8,0,0,0.5,0.5,0
8,0,0.5,0,0,0
8,0,0.5,0.5,0,0
8,0,0.5,0.5,0.5,0
8,0,1,0,0,0
8,0.5,0.5,0,0,0
8,0.5,1,0,0,0
8,1,0,0,0,0
8,1,0.5,0,0,0
9,0,0,0,1,0
9,0,0,0.5,0,0
9,0,0,0.5,0.5,0
9,0,0,0.5,1,0
9,0,0.5,0,0,0
9,0,0.5,0.5,0,0
9,0.5,0,0,0,0
9,0.5,0.5,0,0,0
9,1,0,0,0,0
10,0,0,0,0,0.5
10,0,0,0,0,1
10,0,0,0,0.5,0.5
10,0,0,0.5,0,0
10,0,0,0.5,0.5,0
10,0,0,0.5,0.5,0.5
10,0,0.5,0.5,0,0
10,0.5,0,0,0,0
10,0.5,0.5,0,0,0
10,0.5,0.5,0.5,0,0
10,1,0,0,0,0
;;;;

PaigeMiller · Posted 06-13-2021 05:32 PM

This is the proper way to present data. Thank you!

data want;
    set mini2;
    array x _0-_4;
    do i=1 to dim(x);
         if x(i)=0 then call missing(x(i));
    end;
    drop i;
run;

--
Paige Miller

crisgugiu · Posted 06-13-2021 08:59 PM

Thank you for you suggestion, Paige. However, this is not what I am looking for. I am not trying to change all zeros to missing. I need a conditional process where the zeros for a single variable and item are changed to missing. Specifically, if all the cells for a particular item number and variable are zero than I want these entries set to missing. For this dataset, this occurred for items 1-9 and variable _4. You will note that all the cells for _4 for items 1, 2, ...9 are zero. I want to set these to missing. This is why I was using the code below to sum all the probabilities by item and _0 to _4. If the sum is greater than zero, it means that at least one survey respondent selected that response category. However, if the sum is zero, then it means that no respondent selected the response category. This is the condition I want to set to missing. If no one picked a response category for an item, I want to assign the value for that response category to missing.

proc means data=mini2 noprint;
	var _:;
	by item;
	output out=sum(drop=_TYPE_ _FREQ_) sum=/*=/autoname*/;
run;

Best,

Cristian

Patrick · Posted 06-13-2021 10:04 PM

Using your sample data below should work.

data want;

  if _n_=1 then
    do;
      dcl hash h1();
      h1.defineKey('item','_var_n');
      h1.defineDone();
    end;

  do until(done);
    set mini2 end=done;
    array _var_arr {*} _0 - _4;
    do _var_n=1 to dim(_var_arr);
      if _var_arr[_var_n] ne 0 then h1.ref();
    end;
  end;

  done=0;
  do until(done);
    set mini2 end=done;
    do _var_n=1 to dim(_var_arr);
      if h1.check() ne 0 then call missing(_var_arr[_var_n]);
    end;
    output;
  end;

  drop _var_n;

run;

crisgugiu · Posted 06-14-2021 10:42 AM

Patrick,

Your solutions worked brilliantly though I admit to not understand how it worked. Is there any documentation you can point me to so I can become more familiar with how your solution worked.

Thanks,

Cristian

Tom · Posted 06-13-2021 11:23 PM

You are not really explaining what you are talking about.

Let's set some language definitions to make things easier.

A SAS DATASET consists of one or more VARIABLES that occur for zero or more OBSERVATIONS.

In your original photograph the variables were named strangely: ITEM _0 _1 _2 _3 _4 . In your post with actual data it is no better. What is meaning of those variables whose names are essentially numbers?

You talk of a 2D Array (by which you probably meant a 2D matrix.) To store such information in a SAS dataset you would use THREE variables. One for the row index (that looks like ITEM). One for the column index ( you haven't given us a name for that variable, but you have shown that the values of integers from 0 to 4). And one to store the element that the 2D Matrix contains. You also did not give use a name for that variable.

So let's take your and makeup the other two names. Say COLUMN and SCORE for example. Now your 2D Matrix represents the SCORES given to different ITEMS for different "COLUMN"s. (Still not sure how to interpret those 0 to 4 integers).

But in your posting of the actual data you seem to have introduced another variable. In the actual data you multiple observations for the same value of ITEM. Let's add another variable REP (for repetition number) that counts the number of times the same ITEM value appears in the source text lines. In your example data the ITEM values appear between 9 and 11 times.

Here is data step to read your text lines into a SAS dataset.

Spoiler

data have;
  input ITEM @;
  if item ne lag(item) then REP=0;
  REP+1;
  do COLUMN=0 to 4;
    input SCORE @;
    output;
  end;
datalines4;
1 0 0 0 1 0
1 0 0 0.5 0 0
1 0 0 0.5 0.5 0
1 0 0 0.5 1 0
1 0 0.5 0.5 0 0
1 0.5 0 0 0 0
1 0.5 0.5 0 0 0
1 0.5 0.5 0.5 0 0
1 1 0 0 0 0
2 0 0 0 1 0
2 0 0 0.5 0 0
2 0 0 0.5 0.5 0
2 0 0 0.5 1 0
2 0 0 1 0 0
2 0 0.5 0.5 0 0
2 0.5 0 0 0 0
2 0.5 0.5 0 0 0
2 0.5 0.5 0.5 0 0
2 1 0 0 0 0
3 0 0 0 1 0
3 0 0 0.5 0 0
3 0 0 0.5 0.5 0
3 0 0 0.5 1 0
3 0 0.5 0 0 0
3 0 0.5 0.5 0 0
3 0.5 0 0 0 0
3 0.5 0.5 0 0 0
3 1 0 0 0 0
4 0 0 0 0.5 0
4 0 0 0 1 0
4 0 0 0.5 0.5 0
4 0 0.5 0 0 0
4 0 0.5 0.5 0 0
4 0 0.5 0.5 0.5 0
4 0 1 0 0 0
4 0.5 0.5 0 0 0
4 1 0 0 0 0
4 1 0.5 0 0 0
5 0 0 0 0.5 0
5 0 0 0 1 0
5 0 0 0.5 0 0
5 0 0 0.5 0.5 0
5 0 0.5 0 0 0
5 0 0.5 0.5 0 0
5 0.5 0.5 0 0 0
5 1 0 0 0 0
5 1 0.5 0 0 0
6 0 0 0 0.5 0
6 0 0 0 1 0
6 0 0 0.5 0.5 0
6 0 0.5 0 0 0
6 0 0.5 0.5 0 0
6 0 0.5 0.5 0.5 0
6 0 1 0 0 0
6 0.5 0.5 0 0 0
6 1 0 0 0 0
6 1 0.5 0 0 0
7 0 0 0 1 0
7 0 0 0.5 0 0
7 0 0 0.5 0.5 0
7 0 0 0.5 1 0
7 0 0.5 0 0 0
7 0 0.5 0.5 0 0
7 0.5 0.5 0 0 0
7 1 0 0 0 0
7 1 0.5 0 0 0
8 0 0 0 0.5 0
8 0 0 0 1 0
8 0 0 0.5 0.5 0
8 0 0.5 0 0 0
8 0 0.5 0.5 0 0
8 0 0.5 0.5 0.5 0
8 0 1 0 0 0
8 0.5 0.5 0 0 0
8 0.5 1 0 0 0
8 1 0 0 0 0
8 1 0.5 0 0 0
9 0 0 0 1 0
9 0 0 0.5 0 0
9 0 0 0.5 0.5 0
9 0 0 0.5 1 0
9 0 0.5 0 0 0
9 0 0.5 0.5 0 0
9 0.5 0 0 0 0
9 0.5 0.5 0 0 0
9 1 0 0 0 0
10 0 0 0 0 0.5
10 0 0 0 0 1
10 0 0 0 0.5 0.5
10 0 0 0.5 0 0
10 0 0 0.5 0.5 0
10 0 0 0.5 0.5 0.5
10 0 0.5 0.5 0 0
10 0.5 0 0 0 0
10 0.5 0.5 0 0 0
10 0.5 0.5 0.5 0 0
10 1 0 0 0 0
;;;;

data have; input ITEM @; if item ne lag(item) then REP=0; REP+1; do COLUMN=0 to 4; input SCORE @; output; end; datalines4; 1 0 0 0 1 0 1 0 0 0.5 0 0 1 0 0 0.5 0.5 0 1 0 0 0.5 1 0 1 0 0.5 0.5 0 0 1 0.5 0 0 0 0 1 0.5 0.5 0 0 0 1 0.5 0.5 0.5 0 0 1 1 0 0 0 0 2 0 0 0 1 0 2 0 0 0.5 0 0 2 0 0 0.5 0.5 0 2 0 0 0.5 1 0 2 0 0 1 0 0 2 0 0.5 0.5 0 0 2 0.5 0 0 0 0 2 0.5 0.5 0 0 0 2 0.5 0.5 0.5 0 0 2 1 0 0 0 0 3 0 0 0 1 0 3 0 0 0.5 0 0 3 0 0 0.5 0.5 0 3 0 0 0.5 1 0 3 0 0.5 0 0 0 3 0 0.5 0.5 0 0 3 0.5 0 0 0 0 3 0.5 0.5 0 0 0 3 1 0 0 0 0 4 0 0 0 0.5 0 4 0 0 0 1 0 4 0 0 0.5 0.5 0 4 0 0.5 0 0 0 4 0 0.5 0.5 0 0 4 0 0.5 0.5 0.5 0 4 0 1 0 0 0 4 0.5 0.5 0 0 0 4 1 0 0 0 0 4 1 0.5 0 0 0 5 0 0 0 0.5 0 5 0 0 0 1 0 5 0 0 0.5 0 0 5 0 0 0.5 0.5 0 5 0 0.5 0 0 0 5 0 0.5 0.5 0 0 5 0.5 0.5 0 0 0 5 1 0 0 0 0 5 1 0.5 0 0 0 6 0 0 0 0.5 0 6 0 0 0 1 0 6 0 0 0.5 0.5 0 6 0 0.5 0 0 0 6 0 0.5 0.5 0 0 6 0 0.5 0.5 0.5 0 6 0 1 0 0 0 6 0.5 0.5 0 0 0 6 1 0 0 0 0 6 1 0.5 0 0 0 7 0 0 0 1 0 7 0 0 0.5 0 0 7 0 0 0.5 0.5 0 7 0 0 0.5 1 0 7 0 0.5 0 0 0 7 0 0.5 0.5 0 0 7 0.5 0.5 0 0 0 7 1 0 0 0 0 7 1 0.5 0 0 0 8 0 0 0 0.5 0 8 0 0 0 1 0 8 0 0 0.5 0.5 0 8 0 0.5 0 0 0 8 0 0.5 0.5 0 0 8 0 0.5 0.5 0.5 0 8 0 1 0 0 0 8 0.5 0.5 0 0 0 8 0.5 1 0 0 0 8 1 0 0 0 0 8 1 0.5 0 0 0 9 0 0 0 1 0 9 0 0 0.5 0 0 9 0 0 0.5 0.5 0 9 0 0 0.5 1 0 9 0 0.5 0 0 0 9 0 0.5 0.5 0 0 9 0.5 0 0 0 0 9 0.5 0.5 0 0 0 9 1 0 0 0 0 10 0 0 0 0 0.5 10 0 0 0 0 1 10 0 0 0 0.5 0.5 10 0 0 0.5 0 0 10 0 0 0.5 0.5 0 10 0 0 0.5 0.5 0.5 10 0 0.5 0.5 0 0 10 0.5 0 0 0 0 10 0.5 0.5 0 0 0 10 0.5 0.5 0.5 0 0 10 1 0 0 0 0 ;;;;

Here is what the first 10 observations look like (the first two rows of your text data). So they represent 2 repetitions of ITEM # 1 across the 5 columns and the SCORE given for each ITEM*REP*COLUMN.

Obs    ITEM    REP    COLUMN    SCORE

  1      1      1        0       0.0
  2      1      1        1       0.0
  3      1      1        2       0.0
  4      1      1        3       1.0
  5      1      1        4       0.0
  6      1      2        0       0.0
  7      1      2        1       0.0
  8      1      2        2       0.5
  9      1      2        3       0.0
 10      1      2        4       0.0

So what are these zeros that you are looking for?

What is a "cell"?

What is the meaning of ITEM, REP, COLUMN and SCORE? Can you give those variables more descriptive names for what they mean for your data? Is ITEM a new product being market tested? Is REP a different rater? Is COLUMN different trait of the product to be rated and SCORE the rating of that item by that rating for that trait?

crisgugiu · Posted 06-14-2021 10:57 AM

Tom,

Apologies for the confusion. Let me add more detail. I am working on a Rasch analysis for which I am examining how subjects responded to 10 items. Each items was scores on a scale 0-4, but the labels associated with each response category were different across the 10 items. Hence, I need to analyze each item separately. I am importing the category probabilities from another software (Winsteps) which labeled the 5-point scales as _0, _1, _2, _3, _4, and _5. The dataset Winsteps generated is in long format. There is a variable I omitted for simplicity which is a measure of the latent attribute. Basically, the probabilities tell us how likely a person of a certain degree on the latent variable would be to endorse response _0, _1,...,_4 for each of the 10 items. A zero means that given a subjects level on the latent chance, there is zero chance that they would endorse (select) that response option for a particular question.

For the purposes of my analysis, I am treating the columns (labeled _0, _1,...,_4) as variables. What I needed was a way to check whether a response option was endorsed (selected) by at least one subject for every single item (identified in the column labeled ITEM). If all the values for _0, _1,..., or _4 for a particular item (1, 2,...,10) are zero, then I wanted to set those values to missing. The reason for this is I am plotting these category probabilities and I don't want to include a flat line (all zeros) if the response category was never selected for an item.

Patrick's solution is what I am looking for. However, I have not seen that type of SAS coding before and am trying to understand how it works so that I can macro it for the program I am writing.

I hope this explanation helped.

Cristian

Tom · Posted 06-14-2021 12:10 PM

In your description of the problem you seem to jump back and forth between talking about the observation level and this aggregate level.

For the purposes of my analysis, I am treating the columns (labeled _0, _1,...,_4) as variables. What I needed was a way to check whether a response option was endorsed (selected) by at least one subject for every single item (identified in the column labeled ITEM). If all the values for _0, _1,..., or _4 for a particular item (1, 2,...,10) are zero, then I wanted to set those values to missing. The reason for this is I am plotting these category probabilities and I don't want to include a flat line (all zeros) if the response category was never selected for an item.

In the second sentence you are asking to make a test across all observations for a variable, not just test a single observation of the variable. Sounds like you want an indicator of whether every observation for a variable is zero (that is was it ever non-zero).

But in the next sentence it is not clear what level you are talking about. Are saying you want to eliminate the original observations where all of the columns are zero? Are you saying you want to eliminate all observations for that level of ITEM? If the later when do you want to eliminate all observations for that ITEM? Is it just when every observations for the ITEM has no-non zero values in any of the columns? If so then just eliminating the observations with zeros for all of the columns will also eliminate all of the observations for that ITEM. Perhaps you are saying you don't want to eliminate the observations with all zero values UNLESS you are eliminating the whole ITEM ?

If the later then a simple combination of the SAS SUM(,) function and the SQL aggregate SUM() function might do the trick.

proc sql;
create table subset as
select * 
from have
group by ITEM
having 0 = sum(sum(_0,_1,_2,_3,_4))
;
quit;

Note if you have both positive and negative values of _0 to _4 then you cannot use SUM(,)/SUM() as you could eliminate some cases where the positives exactly cancel the negatives.

SAS Programming

Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Re: Reading data into two dimensional array

Visualizing High Dimensional Data Using t-SNE

How to output two dimensional array?

Data Dimensionality and Curse of Dimensionality

[SAS 고급] 1차원 배열 (One-Dimensional Arrays)

Multi-dimensional macros

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...