array processing (?)

Super Contributor

My data set has the variables listed below (along with a unique student ID). Their values can be ., 1, 2, 3, 4, or 5. For each observation I only want to keep the scores that are ge 3 (greater than or equal to 3). Next, I want to create a variable named MET_AP: a string of the names of the exams with a value ge 3, separated by a dash when there is more than one.

Data I want:

ID     MET_AP
1      APbio2015 - APchem2015
2      APcalcBC2015 - APlit2015 - APphys12015
3      APworldhist2015

List of variables (each can be ., 1, 2, 3, 4, or 5):

 APbio2015 APcalcAB2015 APcalcABsub2015 APcalcBC2015 APchem2015 APengl2015 APlit2015 APenv_scr2015 APphysB2015 APphys_elect2015 APphys_mech2015 APphys12015 APphys22015 APstats2015 APworldhist2015 APmacro2015 APmicro2015 APeurohist2015 APgovt2015 APgovt_US2015 APgeogr2015 APint_engl2015 APhist2015
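For trying out the replies, here is a minimal hypothetical HAVE data set. The scores are invented (not real records) so that the expected MET_AP values match the table above, and the INPUT statement creates the 23 exam variables in the order listed in the question, which the `--` range lists in the replies rely on.

```sas
/* Hypothetical test data: invented scores, not real student records. */
/* Values are grouped in fives (double spaces) only for readability.  */
data have;
   input ID APbio2015 APcalcAB2015 APcalcABsub2015 APcalcBC2015 APchem2015
            APengl2015 APlit2015 APenv_scr2015 APphysB2015 APphys_elect2015
            APphys_mech2015 APphys12015 APphys22015 APstats2015 APworldhist2015
            APmacro2015 APmicro2015 APeurohist2015 APgovt2015 APgovt_US2015
            APgeogr2015 APint_engl2015 APhist2015;
   datalines;
1  4 . . . 3  2 . . . .  . . . . .  . . . . .  . . .
2  . 1 . 5 .  . 3 . . .  . 4 . . .  . . . . .  . . .
3  . . . . .  . . . . .  . . . . 3  . . . 2 .  . . .
;
run;
```

Run against this HAVE, a step that appends the VNAME of each score of 3 or more with CATX should give the three strings shown in the question.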

Accepted Solutions
10-25-2017 09:47 AM
Super User

Re: array processing (?)

I have to assume that "getting rid of" a score means setting it to missing. After all, you can't drop a variable from one observation without dropping it from all observations. So that step might look like this:

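```
data want;
set have;
/* the exam score variables listed in the question */
array names {*} APbio2015 APcalcAB2015 APcalcABsub2015 APcalcBC2015 APchem2015
   APengl2015 APlit2015 APenv_scr2015 APphysB2015 APphys_elect2015
   APphys_mech2015 APphys12015 APphys22015 APstats2015 APworldhist2015
   APmacro2015 APmicro2015 APeurohist2015 APgovt2015 APgovt_US2015
   APgeogr2015 APint_engl2015 APhist2015;
do _n_=1 to dim(names);
   /* recode 1 and 2 to the special missing values .O and .T */
   if names{_n_} = 1 then names{_n_} = .O;
   else if names{_n_} = 2 then names{_n_} = .T;
end;
run;
```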

That changes the 1s and 2s to missing while keeping them distinct from one another: every original 1 is stored as the special missing value .O and every 2 as .T. You can then compute statistics such as the average over the 3+ values alone, yet still know what each missing value used to be.
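A small sketch of what that buys you, using four hypothetical scores for one student after the recode (an original 4, 1, 5, 2 become 4, .O, 5, .T). The MEAN and N functions skip special missing values just as they skip ordinary ones, while an equality test against .O still picks out the original 1s, because a special missing value compares equal only to itself. The variable names s1-s4 and the other names are made up for illustration.

```sas
data demo;
   /* hypothetical scores after the recode: 4, .O (was 1), 5, .T (was 2) */
   s1 = 4; s2 = .O; s3 = 5; s4 = .T;
   /* MEAN and N ignore all missing values, special ones included */
   avg_3plus = mean(of s1-s4);   /* (4 + 5) / 2 = 4.5 */
   n_scored  = n(of s1-s4);      /* 2 */
   /* .O equals only .O, so this counts the original 1s */
   n_ones = (s1 = .O) + (s2 = .O) + (s3 = .O) + (s4 = .O);   /* 1 */
   put avg_3plus= n_scored= n_ones=;
run;
```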

Stringing names together in one variable is rarely a good idea. That said, here is how you might modify the DATA step above to do it:

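```
data want;
set have;
length _3_plus_names $ 500;
/* the exam score variables listed in the question */
array names {*} APbio2015 APcalcAB2015 APcalcABsub2015 APcalcBC2015 APchem2015
   APengl2015 APlit2015 APenv_scr2015 APphysB2015 APphys_elect2015
   APphys_mech2015 APphys12015 APphys22015 APstats2015 APworldhist2015
   APmacro2015 APmicro2015 APeurohist2015 APgovt2015 APgovt_US2015
   APgeogr2015 APint_engl2015 APhist2015;
do _n_=1 to dim(names);
   if names{_n_} = 1 then names{_n_} = .O;
   else if names{_n_} = 2 then names{_n_} = .T;
   /* append the variable's name (not its value) for scores of 3 or higher */
   else if names{_n_} >= 3 then _3_plus_names = catx(' - ', _3_plus_names, vname(names{_n_}));
end;
run;
```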
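One common alternative to a delimited string (not taken from the replies in this thread) is a long layout with one row per student per exam scored 3 or higher, which is easier to count or filter later. The sketch below is self-contained: it builds a hypothetical three-exam data set, HAVE_SMALL, and writes the long layout to a data set named PASSED; both names and the data are assumptions for illustration.

```sas
/* Hypothetical input with only three exam variables, for illustration */
data have_small;
   input ID APbio2015 APchem2015 APlit2015;
   datalines;
1 4 3 2
2 . 1 5
;
run;

/* one output row per ID and exam with a score of 3 or higher */
data passed(keep=ID exam score);
   set have_small;
   length exam $32;                  /* SAS variable names are at most 32 characters */
   array scores{*} APbio2015--APlit2015;
   do _n_ = 1 to dim(scores);
      if scores{_n_} >= 3 then do;   /* missing values are below every number */
         exam  = vname(scores{_n_});
         score = scores{_n_};
         output;
      end;
   end;
run;
```

With this input, PASSED holds three rows: ID 1 with APbio2015 and APchem2015, and ID 2 with APlit2015. From a data set like this, PROC FREQ on EXAM counts how many students met each exam.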


Re: array processing (?)

Try this:

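```
data want;
set have;
/* APbio2015 -- APhist2015 covers every variable stored between the two, */
/* so it assumes the exam variables are stored in the order listed above */
array scores{*} APbio2015 -- APhist2015;
length MET_AP $400;   /* all 23 names plus separators come to 340 characters */
do i = 1 to dim(scores);
   /* append the variable's name, not its value */
   if scores{i} >= 3 then MET_AP = catx(" - ", MET_AP, vname(scores{i}));
end;
keep ID MET_AP;
run;
```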

(untested)

PG
PROC Star

Re: array processing (?)

You can use the CATX function to string the names together (as others have already suggested), but I much prefer CALL CATX, for two reasons:

1. You get a warning if the output variable is too short to hold the result
2. It is a bit more efficient, CPU-wise

So, a solution could be something like this:

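```
data want;
set have;
array scores(*) APbio2015--APhist2015;
length MET_AP $400;   /* all 23 names plus separators come to 340 characters */
do _N_=1 to dim(scores);
   /* VNAME returns the variable's name; passing scores(_N_) alone would append the score itself */
   if scores(_N_)>=3 then
      call catx(' - ',MET_AP,vname(scores(_N_)));
end;
keep ID MET_AP;
run;
```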