Solved: array processing (?)

GreggB · Posted 10-24-2017 05:02 PM

my data set has the vars listed below (along with a unique student ID). The values can = . , 1, 2, 3 , 4 or 5. For each observation I only want to keep the scores that are ge 3. Next, I want to create a variable named MET_AP, which will be a string of the names of the exams with a value ge 3 separated by a dash (if more than one occurrence).

Data I Want:

ID MET_AP

1 APbio2015 - APchem2015

2 APcalcBC2015 - APlit2015 - APphys12015

3 APworldhist2015

List of variables with values = . , 1, 2, 3, 4, 5

APbio2015

APcalcAB2015

APcalcABsub2015

APcalcBC2015

APchem2015

APengl2015

APlit2015

APenv_scr2015

APphysB2015

APphys_elect2015

APphys_mech2015

APphys12015

APphys22015

APstats2015

APworldhist2015

APmacro2015

APmicro2015

APeurohist2015

APgovt2015

APgovt_US2015

APgeogr2015

APint_engl2015

APhist2015

Astounding · Posted 10-24-2017 07:22 PM

I have to assume that "get rid of the score" means set it to missing. After all, you can't get rid of a variable on one observation unless you get rid of the variable for all observations So that step might look like this:

data want;

set have;

array names {*} list of all variable names to process here;

do _n_=1 to dim(names);

if names{_n_} = 1 then names{_n_} = .O;

else if names{_n_} = 2 then names {_n_} = .T;

end;

run;

That changes the values to missing, but still preserves the original values as being distinct from one another. All the original "1" values are saved as the special missing value .O so you can get the average of all the 3+ values, but retain the knowledge of what the missing values used to be.

Stringing together names is rarely a good idea. With that being said, here is how you might modify the DATA step above to do it:

data want;

set have;

length _3_plus_names $ 500;

array names {*} list of all variable names to process here;

do _n_=1 to dim(names);

if names{_n_} = 1 then names{_n_} = .O;

else if names{_n_} = 2 then names {_n_} = .T;

else if names{_n_} >= 3 then _3_plus_names = catx(' - ', _3_plus_names, vname(names{_n_})) ;

end;

run;

View solution in original post

Astounding · Posted 10-24-2017 07:22 PM

I have to assume that "get rid of the score" means set it to missing. After all, you can't get rid of a variable on one observation unless you get rid of the variable for all observations So that step might look like this:

data want;

set have;

array names {*} list of all variable names to process here;

do _n_=1 to dim(names);

if names{_n_} = 1 then names{_n_} = .O;

else if names{_n_} = 2 then names {_n_} = .T;

end;

run;

That changes the values to missing, but still preserves the original values as being distinct from one another. All the original "1" values are saved as the special missing value .O so you can get the average of all the 3+ values, but retain the knowledge of what the missing values used to be.

Stringing together names is rarely a good idea. With that being said, here is how you might modify the DATA step above to do it:

data want;

set have;

length _3_plus_names $ 500;

array names {*} list of all variable names to process here;

do _n_=1 to dim(names);

if names{_n_} = 1 then names{_n_} = .O;

else if names{_n_} = 2 then names {_n_} = .T;

else if names{_n_} >= 3 then _3_plus_names = catx(' - ', _3_plus_names, vname(names{_n_})) ;

end;

run;

PGStats · Posted 10-24-2017 11:17 PM

Try this:

data want;
set have;
array scores APbio2015 -- APhist2015;
do i = 1 to dim(scores);
	if scores{i} >= 3 then MET_AP = catx(" - ", MET_AP, vname(scores{i}));
	end;
keep ID MET_AP;
run;

(untested)

PG

s_lassen · Posted 10-25-2017 06:05 AM

You can use (as already suggested by others) the CATX function to string the names - but I would much prefer to use CALL CATX, for two reasons:

You get a warning if the output variable is too short to hold the result
It is a bit more efficient, CPU-wise

So, a solution could be something like this:

data want;
  set have;
  array scores(*) APbio2015--APhist2015;
  length MET_AP $300;
  do _N_=1 to dim(scores);
    if scores(_N_)>=3 then
      call catx(' - ',MET_AP,scores(_N_));
    end;
  keep ID MET_AP;
run;

array processing (?)

Re: array processing (?)

Re: array processing (?)

Re: array processing (?)

Re: array processing (?)

Catch up on SAS Innovate 2026

array processing (?)

Re: array processing (?)

Re: array processing (?)

Re: array processing (?)

Re: array processing (?)

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away