array processing (?)

Super Contributor
Posts: 287

My data set has the variables listed below (along with a unique student ID). Each value can be ., 1, 2, 3, 4, or 5. For each observation, I only want to keep the scores that are ge 3. Next, I want to create a variable named MET_AP, which will be a string of the names of the exams with a value ge 3, separated by a dash if there is more than one.

 

Data I Want:

ID     MET_AP
1      APbio2015 - APchem2015
2      APcalcBC2015 - APlit2015 - APphys12015
3      APworldhist2015

 

List of variables with values = . , 1, 2, 3, 4, 5

APbio2015
APcalcAB2015
APcalcABsub2015
APcalcBC2015
APchem2015
APengl2015
APlit2015
APenv_scr2015
APphysB2015
APphys_elect2015
APphys_mech2015
APphys12015
APphys22015
APstats2015
APworldhist2015
APmacro2015
APmicro2015
APeurohist2015
APgovt2015
APgovt_US2015
APgeogr2015
APint_engl2015
APhist2015

 


Accepted Solutions
Solution
‎10-25-2017 09:47 AM
Super User
Posts: 6,901

Re: array processing (?)

I have to assume that "get rid of the score" means set it to missing.  After all, you can't get rid of a variable on one observation unless you get rid of the variable for all observations.  So that step might look like this:

 

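data want;
set have;
/* Assumes the score variables are stored contiguously, from APbio2015 through APhist2015; otherwise list them by name */
array names {*} APbio2015 -- APhist2015;
do _n_=1 to dim(names);
   /* Replace low scores with special missing values that record the original score */
   if names{_n_} = 1 then names{_n_} = .O;
   else if names{_n_} = 2 then names{_n_} = .T;
end;
run;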

 

That changes the values to missing, but still preserves the original values as being distinct from one another.  All the original "1" values are saved as the special missing value .O, and all the original "2" values as .T, so you can get the average of all the 3+ values while retaining the knowledge of what the missing values used to be.
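To make the point about special missing values concrete, here is a small sketch. The three variable names come from the question, but the data rows and the variables avg_3plus and had_a_1 are made up for illustration.

```sas
/* Made-up sample data; only three of the question's variables are used */
data have;
  input ID APbio2015 APchem2015 APstats2015;
  datalines;
1 1 4 5
2 2 . 3
;

data want;
  set have;
  array names {*} APbio2015 APchem2015 APstats2015;
  do _n_ = 1 to dim(names);
    if names{_n_} = 1 then names{_n_} = .O;
    else if names{_n_} = 2 then names{_n_} = .T;
  end;
  /* MEAN skips ordinary and special missing values alike */
  avg_3plus = mean(of names{*});
  /* A special missing value can still be tested for directly */
  had_a_1 = (APbio2015 = .O);
run;

proc print data=want;
run;
```

For ID 1 the average should be taken over 4 and 5 only, and had_a_1 should be 1; for ID 2 only the 3 contributes, and had_a_1 should be 0. In printed output a special missing value appears as its letter (O or T) rather than as a period.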

 

Stringing together names is rarely a good idea.  With that being said, here is how you might modify the DATA step above to do it:

 

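data want;
set have;
length _3_plus_names $ 500;
/* Assumes the score variables are stored contiguously, from APbio2015 through APhist2015; otherwise list them by name */
array names {*} APbio2015 -- APhist2015;
do _n_=1 to dim(names);
   if names{_n_} = 1 then names{_n_} = .O;
   else if names{_n_} = 2 then names{_n_} = .T;
   /* Append the variable name, not the score, for every score of 3 or more */
   else if names{_n_} >= 3 then _3_plus_names = catx(' - ', _3_plus_names, vname(names{_n_}));
end;
run;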
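If the concern about stringing names together applies to your case, one alternative is to write one row per student and exam instead of one long string. This is only a sketch under assumptions: the data rows are made up, and the data set name met_long and the variables exam and score are hypothetical names chosen for illustration.

```sas
/* Made-up sample data; only three of the question's variables are used */
data have;
  input ID APbio2015 APchem2015 APstats2015;
  datalines;
1 4 3 2
2 . 5 1
;

/* One output row per student and exam with a score of 3 or more */
data met_long;
  set have;
  array names {*} APbio2015 APchem2015 APstats2015;
  length exam $ 32;   /* SAS variable names are at most 32 characters */
  do _n_ = 1 to dim(names);
    if names{_n_} >= 3 then do;
      exam  = vname(names{_n_});
      score = names{_n_};
      output;
    end;
  end;
  keep ID exam score;
run;
```

With this sample data, met_long should contain three rows: two for ID 1 (APbio2015 and APchem2015) and one for ID 2 (APchem2015). A layout like this is usually easier to count, filter, or join than a delimited string.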

 

 

Esteemed Advisor
Posts: 5,614

Re: array processing (?)

Try this:

 

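data want;
set have;
/* Without a LENGTH statement, MET_AP would default to $200, which many long names could exceed */
length MET_AP $ 500;
array scores APbio2015 -- APhist2015;
do i = 1 to dim(scores);
    if scores{i} >= 3 then MET_AP = catx(" - ", MET_AP, vname(scores{i}));
end;
keep ID MET_AP;
run;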

(untested)

 

PG
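One caveat on the APbio2015 -- APhist2015 list: it is positional, so it picks up every variable stored between those two in the data set, whatever its name or type. If the score variables are not contiguous, a name prefix list such as AP: may be safer. The sketch below assumes every variable whose name starts with AP is a numeric score; the sample data and the School variable are made up for illustration.

```sas
/* Made-up sample data with a character variable sitting between the scores */
data have;
  input ID APbio2015 School $ APchem2015 APhist2015;
  datalines;
1 4 East 3 2
2 . West 5 .
;

data want;
  set have;
  length MET_AP $ 500;
  /* AP: selects all variables whose names begin with AP, wherever they are stored */
  array scores {*} AP: ;
  do i = 1 to dim(scores);
    if scores{i} >= 3 then MET_AP = catx(' - ', MET_AP, vname(scores{i}));
  end;
  keep ID MET_AP;
run;
```

With this sample data, MET_AP should be "APbio2015 - APchem2015" for ID 1 and "APchem2015" for ID 2. The positional list would have included the character variable School here, which an array of numeric variables does not allow.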
PROC Star
Posts: 269

Re: array processing (?)

You can use (as already suggested by others) the CATX function to string the names - but I would much prefer to use CALL CATX, for two reasons:

  1. You get a warning if the output variable is too short to hold the result
  2. It is a bit more efficient, CPU-wise

So, a solution could be something like this:

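data want;
  set have;
  array scores(*) APbio2015--APhist2015;
  length MET_AP $300;
  do _N_=1 to dim(scores);
    /* vname() supplies the variable name; passing scores(_N_) alone would append the score itself */
    if scores(_N_)>=3 then
      call catx(' - ',MET_AP,vname(scores(_N_)));
    end;
  keep ID MET_AP;
run;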