BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GreggB
Pyrite | Level 9

my data set has the vars listed below (along with a unique student ID). The values can = . , 1, 2, 3 , 4 or 5. For each observation I only want to keep the scores that are ge 3. Next, I want to create a variable named MET_AP, which will be a string of the names of the exams with a value ge 3 separated by a dash (if more than one occurrence).

 

Data I Want:

ID     MET_AP

1      APbio2015 - APchem2015

2      APcalcBC2015 - APlit2015 -  APphys12015

3      APworldhist2015

 

List of variables with values = . , 1, 2, 3, 4, 5

APbio2015
APcalcAB2015
APcalcABsub2015
APcalcBC2015
APchem2015
APengl2015
APlit2015
APenv_scr2015
APphysB2015
APphys_elect2015
APphys_mech2015
APphys12015
APphys22015
APstats2015
APworldhist2015
APmacro2015
APmicro2015
APeurohist2015
APgovt2015
APgovt_US2015
APgeogr2015
APint_engl2015
APhist2015

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

I have to assume that "get rid of the score" means set it to missing.  After all, you can't get rid of a variable on one observation unless you get rid of the variable for all observations  So that step might look like this:

 

data want;

set have;

array names {*}  list of all variable names to process here;

do _n_=1 to dim(names);

   if names{_n_} = 1 then names{_n_} = .O;

   else if names{_n_} = 2 then names {_n_} = .T;

end;

run;

 

That changes the values to missing, but still preserves the original values as being distinct from one another.  All the original "1" values are saved as the special missing value .O so you can get the average of all the 3+ values, but retain the knowledge of what the missing values used to be.

 

Stringing together names is rarely a good idea.  With that being said, here is how you might modify the DATA step above to do it:

 

data want;

set have;

length _3_plus_names $ 500;

array names {*}  list of all variable names to process here;

do _n_=1 to dim(names);

   if names{_n_} = 1 then names{_n_} = .O;

   else if names{_n_} = 2 then names {_n_} = .T;

   else if names{_n_} >= 3 then _3_plus_names = catx(' - ', _3_plus_names, vname(names{_n_})) ;

end;

run;

 

 

View solution in original post

3 REPLIES 3
Astounding
PROC Star

I have to assume that "get rid of the score" means set it to missing.  After all, you can't get rid of a variable on one observation unless you get rid of the variable for all observations  So that step might look like this:

 

data want;

set have;

array names {*}  list of all variable names to process here;

do _n_=1 to dim(names);

   if names{_n_} = 1 then names{_n_} = .O;

   else if names{_n_} = 2 then names {_n_} = .T;

end;

run;

 

That changes the values to missing, but still preserves the original values as being distinct from one another.  All the original "1" values are saved as the special missing value .O so you can get the average of all the 3+ values, but retain the knowledge of what the missing values used to be.

 

Stringing together names is rarely a good idea.  With that being said, here is how you might modify the DATA step above to do it:

 

data want;

set have;

length _3_plus_names $ 500;

array names {*}  list of all variable names to process here;

do _n_=1 to dim(names);

   if names{_n_} = 1 then names{_n_} = .O;

   else if names{_n_} = 2 then names {_n_} = .T;

   else if names{_n_} >= 3 then _3_plus_names = catx(' - ', _3_plus_names, vname(names{_n_})) ;

end;

run;

 

 

PGStats
Opal | Level 21

Try this:

 

data want;
set have;
array scores APbio2015 -- APhist2015;
do i = 1 to dim(scores);
	if scores{i} >= 3 then MET_AP = catx(" - ", MET_AP, vname(scores{i}));
	end;
keep ID MET_AP;
run;

(untested)

 

PG
s_lassen
Meteorite | Level 14

You can use (as already suggested by others) the CATX function to string the names - but I would much prefer to use CALL CATX, for two reasons:

  1. You get a warning if the output variable is too short to hold the result
  2. It is a bit more efficient, CPU-wise

So, a solution could be something like this:

data want;
  set have;
  array scores(*) APbio2015--APhist2015;
  length MET_AP $300;
  do _N_=1 to dim(scores);
    if scores(_N_)>=3 then
      call catx(' - ',MET_AP,scores(_N_));
    end;
  keep ID MET_AP;
run;  

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 763 views
  • 2 likes
  • 4 in conversation