BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mh2t
Obsidian | Level 7

I have 2 datasets which look like below:

 

First:

var1  var2  ...  var1000     Y

           n               f         1111

  c       n               f         111

  d       o              g         222

  b       o              g         4444

  b                       h         7777

           p              g         222

  c       q              g         4444

  d        r              h         77

 

Second:

varName   Levels      new_level

  var1                              2nd 

  var1             b               3rd

  var1             c               1st

  var1             d               3rd

  var2                              3rd

  var2             n               4th

  var2             o               2nd

  var2             p                4th

  var2             q                2nd

  var2             r                 1st

   ...

 var1000       f                  2nd

 var1000       g                 2nd

 var1000       h                 1st

 

I want the first dataset to have new levels from the second dataset with the same name and type of the variables in the first dataset like below:

 

want:

var1  var2  ...  var1000      Y

2nd     4th          2nd        1111

1st      4th          2nd         111

3rd      2nd         2nd         222

3rd      2nd         2nd         4444

3rd      3rd          2nd         7777

2nd     4th          2nd         222

1st      2nd         2nd         4444

3rd      1st          1st          77

 

Your help would be greatly appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
r_behata
Barite | Level 11

One Way :

 

data first;
input var1 $ var2  $ var1000   $   Y;
cards;
a b c 111
c n f 111
d x g 222
;
run;

data second;
input varName $  Levels   $   new_level $;
cards;
var1 a 2nd
var1 c 1st
var1 d 3rd
var2 b 1st
var2 n 3rd
var2 o 2nd
var1000 g 5th
;
run;

data want;
	if _n_=1 then do;

		if 0 then
			set second;
		declare hash h(dataset:'second');
		h.definekey('varName','Levels');
		h.definedata('new_level');
		h.definedone();
	end;
	set first;
	array v[*] var1 var2 var1000;
	

	do i=1 to dim(v);
	 if h.find(key:vname(v[i]),key:v[i]) eq 0 then v[i]=new_level;
	 	else call missing(of v[i]);
	end;
	
	drop i varName Levels new_level;
run;

View solution in original post

3 REPLIES 3
ballardw
Super User

If you mean to update in place then you may have at least one issue that won't allow that: the lengths of the variables involved. If your current value has a length of one, all that would be needed to hold a, b, c etc and try to fit in a value of 1st, 2nd or 3rd then you need to have a length of at least 3.

So before even starting you need to check the lengths of 1000 variables to make sure that the longest value of your "new level" would fit.

 

Your second data set is almost usable to create a custom format with Varname as a format name, levels as the Start and New_level as the label value. Look at Proc format and CNTLIN option. Then you could use the format with the existing "level" to display the text of the new level.

 

But there are some details about "blank" values that may need to be clarified as well such as do any of the variables with "blank" have the blank associated with two, or possibly more, new level values?

r_behata
Barite | Level 11

One Way :

 

data first;
input var1 $ var2  $ var1000   $   Y;
cards;
a b c 111
c n f 111
d x g 222
;
run;

data second;
input varName $  Levels   $   new_level $;
cards;
var1 a 2nd
var1 c 1st
var1 d 3rd
var2 b 1st
var2 n 3rd
var2 o 2nd
var1000 g 5th
;
run;

data want;
	if _n_=1 then do;

		if 0 then
			set second;
		declare hash h(dataset:'second');
		h.definekey('varName','Levels');
		h.definedata('new_level');
		h.definedone();
	end;
	set first;
	array v[*] var1 var2 var1000;
	

	do i=1 to dim(v);
	 if h.find(key:vname(v[i]),key:v[i]) eq 0 then v[i]=new_level;
	 	else call missing(of v[i]);
	end;
	
	drop i varName Levels new_level;
run;
Ksharp
Super User

As Reeza said, proc format also is a proper tool.

 

data first;
input var1 $ var2  $ var1000   $   Y;
cards;
a b c 111
c n f 111
d x g 222
;
run;

data second;
input varName $  Levels   $   new_level $;
cards;
var1 a 2nd
var1 c 1st
var1 d 3rd
var2 b 1st
var2 n 3rd
var2 o 2nd
var1000 g 5th
;
run;

data fmt;
 set second;
 type='C';
 varName=cats(varName,'_');
 rename varName=fmtname Levels=start new_level=label;
run;
proc format cntlin=fmt;
run;

data want;
 set first;
 array x{*} $ var1--var1000;
 array _x{3} $40 _var1 _var2 _var1000;
 do i=1 to dim(x);
  _x{i}=putc(x{i},cats(vname(x{i}),'_'));
 end;
 drop i;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 626 views
  • 2 likes
  • 4 in conversation