BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Saba1
Quartz | Level 8

Hi 

 

I have a dataset1 where columns are basically managers' names attached to various dates. This is my master/main dataset. The other dataset2 has a gender column of these managers, so one column is names of these managers and parallel column is the gender (converted in numeric i.e. 0 and 1).

 

I want to create a "New_dataset" which should be similar to Dataset1, but the names (character variable) must be replaced with the gender variable from Dataset2. Please see a sample below:

 

Data Dataset1;
infile datalines
dlm=","
missover
DSD;
input ID : $10.
	Date_Month : mmddyy10.
	Manager1 : $60.
	Manager2 : $60.
	Manager3 : $60. ;
format Date_Month mmddyy10. ;
datalines;
AB00046,06-30-2016,Ronald Baron,Bill F. Baron,
AB00046,07-31-2016,Ronald Baron,Bill F. Baron,
AB00046,08-31-2016,Ronald Baron,Bill F. Baron,
AB00046,09-30-2016,Ronald Baron,Bill F. Baron,
AB00046,10-31-2016,Ronald Baron,Bill F. Baron,Tim S.
AB00046,11-30-2016,Ronald Baron,Bill F. Baron,Tim S.
AB00046,12-31-2016,Ronald Baron,Bill F. Baron,Tim S.
AB00046,01-31-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,02-28-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,03-31-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,04-30-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,05-31-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,06-30-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,07-31-2017,Ronald Baron,Bill F. Baron,
AB00046,08-31-2017,Ronald Baron,Bill F. Baron,
AB00050,04-30-2016,
AB00050,05-31-2016,
AB00050,06-30-2016,Sharon
AB00050,07-31-2016,Sharon,Tim S.
AB00050,08-31-2016,Sharon,Tim S.
AB00050,09-30-2016,Sharon,Tim S. 
; run;

 

Data Dataset2;
infile datalines
dlm=","
missover
DSD;
input ID : $10.
	Name_Manager : $60.
	Gender ;
datalines;
AB00046,Ronald Baron,0
AB00046,Bill F. Baron,0
AB00046,Tim S.,0
AB00050,Sharon,1
AB00050,Tim S.,0
;
run;
Data Dataset_Want;
infile datalines
dlm=","
missover
DSD;
input ID : $10.
	Date_Month : mmddyy10.
	Manager1
	Manager2
	Manager3 ;
format Date_Month mmddyy10. ;
datalines;
AB00046,06-30-2016,0,0,
AB00046,07-31-2016,0,0,
AB00046,08-31-2016,0,0,
AB00046,09-30-2016,0,0,
AB00046,10-31-2016,0,0,0
AB00046,11-30-2016,0,0,0
AB00046,12-31-2016,0,0,0
AB00046,01-31-2017,0,0,0
AB00046,02-28-2017,0,0,0
AB00046,03-31-2017,0,0,0
AB00046,04-30-2017,0,0,0
AB00046,05-31-2017,0,0,0
AB00046,06-30-2017,0,0,0
AB00046,07-31-2017,0,0,
AB00046,08-31-2017,0,0,
AB00050,04-30-2016,
AB00050,05-31-2016,
AB00050,06-30-2016,1
AB00050,07-31-2016,1,0
AB00050,08-31-2016,1,0
AB00050,09-30-2016,1,0
;
run;

 Please Guide me in this regard. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Saba1

Here a hash approach which also looks for a matching ID column and not only names.

data want(drop=_:);
  set Dataset1;
  array Managers {*} Manager:;

  if _n_=1 then 
    do;
      if 0 then set dataset2(keep=Name_Manager);
      drop Name_Manager Gender;
      dcl hash h1(dataset:'dataset2');
      h1.defineKey('id','Name_Manager');
      h1.defineData('Gender');
      h1.defineDone();
    end;

  do _i=1 to dim(managers);
    if h1.find(key:id, key:managers[_i]) = 0 then managers[_i]=put(Gender,f1.);
    else call missing(managers[_i]);
  end;
run;

View solution in original post

11 REPLIES 11
Saba1
Quartz | Level 8
@Ksharp: your help will be really appreciated.
Kurt_Bremser
Super User

Create a format from dataset2, and apply it:

data cntlin;
set dataset2 end=eof;
fmtname = 'mygender';
type = 'C';
start = name_manager;
label = put(gender,best.);
output;
if eof
then do;
  name_manager = 'other';
  hlo = 'O';
  label = '';
  output;
end;
drop id name_manager gender;
run;

proc format cntlin=cntlin;
run;

data want;
set dataset1;
array manager {*} manager:;
do i = 1 to dim(manager);
  manager {i} = put(manager{i},$mygender.);
end;
drop i;
run;
Saba1
Quartz | Level 8
@Kurt_Bremser : Thanks for your reply. But when I run "proc format cntlin=cntlin;" statement, the Error says: "ERROR: For format $MYGENDER, this range is repeated, or values overlap". Therefore, for "Want" dataset when I run data step, the Error is as follows: " The format $MYGENDER was not found or could not be loaded".
Actually, several managers' names are repeated in dataset2. So there can be cases with same manager name but different ID or in few cases even with the same ID the name is repeated twice. Is there a way to solve this issue?
andreas_lds
Jade | Level 19

@Saba1 wrote:
@Kurt_Bremser : Thanks for your reply. But when I run "proc format cntlin=cntlin;" statement, the Error says: "ERROR: For format $MYGENDER, this range is repeated, or values overlap". Therefore, for "Want" dataset when I run data step, the Error is as follows: " The format $MYGENDER was not found or could not be loaded".
Actually, several managers' names are repeated in dataset2. So there can be cases with same manager name but different ID or in few cases even with the same ID the name is repeated twice. Is there a way to solve this issue?

 

Please post test-data that matches your real data. With the data posted creating the format does not throw any error message.

Saba1
Quartz | Level 8

@andreas_ldsPlease see the updated data sample. Thanks

Kurt_Bremser
Super User

@Saba1 wrote:
@Kurt_Bremser : Thanks for your reply. But when I run "proc format cntlin=cntlin;" statement, the Error says: "ERROR: For format $MYGENDER, this range is repeated, or values overlap". Therefore, for "Want" dataset when I run data step, the Error is as follows: " The format $MYGENDER was not found or could not be loaded".
Actually, several managers' names are repeated in dataset2. So there can be cases with same manager name but different ID or in few cases even with the same ID the name is repeated twice. Is there a way to solve this issue?

With the data presented, the code works. Please post complete example data that illustrates your issues sufficiently.

Saba1
Quartz | Level 8
@Kurt_Bremser: I have updated the sample data. You can see now. Thanks
Kurt_Bremser
Super User

Ok, so we need to expand the format solution.

One way would be to combine the two values that determine the outcome into a single variable and create the format for this.

Or we create multiple formats for the ID's, which I have done here, and use the putc() instead of the put() function (putc() and putn() accept expressions as formats, while put() only accepts a literal format name):

Data Dataset1;
infile datalines
dlm=","
missover
DSD;
input ID : $10.
	Date_Month : mmddyy10.
	Manager1 : $60.
	Manager2 : $60.
	Manager3 : $60. ;
format Date_Month mmddyy10. ;
datalines;
AB00046,06-30-2016,Ronald Baron,Bill F. Baron,
AB00046,07-31-2016,Ronald Baron,Bill F. Baron,
AB00046,08-31-2016,Ronald Baron,Bill F. Baron,
AB00046,09-30-2016,Ronald Baron,Bill F. Baron,
AB00046,10-31-2016,Ronald Baron,Bill F. Baron,Tim S.
AB00046,11-30-2016,Ronald Baron,Bill F. Baron,Tim S.
AB00046,12-31-2016,Ronald Baron,Bill F. Baron,Tim S.
AB00046,01-31-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,02-28-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,03-31-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,04-30-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,05-31-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,06-30-2017,Ronald Baron,Bill F. Baron,Tim S.
AB00046,07-31-2017,Ronald Baron,Bill F. Baron,
AB00046,08-31-2017,Ronald Baron,Bill F. Baron,
AB00050,04-30-2016,
AB00050,05-31-2016,
AB00050,06-30-2016,Sharon
AB00050,07-31-2016,Sharon,Tim S.
AB00050,08-31-2016,Sharon,Tim S.
AB00050,09-30-2016,Sharon,Tim S. 
;
run;

Data Dataset2;
infile datalines
dlm=","
missover
DSD;
input ID : $10.
	Name_Manager : $60.
	Gender ;
datalines;
AB00046,Ronald Baron,0
AB00046,Bill F. Baron,0
AB00046,Tim S.,0
AB00050,Sharon,1
AB00050,Tim S.,0
;
run;


proc sort data=dataset2;
by id;
run;

data cntlin;
set dataset2;
by id;
fmtname = strip(id) !! '_';
type = 'C';
start = name_manager;
label = put(gender,best.);
output;
if last.id
then do;
  name_manager = 'other';
  hlo = 'O';
  label = '';
  output;
end;
drop id name_manager gender;
run;

proc format cntlin=cntlin;
run;

data want;
set dataset1;
array manager {*} manager:;
do i = 1 to dim(manager);
  manager {i} = putc(manager{i},strip(id) !! '_');
end;
drop i;
run;

The appended underline is necessary because your ID values end with numbers. Similarly, you would neet to prepend an underline or a character if ID started with a digit.

PeterClemmensen
Tourmaline | Level 20

There are many way to solve a problem like this. Here is a PROC FORMAT Approach

 

data fmt;
   set Dataset2(rename=(Name_Manager=start Gender=label)) end=eof;
   retain fmtname "MngFmt" type "c";
   output;
   if eof then do;
      start=""; Label="";
      HLO="O"; output;
   end;
run;

proc format library=work cntlin=fmt;
run;

data want;
   set Dataset1;
   Manager1=input(put(Manager1, $MngFmt.), 8.);
   Manager2=input(put(Manager2, $MngFmt.), 8.);
   Manager3=input(put(Manager3, $MngFmt.), 8.);
run;
Saba1
Quartz | Level 8
@PeterClemmensen: Thanks but in my dataset2 managers' names are repeating so when I run "proc format" statement, I receive the following Error: "ERROR: For format $MNGFMT, this range is repeated, or values overlap:". Also is there a way to replace all the statements like "Manager1=input(put(Manager1, $MngFmt.), 8.);" with something else, because I have around 100 columns in Dataset1 i.e. Manager1 - Manager100.
Patrick
Opal | Level 21

@Saba1

Here a hash approach which also looks for a matching ID column and not only names.

data want(drop=_:);
  set Dataset1;
  array Managers {*} Manager:;

  if _n_=1 then 
    do;
      if 0 then set dataset2(keep=Name_Manager);
      drop Name_Manager Gender;
      dcl hash h1(dataset:'dataset2');
      h1.defineKey('id','Name_Manager');
      h1.defineData('Gender');
      h1.defineDone();
    end;

  do _i=1 to dim(managers);
    if h1.find(key:id, key:managers[_i]) = 0 then managers[_i]=put(Gender,f1.);
    else call missing(managers[_i]);
  end;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3149 views
  • 4 likes
  • 5 in conversation