BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DavidKaib
Fluorite | Level 6

I have seen various things that are close to this, but not exactly was I am trying to do. I have a data set where there are anywhere from 1 to 3 records per person. There are some fields that are identical in multiple records (firstname, lastname, gender and id) and others that are not (degree and major). I know which fields are invariant and which ones are different. I am hoping to find code to collapse the records so there is only one per person, where the fields that have multiple values can be put into a string (BA;MA) and the ones that do not remain the same. Is there a way to do this? I'm using SAS base 9.4.

 

So this:

IDFirstNameLastNameGenderDegreeMajor
1234567BillJohnsonMBASociology
2345678JenBakerWBASociology
2345678JenBakerWMAPsychology
4567890StanWrightMMAPsychology
4567890StanWrightMMASociology

 

Would become this:

IDFirstNameLastNameGenderDegreeMajor
1234567BillJohnsonMBASociology
2345678JenBakerWBA;MASociology;Psychology
4567890StanWrightMMA;MAPsychology;Sociology



 

1 ACCEPTED SOLUTION

Accepted Solutions
LeonidBatkhan
Lapis Lazuli | Level 10

Hi DavidKaib,

I believe you can do it using the following code:

proc sort data=A out=A1 (rename=(Degree=D Major=M));
   by ID;
run;

data B (drop=D M);
   set A1;
   by ID;
   length Degree $20 Major $90;
   retain Degree Major;
   if first.ID then do;
      Degree = D;
      Major = M;
   end;
   else do; 
      Degree = catx(';',Degree,D);
      Major  = catx(';',Major, M);
   end;
   if last.ID then output;
run;

Here A is your first data set, B is your destination data set. At least, this gives you an idea.

Hope this helps.

View solution in original post

8 REPLIES 8
ballardw
Super User

Do you have a known maximum number of records that will need to collapse? This is needed so that you can create variables long enough to hold the results.

 

 

And another question, what exactly will you be able to do with the resulting file that can't be done with the existing one?

PaigeMiller
Diamond | Level 26

I would like to ask a similar question to that of @ballardw 

 

What could you possibly do with text fields "BA;MA" or "Sociology;Psychology"? how would that help whatever comes next.

 

Maybe you want a column for each student named BA that is either 1 or 0, and another column for each student named MA which is 1 or 0; and so on. That could be a better form for subsequent analysis or reporting.

--
Paige Miller
DavidKaib
Fluorite | Level 6

The variables won't end up being too long. I am losing a legacy way of generating the data set that did this for me, and that I used within SAS, and this hasn't been a problem. The new source cannot collaspe the records, so I am basically trying to replicate what I have previously had.

 

The resulting file is used as a contact list for a survey, and then later for analysis of the survey. In both cases I need an unduplicated count with all the information.

Reeza
Super User

https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a

This provides two examples. If you're combining more than one field, using the data step approach maybe easier/less steps.

 

 

PaigeMiller
Diamond | Level 26

Length was never the issue, in my mind. Creating a data set that would be easy to use and easy to program in subsequent analysis is the issue, and the data set I described with BA 0 or 1 and MA 0 or 1 and so on seems like an easier data set to work with.

--
Paige Miller
LeonidBatkhan
Lapis Lazuli | Level 10

Hi DavidKaib,

I believe you can do it using the following code:

proc sort data=A out=A1 (rename=(Degree=D Major=M));
   by ID;
run;

data B (drop=D M);
   set A1;
   by ID;
   length Degree $20 Major $90;
   retain Degree Major;
   if first.ID then do;
      Degree = D;
      Major = M;
   end;
   else do; 
      Degree = catx(';',Degree,D);
      Major  = catx(';',Major, M);
   end;
   if last.ID then output;
run;

Here A is your first data set, B is your destination data set. At least, this gives you an idea.

Hope this helps.

DavidKaib
Fluorite | Level 6

Thanks Leonid, this is perfect.

r_behata
Barite | Level 11
data have;
input ID $ FirstName $ LastName $ Gender $ Degree $ Major  $10.;
cards;
1234567 Bill Johnson M BA Sociology 
2345678 Jen Baker W BA Sociology 
2345678 Jen Baker W MA Psychology 
4567890 Stan Wright M MA Psychology 
4567890 Stan Wright M MA Sociology 
;;;;
run;


data want(rename=(Degree_=Degree Major_=Major));
length Major_ $50. Degree_ $25.;
	do until(last.id);
		set have;
		by id;
		Degree_ = catx(';',Degree_,Degree); 
		Major_=catx(';',Major_,major); 
	end;
	drop major Degree;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1159 views
  • 2 likes
  • 6 in conversation