BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
UniversitySas
Quartz | Level 8

Hello,

 

I have the following data:

data source_unclean;
INFILE DATALINES ;
 input ID VAR & $19. CLASSIFICATION $ ;
 DATALINES;
201  PepsiCo  returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1
201  111 main street  returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientusaddress__addressline1
201  Charity  returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientusaddress__classification
331  CocaCola  returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1
331  1823 unicorn street  returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientusaddress__addressline1
331  Charity  returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientusaddress__classification  
RUN;

I want to clean this so it appears only as:

data source_clean;
INFILE DATALINES ;
 input ID VAR & $19. CLASSIFICATION $ ;
 DATALINES;
201  PepsiCo  grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1
201  111 main street  grantorcontriapprovedforfuture__recipientusaddress__addressline1
201  Charity  grantorcontriapprovedforfuture__recipientusaddress__classification
331  CocaCola  grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1
331  1823 unicorn street grantorcontriapprovedforfuture__recipientusaddress__addressline1
331  Charity  grantorcontriapprovedforfuture__recipientusaddress__classification  
RUN;

So that the block of text "returndata__irs990pf__supplementaryinformation__" or "returndata__irs990pf__supplementaryinformationgrp" is ommitted.

How would I go about doing this?

any help is appreciated!

 

edit: I'm not sure what I am doing incorrectly for the sample code above, but it does not display the input correctly. there should only be three columns, and I want to go from this:

ex1.JPG

 

to this:

ex2.JPG

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

  do REMOVE="returndata__irs990pf__supplementaryinformation__   "
           ,"returndata__irs990pf__supplementaryinformationgrp__"; 
    if CLASSIFICATION =: trim(REMOVE) then CLASSIFICATION =substr(CLASSIFICATION,length(REMOVE)+1 );
  end;

 

View solution in original post

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

Like this?

  do REMOVE="returndata__irs990pf__supplementaryinformation__   "
           ,"returndata__irs990pf__supplementaryinformationgrp__"; 
    if CLASSIFICATION =: trim(REMOVE) then CLASSIFICATION =substr(CLASSIFICATION,length(REMOVE)+1 );
  end;

 

UniversitySas
Quartz | Level 8

Hi Chris, thanks so much for your reply. I am not currently at my work station but will be sure to update this post once I can try your code out.

 

Im not sure if you remember me, but you have helped me in many of my posts. Do you have any resources or suggestions in terms of how to improve with specifically Data-cleaning in SAS? I can deal with packages like VBA Excel, Stata and MATLAB (the latter two for analysis only) just fine, but SAS seems so incredibly un-intuitive and complicated for the most trivial tasks (e.g., removing substrings..). The help files/guides also read as jargon most of the time.

 

Sorry if this is considered off-topic.

ChrisNZ
Tourmaline | Level 20

@UniversitySas Sorry to hear about your struggle with SAS.

I hope you find my proposed solution easy to understand.

I can't think of any particular resource. Each case is different. If you need to manipulate strings the most commonly used functions are index() and find(), compress(), strip(), scan() and the translation functions trxxxx().

Have a look here. Some can take parameters that make them quite flexible. 

Of course, regular expressions are much more powerful and flexible, so you might want to learn this arcane syntax. It is a steep learning curve but it will help tremendously if you do a lot of work on strings, and it can be used with other software (even in the SAS editor for search and replace).

My only advice therefore is to stick to it, and as you become more familiar, I hope you'll be able to easily exploit the power of the SAS language. We all started at the bottom of the hill... 🙂

UniversitySas
Quartz | Level 8

Worked perfectly, thank you so much

ballardw
Super User

Currenlty the default length of the classification is going to be 8 characters as you do not assign an appropriate informat to read more.

 

My $.02

data source_unclean;
INFILE DATALINES truncover;
 input ID  VAR & $19. @'grantor' CLASSIFICATION $100. ;
 classification = cats('grantor', CLASSIFICATION);
 DATALINES;
201  PepsiCo  returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1
201  111 main street  returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientusaddress__addressline1
201  Charity  returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientusaddress__classification
331  CocaCola  returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1
331  1823 unicorn street  returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientusaddress__addressline1
331  Charity  returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientusaddress__classification  
RUN;

The @'string' input column modifier tells SAS to read from the position following the string. So the value read is missing the word "grantor" and use the CATS function to prefix the read value.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 751 views
  • 2 likes
  • 3 in conversation