Hello,
I have the following data:
data source_unclean;
INFILE DATALINES ;
input ID VAR & $19. CLASSIFICATION $ ;
DATALINES;
201 PepsiCo returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1
201 111 main street returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientusaddress__addressline1
201 Charity returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientusaddress__classification
331 CocaCola returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1
331 1823 unicorn street returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientusaddress__addressline1
331 Charity returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientusaddress__classification
RUN;
I want to clean this so it appears only as:
data source_clean;
INFILE DATALINES ;
input ID VAR & $19. CLASSIFICATION $ ;
DATALINES;
201 PepsiCo grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1
201 111 main street grantorcontriapprovedforfuture__recipientusaddress__addressline1
201 Charity grantorcontriapprovedforfuture__recipientusaddress__classification
331 CocaCola grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1
331 1823 unicorn street grantorcontriapprovedforfuture__recipientusaddress__addressline1
331 Charity grantorcontriapprovedforfuture__recipientusaddress__classification
RUN;
So that the block of text "returndata__irs990pf__supplementaryinformation__" or "returndata__irs990pf__supplementaryinformationgrp" is ommitted.
How would I go about doing this?
any help is appreciated!
edit: I'm not sure what I am doing incorrectly for the sample code above, but it does not display the input correctly. there should only be three columns, and I want to go from this:
to this:
Like this?
do REMOVE="returndata__irs990pf__supplementaryinformation__ "
,"returndata__irs990pf__supplementaryinformationgrp__";
if CLASSIFICATION =: trim(REMOVE) then CLASSIFICATION =substr(CLASSIFICATION,length(REMOVE)+1 );
end;
Like this?
do REMOVE="returndata__irs990pf__supplementaryinformation__ "
,"returndata__irs990pf__supplementaryinformationgrp__";
if CLASSIFICATION =: trim(REMOVE) then CLASSIFICATION =substr(CLASSIFICATION,length(REMOVE)+1 );
end;
Hi Chris, thanks so much for your reply. I am not currently at my work station but will be sure to update this post once I can try your code out.
Im not sure if you remember me, but you have helped me in many of my posts. Do you have any resources or suggestions in terms of how to improve with specifically Data-cleaning in SAS? I can deal with packages like VBA Excel, Stata and MATLAB (the latter two for analysis only) just fine, but SAS seems so incredibly un-intuitive and complicated for the most trivial tasks (e.g., removing substrings..). The help files/guides also read as jargon most of the time.
Sorry if this is considered off-topic.
@UniversitySas Sorry to hear about your struggle with SAS.
I hope you find my proposed solution easy to understand.
I can't think of any particular resource. Each case is different. If you need to manipulate strings the most commonly used functions are index() and find(), compress(), strip(), scan() and the translation functions trxxxx().
Have a look here. Some can take parameters that make them quite flexible.
Of course, regular expressions are much more powerful and flexible, so you might want to learn this arcane syntax. It is a steep learning curve but it will help tremendously if you do a lot of work on strings, and it can be used with other software (even in the SAS editor for search and replace).
My only advice therefore is to stick to it, and as you become more familiar, I hope you'll be able to easily exploit the power of the SAS language. We all started at the bottom of the hill... 🙂
Worked perfectly, thank you so much
Currenlty the default length of the classification is going to be 8 characters as you do not assign an appropriate informat to read more.
My $.02
data source_unclean; INFILE DATALINES truncover; input ID VAR & $19. @'grantor' CLASSIFICATION $100. ; classification = cats('grantor', CLASSIFICATION); DATALINES; 201 PepsiCo returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1 201 111 main street returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientusaddress__addressline1 201 Charity returndata__irs990pf__supplementaryinformation__grantorcontriapprovedforfuture__recipientusaddress__classification 331 CocaCola returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientbusinessname__businessnameline1 331 1823 unicorn street returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientusaddress__addressline1 331 Charity returndata__irs990pf__supplementaryinformationgrp__grantorcontriapprovedforfuture__recipientusaddress__classification RUN;
The @'string' input column modifier tells SAS to read from the position following the string. So the value read is missing the word "grantor" and use the CATS function to prefix the read value.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: