BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Alexxxxxxx
Pyrite | Level 9

Hi,

 

I am trying to split the value in the same cell to different cells but in the same column.

 

For table 1

IDRating_History
1Rating: CCC+Rating Range: Apr-08-2020 to PresentRating: B-Rating Range: Feb-20-2020 to Apr-08-2020
2Rating: B+Rating Range: Dec-13-2019 to PresentRating: NRRating Range: Apr-13-2018 to Dec-13-2019Rating: BB-Rating Range: Aug-02-2017 to Apr-13-2018

 

I expect to get the following table

IDRatingRating_Range
1CCC+Apr-08-2020 to Present
1B-Feb-20-2020 to Apr-08-2020
2B+Dec-13-2019 to Present
2NRApr-13-2018 to Dec-13-2019
2BB-Aug-02-2017 to Apr-13-2018

 

All value in the 'Rating_History' variable is made up of at least one strings like 'Rating: XXXRating Range: XXX-00-0000 to XXX-00-0000(or Present)'. Could you please give me some suggestions about this?

 

Thanks in advance.

run;data table1;
	infile cards truncover;
	input
	ID $1.
	Rating_History $2000.
	;
	cards;
1 Rating: CCC+Rating Range: Apr-08-2020 to PresentRating: B-Rating Range: Feb-20-2020 to Apr-08-2020
2 Rating: B+Rating Range: Dec-13-2019 to PresentRating: NRRating Range: Apr-13-2018 to Dec-13-2019Rating: BB-Rating Range: Aug-02-2017 to Apr-13-2018
	;;;;



1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

The following code could give you a start.

 

data table1;
	infile cards truncover;
	input
	ID $1.
	Rating_History $2000.
	;
	cards;
1 Rating: CCC+Rating Range: Apr-08-2020 to PresentRating: B-Rating Range: Feb-20-2020 to Apr-08-2020
2 Rating: B+Rating Range: Dec-13-2019 to PresentRating: NRRating Range: Apr-13-2018 to Dec-13-2019Rating: BB-Rating Range: Aug-02-2017 to Apr-13-2018
	;;;;
data want;
 set table1;
 n=countc(rating_history,':');
 do i=2 to n by 2;
  rating=tranwrd(scan(rating_history,i,':'),'Rating Range','');
  rating_range=tranwrd(scan(rating_history,i+1,':'),'Rating','');
  output;
 end;
 keep id rating rating_range;
 run;

View solution in original post

6 REPLIES 6
jimbarbour
Meteorite | Level 14

It can be done, but it's a bit of a trick.  You'd have to parse the text looking for "Rating Range:" and then take the value from the end of "Rating Range:" to the next "Rating Range:" or the end of a line.

 

I'll put some code that parses text below.  This sample code is parsing JSON data, but you could do something like it to solve this task.

 

Jim

 

%LET	Cmnt							=	;
%LET	Width							=	75;

**------------------------------------------------------------------------------**;

FILENAME	In_Data	'C:\Users\jbarbou3\Documents\SAS\Pgm\Training\JSON_Parse\JSON_Data.txt';

**------------------------------------------------------------------------------**;

DATA	WORK.Parsed_Data;
	DROP	_:;

	LENGTH	_JSON_Data	$32767;
	LENGTH	_Segment	$1026;
	LENGTH	_Var_Name	$32;
	LENGTH	Row_Nbr		4;
	LENGTH	Var_Nbr		4;
	LENGTH	Value		$1026;

	RETAIN	_Max_Nbr	0;

	IF	_N_								=	1	THEN
		DO;
			CALL	MISSING(Var_Nbr, _Var_Name);
			DECLARE	HASH	Vars(ORDERED: 'A');
			Vars.DEFINEKEY('_Var_Name');
			Vars.DEFINEDATA('Var_Nbr');
			Vars.DEFINEDONE();
		END;

	IF	_End_of_Data							THEN
		DO;
			CALL	SYMPUTX('Var_Count', PUT(_Max_Nbr, F3.), 'G');
		END;
	
	INFILE	In_Data		LENGTH			=	_JSON_Length
						RECFM			=	V
						END				=	_End_of_Data
						;

	INPUT	@1	_JSON_Data	$VARYING32767.	_JSON_Length;

	**	The first character is a {, so start in position 2.	**;
	_Position							=	2;

	SUBSTR(_JSON_Data, _JSON_Length, 1)	=	',';

	DO	WHILE	(_Position				<	_JSON_Length);
		_Segment_Length					=	INDEXC(SUBSTR(_JSON_Data, _Position), ',');
		_Segment						=	SUBSTR(_JSON_Data, _Position, (_Segment_Length - 1));
		_Var_Name						=	COMPRESS(SUBSTR(_Segment, 1, (INDEXC(_Segment, ':') - 1)), '"');
		Value							=	COMPRESS(SUBSTR(_Segment, (INDEXC(_Segment, ':') + 1)), '"');
		_Position						=	_Position	+	_Segment_Length;
		_RC								=	Vars.FIND();
		IF	_RC							=	0	THEN
			DO;
				CALL	SYMPUTX(CATS('Var_Name', PUT(Var_Nbr, F3.)), _Var_Name, 'G');
				IF	Var_Nbr				>	_Max_Nbr	THEN
					_Max_Nbr			=	Var_Nbr;
			END;
		ELSE
			DO;
				_Max_Nbr				+	1;
				CALL	SYMPUTX(CATS('Var_Name', PUT(_Max_Nbr, F3.)), _Var_Name, 'G');
				Var_Nbr					=	_Max_Nbr;
				_RC						=	Vars.ADD();
				IF	_RC					>	0	THEN
					DO;
						PUTLOG	"&Err2  ";
						PUTLOG	"&Err2  %Format_Dashes(&Width)";
						PUTLOG	"&Err1  |  Unable to add variable to Hash table.  "	_RC=;
						PUTLOG	"&Err2  %Format_Dashes(&Width)";
					END;
			END;
		Row_Nbr							=	_N_;
		OUTPUT;
	END;
RUN;

**------------------------------------------------------------------------------**;

&Cmnt.PROC	PRINT	DATA=WORK.PARSED_DATA;
&Cmnt.RUN;

**------------------------------------------------------------------------------**;

%MACRO	List_Vars(Comma=NO);
	%IF	%QUPCASE(&Comma)				=	YES	%THEN
		%DO;
			%LET	Comma				=	,;
		%END;
	%ELSE
		%DO;
			%LET	Comma				=	;
		%END;

	%LOCAL	i;
	%DO	i								=	1	%TO	&Var_Count;
		%IF	&i							=	&Var_Count	%THEN
			%LET	Comma				=	;
		&&Var_Name&i &Comma
	%END;
%MEND	List_Vars;

&Cmnt%PUT	&Nte1  &=Var_Count %List_Vars(Comma=YES);

**------------------------------------------------------------------------------**;

%MACRO	Equate_Vars;
	%LOCAL	i;
	%DO	i								=	1	%TO	&Var_Count;
		IF	Var_Nbr						=	&i	THEN
			&&Var_Name&i				=	Value;
	%END;
%MEND	Equate_Vars;

&Cmnt%PUT	&Nte1  %QUOTE(%Equate_Vars);

**------------------------------------------------------------------------------**;

DATA	WORK.Final_Data;
	DROP	Row_Nbr;
	DROP	Var_Nbr;
	DROP	Value;
	DROP	_:;

	RETAIN	%List_Vars;

	SET	WORK.PARSED_DATA;
		BY	Row_Nbr;

	%Equate_Vars;

	IF	LAST.Row_Nbr						THEN
		DO;
			OUTPUT;
			CALL	MISSING(%List_Vars(Comma=YES));
		END;
RUN;
novinosrin
Tourmaline | Level 20

Hi @Alexxxxxxx  Assuming I comprehend your pattern well and your sample is truly representative of your real, please try the below-

 


data table1;
	infile cards truncover;
	input
	ID $1.
	Rating_History $2000.
	;
	cards;
1 Rating: CCC+Rating Range: Apr-08-2020 to PresentRating: B-Rating Range: Feb-20-2020 to Apr-08-2020
2 Rating: B+Rating Range: Dec-13-2019 to PresentRating: NRRating Range: Apr-13-2018 to Dec-13-2019Rating: BB-Rating Range: Aug-02-2017 to Apr-13-2018
	;;;;


data want;
 set table1;
 do _pos=find(Rating_History,'Rating:') by 0 while(_pos);
   _r=find(Rating_History,'Rating Range:',_pos);
   _s=_pos+7;
   _l=_r-_pos-7;
  Rating=substr(Rating_History,_s,_l);
  _pos =find(Rating_History,'Rating:', _pos + 1) ;  
  _s=_r+13;
  _l=_pos-_r-13;
  if _l<=0 then _l=999;
  Rating_Range=substr(Rating_History,_s,_l);
  output;
 end;   
 keep ID Rating Rating_Range;
run;

proc print noobs;run;

The SAS System

ID Rating Rating_Range
1 CCC+ Apr-08-2020 to Present
1 B- Feb-20-2020 to Apr-08-2020
2 B+ Dec-13-2019 to Present
2 NR Apr-13-2018 to Dec-13-2019
2 BB- Aug-02-2017 to Apr-13-2018
Alexxxxxxx
Pyrite | Level 9

Hi @novinosrin ,

 

Many thanks for your code, it works well for the most part of my data, except the rows in attachment (it is the original data of the wrong rows),

The result of a wrong row looks like below, could you please give me more suggestions about this?

NOTE: Invalid third argument to function SUBSTR at line 1467 column 16.
SP_Entity_ID=309861 SP_ISIN=ARPERE010103 SP_CUSIP=- SP_IQID=4597539
SP_COMPANY_NAME=Petrobras Argentina S.A.
SP_ECR_History1=Rating: NR
Rating Range: Mar-20-2017 to Present

Rating: B-
Rating Range: Feb-03-2
016 to Mar-20-2017

Rating: CCC-
Rating Range: Jun-18-2014 to Feb-03-2016

Rating: CCC+
Rating R
ange: Dec-20-2013 to Jun-18-2014

Rating: B-
Rating Range: Sep-13-2013 to Dec-20-2013

Rating: B


Rating Range: Apr-05-2013 to Sep-13-2013

Rating: B+
Rating Range: Nov-01-2012 to Apr-05-2013

R
ating: BB-
Rating Range: Dec-17-2010 to Nov-01-2012

Rating: B+
Rating Range: Feb-12-2009 to Dec-1
7-2010

Rating: BB
Rating Range: Apr-23-2007 to Feb-12-2009

Rating: B+
Rating Range: Oct-09-200
6 to Apr-23-2007

Rating: B
Rating Range: Apr-01-2005 to Oct-09-2006

Rating: B-
Rating Range: J
ul-11-2003 to Apr-01-2005

Rating: CCC
Rating Range: Aug-08-2002 to Jul-11-2003

Rating: SD
Rati
ng Range: Aug-02-2002 to Aug-08-2002

Rating: CC
Rating Range: Jun-12-2002 to Aug-02-2002

Rating
: CCC
Rating Range: Apr-18-2002 to Jun-12-2002

Rating: CCC+
Rating Range: Mar-26-2002 to Apr-18-2
002

Rating: CCC-
Rating Range: Mar-25-2002 to Mar-26-2002

Rating: SD
Rating Range: Mar-25-2002
 to Mar-25-2002

Rating: CCC+
Rating Range: Dec-05-2001 to Mar-25-2002

Rating: B
Rating Range:
Nov-01-2001 to Dec-05-2001

Rating: B+
Rating Range: Oct-10-2001 to Nov-01-2001

Rating: BB-
Rat
ing Range: Jul-13-2001 to Oct-10-2001

Rating: BB
Rating Range: Mar-26-2001 to Jul-13-2001

Ratin
g: BB+
Rating Range: Jan-11-2001 to Mar-26-2001

Rating: BBB-
Rating Range: Apr-22-1997 to Jan-11-
2001

Rating: BB
Rating Range: Apr-02-1997 to Apr-22-1997

Rating: BB-
Rating Range: Jan-15-1997
 to Apr-02-1997
SP_ECR_History2=CreditWatch: Watch Neg
CreditWatch Range: Sep-13-2013 to Dec-20-2013

CreditWatch:
Watch Neg
CreditWatch Range: Nov-01-2012 to Apr-05-2013

CreditWatch: Watch Pos
CreditWatch Range:
 Sep-14-2010 to Dec-17-2010

CreditWatch: Watch Neg
CreditWatch Range: Nov-05-2008 to Feb-12-2009


CreditWatch: Watch Neg
CreditWatch Range: Oct-03-2005 to Apr-06-2006

CreditWatch: Watch Pos
Cre
ditWatch Range: Dec-28-2004 to Apr-01-2005

CreditWatch: Watch Neg
CreditWatch Range: Mar-26-2001 t
o Jun-07-2001

CreditWatch: Watch Neg
CreditWatch Range: Mar-19-2001 to Mar-26-2001

CreditWatch:
 Watch Neg
CreditWatch Range: Nov-01-2000 to Jan-11-2001
SP_ECR_History3=Outlook: Stable
Outlook Range: Feb-03-2016 to Mar-20-2017

Outlook: Negative
Outlo
ok Range: Jun-18-2014 to Feb-03-2016

Outlook: Negative
Outlook Range: Dec-20-2013 to Jun-18-2014


Outlook: Negative
Outlook Range: Apr-05-2013 to Sep-13-2013

Outlook: Negative
Outlook Range: Ap
r-24-2012 to Nov-01-2012

Outlook: Stable
Outlook Range: Dec-17-2010 to Apr-24-2012

Outlook: Sta
ble
Outlook Range: Feb-12-2009 to Sep-14-2010

Outlook: Stable
Outlook Range: Apr-23-2007 to Nov-0
5-2008

Outlook: Stable
Outlook Range: Oct-09-2006 to Apr-23-2007

Outlook: Stable
Outlook Range
: Apr-06-2006 to Oct-09-2006

Outlook: Positive
Outlook Range: Apr-01-2005 to Oct-03-2005

Outloo
k: Stable
Outlook Range: Jul-11-2003 to Dec-28-2004

Outlook: Negative
Outlook Range: Aug-08-2002
to Jul-11-2003

Outlook: Negative
Outlook Range: Jun-12-2002 to Aug-02-2002

Outlook: Negative
O
utlook Range: Apr-18-2002 to Jun-12-2002

Outlook: Negative
Outlook Range: Mar-26-2002 to Apr-18-20
02

Outlook: Negative
Outlook Range: Mar-25-2002 to Mar-26-2002

Outlook: Negative
Outlook Range
: Dec-05-2001 to Mar-25-2002

Outlook: Negative
Outlook Range: Nov-01-2001 to Dec-05-2001

Outloo
k: Negative
Outlook Range: Oct-10-2001 to Nov-01-2001

Outlook: Negative
Outlook Range: Jul-13-200
1 to Oct-10-2001

Outlook: Negative
Outlook Range: Jun-07-2001 to Jul-13-2001

Outlook: Stable
O
utlook Range: Jan-11-2001 to Mar-19-2001

Outlook: Negative
Outlook Range: Jul-22-1999 to Nov-01-20
00

Outlook: Stable
Outlook Range: Apr-22-1997 to Jul-22-1999

Outlook: Stable
Outlook Range: Ap
r-02-1997 to Apr-22-1997

Outlook: Stable
Outlook Range: Jan-15-1997 to Apr-02-1997
Company_Name=Petrobras Argentina S.A. Exchange_Ticker=- S_P_Entity_ID=309861
S_P_RatingsDirect__Industry=Corporates Geographic_Locations=Latin America and Caribbean (Primary)
S_P_Entity_Credit_Rating_History=Rating: NR
Rating Range: Mar-20-2017 to Present

Rating: B-
Ratin
g Range: Feb-03-2016 to Mar-20-2017

Rating: CCC-
Rating Range: Jun-18-2014 to Feb-03-2016

Ratin
g: CCC+
Rating Range: Dec-20-2013 to Jun-18-2014

Rating: B-
Rating Range: Sep-13-2013 to Dec-20-2
013

Rating: B
Rating Range: Apr-05-2013 to Sep-13-2013

Rating: B+
Rating Range: Nov-01-2012 to
 Apr-05-2013

Rating: BB-
Rating Range: Dec-17-2010 to Nov-01-2012

Rating: B+
Rating Range: Feb
-12-2009 to Dec-17-2010

Rating: BB
Rating Range: Apr-23-2007 to Feb-12-2009

Rating: B+
Rating
Range: Oct-09-2006 to Apr-23-2007

Rating: B
Rating Range: Apr-01-2005 to Oct-09-2006

Rating: B-

Rating Range: Jul-11-2003 to Apr-01-2005

Rating: CCC
Rating Range: Aug-08-2002 to Jul-11-2003



Rating: SD
Rating Range: Aug-02-2002 to Aug-08-2002

Rating: CC
Rating Range: Jun-12-2002 to Aug-
02-2002

Rating: CCC
Rating Range: Apr-18-2002 to Jun-12-2002

Rating: CCC+
Rating Range: Mar-26
-2002 to Apr-18-2002

Rating: CCC-
Rating Range: Mar-25-2002 to Mar-26-2002

Rating: SD
Rating R
ange: Mar-25-2002 to Mar-25-2002

Rating: CCC+
Rating Range: Dec-05-2001 to Mar-25-2002

Rating:
B
Rating Range: Nov-01-2001 to Dec-05-2001

Rating: B+
Rating Range: Oct-10-2001 to Nov-01-2001



Rating: BB-
Rating Range: Jul-13-2001 to Oct-10-2001

Rating: BB
Rating Range: Mar-26-2001 to Jul
-13-2001

Rating: BB+
Rating Range: Jan-11-2001 to Mar-26-2001

Rating: BBB-
Rating Range: Apr-2
2-1997 to Jan-11-2001

Rating: BB
Rating Range: Apr-02-1997 to Apr-22-1997

Rating: BB-
Rating R
ange: Jan-15-1997 to Apr-02-1997
S_P_Entity_Credit_Rating_Histor1=CreditWatch: Watch Neg
CreditWatch Range: Sep-13-2013 to Dec-20-2013


CreditWatch: Watch Neg
CreditWatch Range: Nov-01-2012 to Apr-05-2013

CreditWatch: Watch Pos
C
reditWatch Range: Sep-14-2010 to Dec-17-2010

CreditWatch: Watch Neg
CreditWatch Range: Nov-05-2008
 to Feb-12-2009

CreditWatch: Watch Neg
CreditWatch Range: Oct-03-2005 to Apr-06-2006

CreditWatc
h: Watch Pos
CreditWatch Range: Dec-28-2004 to Apr-01-2005

CreditWatch: Watch Neg
CreditWatch Ran
ge: Mar-26-2001 to Jun-07-2001

CreditWatch: Watch Neg
CreditWatch Range: Mar-19-2001 to Mar-26-200
1

CreditWatch: Watch Neg
CreditWatch Range: Nov-01-2000 to Jan-11-2001
S_P_Entity_Credit_Rating_Histor2=Outlook: Stable
Outlook Range: Feb-03-2016 to Mar-20-2017

Outlook
: Negative
Outlook Range: Jun-18-2014 to Feb-03-2016

Outlook: Negative
Outlook Range: Dec-20-2013
 to Jun-18-2014

Outlook: Negative
Outlook Range: Apr-05-2013 to Sep-13-2013

Outlook: Negative

Outlook Range: Apr-24-2012 to Nov-01-2012

Outlook: Stable
Outlook Range: Dec-17-2010 to Apr-24-201
2

Outlook: Stable
Outlook Range: Feb-12-2009 to Sep-14-2010

Outlook: Stable
Outlook Range: Apr
-23-2007 to Nov-05-2008

Outlook: Stable
Outlook Range: Oct-09-2006 to Apr-23-2007

Outlook: Stab
le
Outlook Range: Apr-06-2006 to Oct-09-2006

Outlook: Positive
Outlook Range: Apr-01-2005 to Oct-
03-2005

Outlook: Stable
Outlook Range: Jul-11-2003 to Dec-28-2004

Outlook: Negative
Outlook Ra
nge: Aug-08-2002 to Jul-11-2003

Outlook: Negative
Outlook Range: Jun-12-2002 to Aug-02-2002

Out
look: Negative
Outlook Range: Apr-18-2002 to Jun-12-2002

Outlook: Negative
Outlook Range: Mar-26-
2002 to Apr-18-2002

Outlook: Negative
Outlook Range: Mar-25-2002 to Mar-26-2002

Outlook: Negati
ve
Outlook Range: Dec-05-2001 to Mar-25-2002

Outlook: Negative
Outlook Range: Nov-01-2001 to Dec-
05-2001

Outlook: Negative
Outlook Range: Oct-10-2001 to Nov-01-2001

Outlook: Negative
Outlook
Range: Jul-13-2001 to Oct-10-2001

Outlook: Negative
Outlook Range: Jun-07-2001 to Jul-13-2001

O
utlook: Stable
Outlook Range: Jan-11-2001 to Mar-19-2001

Outlook: Negative
Outlook Range: Jul-22-
1999 to Nov-01-2000

Outlook: Stable
Outlook Range: Apr-22-1997 to Jul-22-1999

Outlook: Stable

Outlook Range: Apr-02-1997 to Apr-22-1997

Outlook: Stable
Outlook Range: Jan-15-1997 to Apr-02-199
7 Primary_ISIN=ARPERE010103 CIK=0000900442; 0001449877 Company_CUSIP=- Excel_Company_ID=IQ4597539
Security_Tickers=- _pos=0 _r=1592 _s=1605 _l=999 Rating=BB-
 Rating_Range=Jan-15-1997 to Apr-02-1997
_ERROR_=1 _N_=4614

 

Alexxxxxxx
Pyrite | Level 9

the attachment is the original data of wrong rows

novinosrin
Tourmaline | Level 20

Hi Your source file is not clear. Can you please paste the wrong rows and all possible rows in a more usable form under datalines like you did this-

data table1;
	infile cards truncover;
	input
	ID $1.
	Rating_History $2000.
	;
	cards;
1 Rating: CCC+Rating Range: Apr-08-2020 to PresentRating: B-Rating Range: Feb-20-2020 to Apr-08-2020
2 Rating: B+Rating Range: Dec-13-2019 to PresentRating: NRRating Range: Apr-13-2018 to Dec-13-2019Rating: BB-Rating Range: Aug-02-2017 to Apr-13-2018
	;;;;

 

Ksharp
Super User

The following code could give you a start.

 

data table1;
	infile cards truncover;
	input
	ID $1.
	Rating_History $2000.
	;
	cards;
1 Rating: CCC+Rating Range: Apr-08-2020 to PresentRating: B-Rating Range: Feb-20-2020 to Apr-08-2020
2 Rating: B+Rating Range: Dec-13-2019 to PresentRating: NRRating Range: Apr-13-2018 to Dec-13-2019Rating: BB-Rating Range: Aug-02-2017 to Apr-13-2018
	;;;;
data want;
 set table1;
 n=countc(rating_history,':');
 do i=2 to n by 2;
  rating=tranwrd(scan(rating_history,i,':'),'Rating Range','');
  rating_range=tranwrd(scan(rating_history,i+1,':'),'Rating','');
  output;
 end;
 keep id rating rating_range;
 run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 714 views
  • 1 like
  • 4 in conversation