Extract part of a string having positional delimiter

Reply
Occasional Contributor
Posts: 15

Extract part of a string having positional delimiter

I have a data set that is formatted as a series of variable names paired with their values all in one row, and I would like to convert this into a SAS data set.  Part of the complexity is that the rows contain different variables.  A sampling of the raw data is below.

 

UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3

 

The data sets (and the variables in each set) are determined by the PART variable.  For example, the first line above should be in a data set that looks like:

 

 

UNITID   SURVSECT   PART   LINE   SLEVEL   SEX   COUNT
459107   EF1        B         8        1     2      14

 

 

And the last line above should be in a data set that looks like:

 

 

UNITID   SURVSECT   PART   CIPCODE   LINE   REG   COUNT
475699   EF1        A      52.0000     17   51        3

 

 

Currently, I have the entire raw data brought in using an infile statement:

 

data imp_file4;
	infile ef4 lrecl = 256 truncover;
	input @1 kvp $256.;
	put kvp;
	run;

And I make each of my needed data sets through a bit of brute force using the scan function.

data i.imp_a1;
	set imp_file4 imp_file2 imp_file2s;
		where substr(kvp,33,1) = 'A';
	unitid = input(scan(kvp,2,',='),6.);
	survsect = put(scan(kvp,4,',='),$3.);
	part = put(scan(kvp,6,',='),$1.);
	cip = put(scan(kvp,8,',='),$7.);
	line = input(scan(kvp,10,',='),2.);
	reg = put(scan(kvp,12,',='),$1.)||put(scan(kvp,14,',='),$1.);
	count = input(scan(kvp,16,',='),6.);
	drop kvp;
	run;

The thing is, I figure there has to be an easier way to do this using a do loop that can create all the variable names for me as well, and ideally all seven (in this case) data sets.

Respected Advisor
Posts: 4,742

Re: Extract part of a string having positional delimiter

[ Edited ]
Posted in reply to MFLoGrasso

@MFLoGrasso

Have a looked into named input as this is for reading data in the structure you have.

http://go.documentation.sas.com/?docsetId=lestmtsref&docsetTarget=p1jjrvmzmybdeqn1gt8gj1r0ed0u.htm&d...

 

I haven't added all the variables but here some sample code:

data;
  infile datalines ;
  length unitid 8. SURVSECT $20.;
  input @;
  _infile_=translate(_infile_,' ',',');
  input 
    UNITID= SURVSECT=$ PART=$ LINE= SLEVEL= SEX= COUNT=
    FT_PY_COHORT= FT_EXCLUSIONS= FT_CY_COHORT= PT_PY_COHORT= PT_EXCLUSIONS= PT_CY_COHORT=
    ;
  datalines;
UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3
;
run;
PROC Star
Posts: 2,373

Re: Extract part of a string having positional delimiter

[ Edited ]

Almost the same as @Patrick but this one retrieves the PT: variables.


data WANT;
  infile datalines dlm=',';
  length unitid 8. SURVSECT $20.;
  input 
    UNITID= SURVSECT=$ PART=$ LINE= SLEVEL= SEX= COUNT=
    FT_PY_COHORT= FT_EXCLUSIONS= FT_CY_COHORT= PT_PY_COHORT= PT_EXCLUSIONS= PT_CY_COHORT=
    ;
  datalines;
UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3
run;

 

Respected Advisor
Posts: 4,742

Re: Extract part of a string having positional delimiter

@ChrisNZ

If not replacing the commas with blanks I'm not reading the data correctly. Is this different in your environment?

Capture.JPG

 

PROC Star
Posts: 2,373

Re: Extract part of a string having positional delimiter

@Patrick Oh no you are right!  Smiley Embarassed

I was focusing on the PT fields which are not read.

I have no clue why they are not.

Respected Advisor
Posts: 4,742

Re: Extract part of a string having positional delimiter

@ChrisNZ wrote:

@Patrick Oh no you are right!  Smiley Embarassed

I was focusing on the PT fields which are not read.

I have no clue why they are not.


 

@ChrisNZ

It looks like the input buffer gets truncated in the code I've posted. No idea why. Assuming that in real life @MFLoGrasso will be reading the data from an external file (not inline data reading using datalines/card) below example demonstrates that this should work.

filename mydata temp;
data _null_;
  file mydata;
  infile datalines;
  input;
  put _infile_;
  datalines;
UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,IPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3
;
run;

data want;
  infile mydata;
  length unitid 8. SURVSECT $20.;
  input @;
  _infile_=translate(_infile_,' ',',');
  input 
    UNITID= SURVSECT=$ PART=$ LINE= SLEVEL= SEX= COUNT=
    FT_PY_COHORT= FT_EXCLUSIONS= FT_CY_COHORT= PT_PY_COHORT= PT_EXCLUSIONS= PT_CY_COHORT=
    ;
run;

Capture.JPG

PROC Star
Posts: 2,373

Re: Extract part of a string having positional delimiter

[ Edited ]
Posted in reply to MFLoGrasso

As for creating several data sets, you have to describe them.

Something like this works:

 

data I.IMP_A1 (keep=UNITID SURVSECT PART LINE SLEVEL SEX COUNT)
     I.IMP_A2 (keep=UNITID SURVSECT PART CIPCOD LINE );
...
  if PART= 'A' then output I.IMP_A1;  
  if PART= 'B' then output I.IMP_A2;

 

Ask a Question
Discussion stats
  • 6 replies
  • 106 views
  • 4 likes
  • 3 in conversation