BookmarkSubscribeRSS Feed
MFLoGrasso
Obsidian | Level 7

I have a data set that is formatted as a series of variable names paired with their values all in one row, and I would like to convert this into a SAS data set.  Part of the complexity is that the rows contain different variables.  A sampling of the raw data is below.

 

UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3

 

The data sets (and the variables in each set) are determined by the PART variable.  For example, the first line above should be in a data set that looks like:

 

 

UNITID   SURVSECT   PART   LINE   SLEVEL   SEX   COUNT
459107   EF1        B         8        1     2      14

 

 

And the last line above should be in a data set that looks like:

 

 

UNITID   SURVSECT   PART   CIPCODE   LINE   REG   COUNT
475699   EF1        A      52.0000     17   51        3

 

 

Currently, I have the entire raw data brought in using an infile statement:

 

data imp_file4;
	infile ef4 lrecl = 256 truncover;
	input @1 kvp $256.;
	put kvp;
	run;

And I make each of my needed data sets through a bit of brute force using the scan function.

data i.imp_a1;
	set imp_file4 imp_file2 imp_file2s;
		where substr(kvp,33,1) = 'A';
	unitid = input(scan(kvp,2,',='),6.);
	survsect = put(scan(kvp,4,',='),$3.);
	part = put(scan(kvp,6,',='),$1.);
	cip = put(scan(kvp,8,',='),$7.);
	line = input(scan(kvp,10,',='),2.);
	reg = put(scan(kvp,12,',='),$1.)||put(scan(kvp,14,',='),$1.);
	count = input(scan(kvp,16,',='),6.);
	drop kvp;
	run;

The thing is, I figure there has to be an easier way to do this using a do loop that can create all the variable names for me as well, and ideally all seven (in this case) data sets.

6 REPLIES 6
Patrick
Opal | Level 21

@MFLoGrasso

Have a looked into named input as this is for reading data in the structure you have.

http://go.documentation.sas.com/?docsetId=lestmtsref&docsetTarget=p1jjrvmzmybdeqn1gt8gj1r0ed0u.htm&d...

 

I haven't added all the variables but here some sample code:

data;
  infile datalines ;
  length unitid 8. SURVSECT $20.;
  input @;
  _infile_=translate(_infile_,' ',',');
  input 
    UNITID= SURVSECT=$ PART=$ LINE= SLEVEL= SEX= COUNT=
    FT_PY_COHORT= FT_EXCLUSIONS= FT_CY_COHORT= PT_PY_COHORT= PT_EXCLUSIONS= PT_CY_COHORT=
    ;
  datalines;
UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3
;
run;
ChrisNZ
Tourmaline | Level 20

Almost the same as @Patrick but this one retrieves the PT: variables.


data WANT;
  infile datalines dlm=',';
  length unitid 8. SURVSECT $20.;
  input 
    UNITID= SURVSECT=$ PART=$ LINE= SLEVEL= SEX= COUNT=
    FT_PY_COHORT= FT_EXCLUSIONS= FT_CY_COHORT= PT_PY_COHORT= PT_EXCLUSIONS= PT_CY_COHORT=
    ;
  datalines;
UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3
run;

 

Patrick
Opal | Level 21

@ChrisNZ

If not replacing the commas with blanks I'm not reading the data correctly. Is this different in your environment?

Capture.JPG

 

ChrisNZ
Tourmaline | Level 20

@Patrick Oh no you are right!  Smiley Embarassed

I was focusing on the PT fields which are not read.

I have no clue why they are not.

Patrick
Opal | Level 21

@ChrisNZ wrote:

@Patrick Oh no you are right!  Smiley Embarassed

I was focusing on the PT fields which are not read.

I have no clue why they are not.


 

@ChrisNZ

It looks like the input buffer gets truncated in the code I've posted. No idea why. Assuming that in real life @MFLoGrasso will be reading the data from an external file (not inline data reading using datalines/card) below example demonstrates that this should work.

filename mydata temp;
data _null_;
  file mydata;
  infile datalines;
  input;
  put _infile_;
  datalines;
UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,IPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3
;
run;

data want;
  infile mydata;
  length unitid 8. SURVSECT $20.;
  input @;
  _infile_=translate(_infile_,' ',',');
  input 
    UNITID= SURVSECT=$ PART=$ LINE= SLEVEL= SEX= COUNT=
    FT_PY_COHORT= FT_EXCLUSIONS= FT_CY_COHORT= PT_PY_COHORT= PT_EXCLUSIONS= PT_CY_COHORT=
    ;
run;

Capture.JPG

ChrisNZ
Tourmaline | Level 20

As for creating several data sets, you have to describe them.

Something like this works:

 

data I.IMP_A1 (keep=UNITID SURVSECT PART LINE SLEVEL SEX COUNT)
     I.IMP_A2 (keep=UNITID SURVSECT PART CIPCOD LINE );
...
  if PART= 'A' then output I.IMP_A1;  
  if PART= 'B' then output I.IMP_A2;

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 932 views
  • 4 likes
  • 3 in conversation