BookmarkSubscribeRSS Feed
MFLoGrasso
Obsidian | Level 7

I have a data set that is formatted as a series of variable names paired with their values all in one row, and I would like to convert this into a SAS data set.  Part of the complexity is that the rows contain different variables.  A sampling of the raw data is below.

 

UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3

 

The data sets (and the variables in each set) are determined by the PART variable.  For example, the first line above should be in a data set that looks like:

 

 

UNITID   SURVSECT   PART   LINE   SLEVEL   SEX   COUNT
459107   EF1        B         8        1     2      14

 

 

And the last line above should be in a data set that looks like:

 

 

UNITID   SURVSECT   PART   CIPCODE   LINE   REG   COUNT
475699   EF1        A      52.0000     17   51        3

 

 

Currently, I have the entire raw data brought in using an infile statement:

 

data imp_file4;
	infile ef4 lrecl = 256 truncover;
	input @1 kvp $256.;
	put kvp;
	run;

And I make each of my needed data sets through a bit of brute force using the scan function.

data i.imp_a1;
	set imp_file4 imp_file2 imp_file2s;
		where substr(kvp,33,1) = 'A';
	unitid = input(scan(kvp,2,',='),6.);
	survsect = put(scan(kvp,4,',='),$3.);
	part = put(scan(kvp,6,',='),$1.);
	cip = put(scan(kvp,8,',='),$7.);
	line = input(scan(kvp,10,',='),2.);
	reg = put(scan(kvp,12,',='),$1.)||put(scan(kvp,14,',='),$1.);
	count = input(scan(kvp,16,',='),6.);
	drop kvp;
	run;

The thing is, I figure there has to be an easier way to do this using a do loop that can create all the variable names for me as well, and ideally all seven (in this case) data sets.

6 REPLIES 6
Patrick
Opal | Level 21

@MFLoGrasso

Have a looked into named input as this is for reading data in the structure you have.

http://go.documentation.sas.com/?docsetId=lestmtsref&docsetTarget=p1jjrvmzmybdeqn1gt8gj1r0ed0u.htm&d...

 

I haven't added all the variables but here some sample code:

data;
  infile datalines ;
  length unitid 8. SURVSECT $20.;
  input @;
  _infile_=translate(_infile_,' ',',');
  input 
    UNITID= SURVSECT=$ PART=$ LINE= SLEVEL= SEX= COUNT=
    FT_PY_COHORT= FT_EXCLUSIONS= FT_CY_COHORT= PT_PY_COHORT= PT_EXCLUSIONS= PT_CY_COHORT=
    ;
  datalines;
UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3
;
run;
ChrisNZ
Tourmaline | Level 20

Almost the same as @Patrick but this one retrieves the PT: variables.


data WANT;
  infile datalines dlm=',';
  length unitid 8. SURVSECT $20.;
  input 
    UNITID= SURVSECT=$ PART=$ LINE= SLEVEL= SEX= COUNT=
    FT_PY_COHORT= FT_EXCLUSIONS= FT_CY_COHORT= PT_PY_COHORT= PT_EXCLUSIONS= PT_CY_COHORT=
    ;
  datalines;
UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3
run;

 

Patrick
Opal | Level 21

@ChrisNZ

If not replacing the commas with blanks I'm not reading the data correctly. Is this different in your environment?

Capture.JPG

 

ChrisNZ
Tourmaline | Level 20

@Patrick Oh no you are right!  Smiley Embarassed

I was focusing on the PT fields which are not read.

I have no clue why they are not.

Patrick
Opal | Level 21

@ChrisNZ wrote:

@Patrick Oh no you are right!  Smiley Embarassed

I was focusing on the PT fields which are not read.

I have no clue why they are not.


 

@ChrisNZ

It looks like the input buffer gets truncated in the code I've posted. No idea why. Assuming that in real life @MFLoGrasso will be reading the data from an external file (not inline data reading using datalines/card) below example demonstrates that this should work.

filename mydata temp;
data _null_;
  file mydata;
  infile datalines;
  input;
  put _infile_;
  datalines;
UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3 
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35 
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15 
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1 
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0 
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0 
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1 
UNITID=475699,SURVSECT=EF1,PART=A,IPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3
;
run;

data want;
  infile mydata;
  length unitid 8. SURVSECT $20.;
  input @;
  _infile_=translate(_infile_,' ',',');
  input 
    UNITID= SURVSECT=$ PART=$ LINE= SLEVEL= SEX= COUNT=
    FT_PY_COHORT= FT_EXCLUSIONS= FT_CY_COHORT= PT_PY_COHORT= PT_EXCLUSIONS= PT_CY_COHORT=
    ;
run;

Capture.JPG

ChrisNZ
Tourmaline | Level 20

As for creating several data sets, you have to describe them.

Something like this works:

 

data I.IMP_A1 (keep=UNITID SURVSECT PART LINE SLEVEL SEX COUNT)
     I.IMP_A2 (keep=UNITID SURVSECT PART CIPCOD LINE );
...
  if PART= 'A' then output I.IMP_A1;  
  if PART= 'B' then output I.IMP_A2;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 509 views
  • 4 likes
  • 3 in conversation