I have a data set that is formatted as a series of variable names paired with their values all in one row, and I would like to convert this into a SAS data set. Part of the complexity is that the rows contain different variables. A sampling of the raw data is below.
UNITID=459107,SURVSECT=EF1,PART=B,LINE=8,SLEVEL=1,SEX=2,COUNT=14
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=1,COUNT=3
UNITID=459107,SURVSECT=EF1,PART=B,LINE=9,SLEVEL=1,SEX=2,COUNT=6
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=1,COUNT=35
UNITID=459107,SURVSECT=EF1,PART=C,LINE=51,HS=2,COUNT=15
UNITID=459107,SURVSECT=EF1,PART=D,COUNT=1
UNITID=459107,SURVSECT=EF1,PART=E,FT_PY_COHORT=0,FT_EXCLUSIONS=0,FT_CY_COHORT=0,PT_PY_COHORT=0,PT_EXCLUSIONS=0,PT_CY_COHORT=0
UNITID=459107,SURVSECT=EF1,PART=F,ST_STAFF_RATIO=10
UNITID=459107,SURVSECT=EF1,PART=G,LINE=1,ENROLL_EXCLUSIVE=19,ENROLL_SOME=80,NOTENROLL=238,INUS_PPS=19,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0
UNITID=459107,SURVSECT=EF1,PART=G,LINE=2,ENROLL_EXCLUSIVE=0,ENROLL_SOME=0,NOTENROLL=3,INUS_PPS=0,INUS_NOTPPS=0,INUS_UNKNOWN_STATE=0,OUTSIDEUS=0
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=16,RACE=5,SEX=1,COUNT=1
UNITID=475699,SURVSECT=EF1,PART=A,CIPCODE=52.0000,LINE=17,RACE=5,SEX=1,COUNT=3
The data sets (and the variables in each set) are determined by the PART variable. For example, the first line above should be in a data set that looks like:
UNITID SURVSECT PART LINE SLEVEL SEX COUNT
459107 EF1 B 8 1 2 14
And the last line above should be in a data set that looks like:
UNITID SURVSECT PART CIPCODE LINE REG COUNT
475699 EF1 A 52.0000 17 51 3
Currently, I have the entire raw data brought in using an infile statement:
data imp_file4;
infile ef4 lrecl = 256 truncover;
input @1 kvp $256.;
put kvp;
run;
And I make each of my needed data sets through a bit of brute force using the scan function.
data i.imp_a1;
set imp_file4 imp_file2 imp_file2s;
where substr(kvp,33,1) = 'A';
unitid = input(scan(kvp,2,',='),6.);
survsect = put(scan(kvp,4,',='),$3.);
part = put(scan(kvp,6,',='),$1.);
cip = put(scan(kvp,8,',='),$7.);
line = input(scan(kvp,10,',='),2.);
reg = put(scan(kvp,12,',='),$1.)||put(scan(kvp,14,',='),$1.);
count = input(scan(kvp,16,',='),6.);
drop kvp;
run;
The thing is, I figure there has to be an easier way to do this using a do loop that can create all the variable names for me as well, and ideally all seven (in this case) data sets.
... View more