03-21-2016 11:30 AM - edited 03-21-2016 11:54 AM
data INPUT; infile DATALINES DLM="|" missover; DATALINES; Name|Vidhya Phone1|123456 Phone2|987655 Name|Shree Phone1|345678 Name|Jagan Phone1|5435678 Phone2|5431232 Name|Sharma Phone1|8934751 ; run;
looking for output
03-21-2016 12:10 PM - edited 03-21-2016 12:11 PM
In general read it in as NAME/VALUE pairs and use PROC TRANSPOSE to convert from multiple rows to multiple columns.
You will want to generate a new variable to identify which blocks of rows in the original are to be grouped together into a new row.
data tall ; infile DATALINES DLM="|" DSD TRUNCOVER ; length name $32 value $200 ; input name value; if upcase(name)='NAME' then id+1; DATALINES; Name|Vidhya Phone1|123456 Phone2|987655 Name|Shree Phone1|345678 Name|Jagan Phone1|5435678 Phone2|5431232 Name|Sharma Phone1|8934751 ; run; proc transpose data=tall out=want (drop=_:) ; by id; id name; var value; run;
03-21-2016 12:17 PM
Why should it be in a single data step instead taking advantage of PROC TRANSPOSE?
You can do the same thing in a data step, but it takes a lot more code. You will however get more control over the definition of the individual variables.
03-21-2016 12:37 PM - edited 03-21-2016 12:42 PM
You could replace the PROC TRANSPOSE step by the following data step:
data want; array var $200 name phone1 phone2; do until(last.id); set tall(drop=name); /* or rename NAME to _NAME if you want to check its values instead of relying on counter I */ by id; i=sum(i,1); var[i]=value; end; drop i: v:; run;
Please note that PROC TRANSPOSE, unlike the data step, would handle the case of more than two phone numbers without any change.
03-21-2016 12:55 PM - edited 03-21-2016 12:56 PM
A DATA step approach:
infile datalines dlm="|" missover end=done;
length name $ 20 phone1 phone2 $ 15;
retain name phone1 phone2;
input identifier $ @;
if identifier = 'Name' then do;
if _n_ > 1 then output;
else if identifier = 'Phone1' then input phone1;
else if identifier = 'Phone2' then input phone2;
if done then output;
It's untested. The only piece I'm not sure about is whether END= works in combination with DATALINES but it should otherwise be OK.
03-21-2016 01:58 PM
Great, so everything is in one data step.
END= does not work with DATALINES (I wasn't sure either), but EOF= can be used instead. The latter requires a statement label, though. Hence, the line "if done then output;" could be replaced by
return; done: output; drop identifier;
to make it work.