data INPUT;
infile DATALINES DLM="|" missover;
DATALINES;
Name|Vidhya
Phone1|123456
Phone2|987655
Name|Shree
Phone1|345678
Name|Jagan
Phone1|5435678
Phone2|5431232
Name|Sharma
Phone1|8934751
;
run;
looking for output
name|phone1|phone2
In general read it in as NAME/VALUE pairs and use PROC TRANSPOSE to convert from multiple rows to multiple columns.
You will want to generate a new variable to identify which blocks of rows in the original are to be grouped together into a new row.
data tall ;
infile DATALINES DLM="|" DSD TRUNCOVER ;
length name $32 value $200 ;
input name value;
if upcase(name)='NAME' then id+1;
DATALINES;
Name|Vidhya
Phone1|123456
Phone2|987655
Name|Shree
Phone1|345678
Name|Jagan
Phone1|5435678
Phone2|5431232
Name|Sharma
Phone1|8934751
;
run;
proc transpose data=tall out=want (drop=_:) ;
by id;
id name;
var value;
run;
Thank you for your reply. I'm intrested in datastep method.
Why should it be in a single data step instead taking advantage of PROC TRANSPOSE?
You can do the same thing in a data step, but it takes a lot more code. You will however get more control over the definition of the individual variables.
Hi @SJN,
You could replace the PROC TRANSPOSE step by the following data step:
data want;
array var $200 name phone1 phone2;
do until(last.id);
set tall(drop=name); /* or rename NAME to _NAME if you want to check its values instead of relying on counter I */
by id;
i=sum(i,1);
var[i]=value;
end;
drop i: v:;
run;
Please note that PROC TRANSPOSE, unlike the data step, would handle the case of more than two phone numbers without any change.
A DATA step approach:
data want;
infile datalines dlm="|" missover end=done;
length name $ 20 phone1 phone2 $ 15;
retain name phone1 phone2;
input identifier $ @;
if identifier = 'Name' then do;
if _n_ > 1 then output;
input name;
phone1=' ';
phone2=' ';
end;
else if identifier = 'Phone1' then input phone1;
else if identifier = 'Phone2' then input phone2;
if done then output;
datalines;
...
;
It's untested. The only piece I'm not sure about is whether END= works in combination with DATALINES but it should otherwise be OK.
Great, so everything is in one data step.
END= does not work with DATALINES (I wasn't sure either), but EOF= can be used instead. The latter requires a statement label, though. Hence, the line "if done then output;" could be replaced by
return;
done: output;
drop identifier;
to make it work.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.