data INPUT;
infile DATALINES DLM="|" missover;
DATALINES;
Name|Vidhya
Phone1|123456
Phone2|987655
Name|Shree
Phone1|345678
Name|Jagan
Phone1|5435678
Phone2|5431232
Name|Sharma
Phone1|8934751
;
run;
looking for output
name|phone1|phone2
In general read it in as NAME/VALUE pairs and use PROC TRANSPOSE to convert from multiple rows to multiple columns.
You will want to generate a new variable to identify which blocks of rows in the original are to be grouped together into a new row.
data tall ;
infile DATALINES DLM="|" DSD TRUNCOVER ;
length name $32 value $200 ;
input name value;
if upcase(name)='NAME' then id+1;
DATALINES;
Name|Vidhya
Phone1|123456
Phone2|987655
Name|Shree
Phone1|345678
Name|Jagan
Phone1|5435678
Phone2|5431232
Name|Sharma
Phone1|8934751
;
run;
proc transpose data=tall out=want (drop=_:) ;
by id;
id name;
var value;
run;
Thank you for your reply. I'm intrested in datastep method.
Why should it be in a single data step instead taking advantage of PROC TRANSPOSE?
You can do the same thing in a data step, but it takes a lot more code. You will however get more control over the definition of the individual variables.
Hi @SJN,
You could replace the PROC TRANSPOSE step by the following data step:
data want;
array var $200 name phone1 phone2;
do until(last.id);
set tall(drop=name); /* or rename NAME to _NAME if you want to check its values instead of relying on counter I */
by id;
i=sum(i,1);
var[i]=value;
end;
drop i: v:;
run;
Please note that PROC TRANSPOSE, unlike the data step, would handle the case of more than two phone numbers without any change.
A DATA step approach:
data want;
infile datalines dlm="|" missover end=done;
length name $ 20 phone1 phone2 $ 15;
retain name phone1 phone2;
input identifier $ @;
if identifier = 'Name' then do;
if _n_ > 1 then output;
input name;
phone1=' ';
phone2=' ';
end;
else if identifier = 'Phone1' then input phone1;
else if identifier = 'Phone2' then input phone2;
if done then output;
datalines;
...
;
It's untested. The only piece I'm not sure about is whether END= works in combination with DATALINES but it should otherwise be OK.
Great, so everything is in one data step.
END= does not work with DATALINES (I wasn't sure either), but EOF= can be used instead. The latter requires a statement label, though. Hence, the line "if done then output;" could be replaced by
return;
done: output;
drop identifier;
to make it work.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.