BookmarkSubscribeRSS Feed
hkassam
Calcite | Level 5
I have a dataset that contains the following sample observations. Var1 has ID's separated by dashes. Var2 is date and Var3 is location.

Var1 Var2 Var3
AB6772-CD5885-ABCD6194-XY7199---- 09/14/2010 London
PQ5928-SP352-BZ5752----- 09/16/2010 New York

Each string in Var1 has from 1 to 8 ID's. In this example, there are 4 ID's in the first obs and 3 in the second. The dashes at the end of the string mean that that there are no more ID's for that obs.

I would like to extract each ID and transpose the file as follows:

ID Var2 Var3
AB6772 09/14/2010 London
CD5885 09/14/2010 London
ABCD6194 09/14/2010 London
XY7199 09/14/2010 London
PQ5928 09/16/2010 New York
SP352 09/16/2010 New York
BZ5752 09/16/2010 New York

I am looking for a simple way to achieve this. Any suggestions?
4 REPLIES 4
data_null__
Jade | Level 19
Does this look right?

[pre]
data need;
input Var1 &$64. Var2:mmddyy. Var3 &$32.;
format var2 mmddyy.;
cards;
AB6772-CD5885-ABCD6194-XY7199---- 09/14/2010 London
PQ5928-SP352-BZ5752----- 09/16/2010 New York
;;;;
run;
data need;
length id $16;
set need;
do i = 1 to 8;
id = scan(var1,i,'-','O');
if missing(id) then leave;
output;
end;
drop var1 i;
run;
proc print;
run;
proc contents;
run;
[/pre]
hkassam
Calcite | Level 5
Yes, it works like a charm. I inserted the do loop in my data step reading the original dataset and it did the job.

I did not realize the scan function is able to keep count of each word between the delimiters.

The irony is that I myself created the text string with the dash delimiters in the first place but now want to unravel it for further processing.

Thanks a millon.
Cynthia_sas
SAS Super FREQ
Hi:
There is an alternate solution -- but it requires the following to be true:
1) there is always 1 space between VAR1 (ID), VAR2 and VAR3 and
2) the last piece of ID -ALWAYS- ends in a number

If these are true, then you can parse out ID when you first read the input file, as shown in the code below, using the ANYDIGIT function.

cynthia
[pre]
data readdash;
length var1 $35 var2 8 var3 $32;
** This infile/input combination assumes that there is ALWAYS;
** at least 1 space between each variable in the input file.;
infile datalines dlm=' ' ;
input Var1 &$ Var2:mmddyy10. Var3 &$32.;
format var2 mmddyy10.;

** As long as ID -always- ends in a number,;
** search backwards from the end of VAR1 and find the last digit;
** in the VAR1 variable. With ANYDIGIT, if start is negative, the;
** search starts at the END of the string and goes to the right.;
** So the value of GOTNUM is the position of the digit before the ----.;
lg = length(var1)*-1;
gotnum = anydigit(var1,lg);

** could obs 3 happen in the data??;
if gotnum gt 0 then id = substr(var1,1,gotnum);
else if gotnum = 0 then id = var1;
return;
datalines;
AB6772-CD5885-ABCD6194-XY7199---- 09/14/2010 London
PQ5928-SP352-BZ5752----- 09/16/2010 New York
ZZzzzz-SPeee-YYaaaa----- 09/16/2010 Albuquerque
;;;;
run;

proc print data=readdash;
run;
[/pre]
Peter_C
Rhodochrosite | Level 12
pick up on Cynthia's solution and make the infile parsing work a little harder for you 😉
The DLM= option of the INFILE statement can accept a variable name as well as a constant. Each time an INPUT statement is executed from the file defined by that INFILE, the value of the DLM= variable is re-evaluated to prepare for parsing. So, for reading the ID variables, set the delimiter to dash (-) and then set it to blank to read the remains of the line. Once the whole line is read, loop over the ID variables to release an observation for each non-blank ID :[pre]data reduced( keep= ID Var2 Var3 ) ;
length id i1-i8 $11 Var2 6 Var3 $40 delimiter $1 ;
attrib var2 format= mmddyy10. informat= mmddyy10. ;
infile cards dsd dlm=delimiter column=c ;
delimiter = '-' ;
input i1-i8 @;
delimiter = ' ' ;
input +1 /* pass over the first blank*/
Var2 @ ;
Var3 = substr( _infile_, c ) ; *put rest of line into Var3 ;
array i(8) ;
do c=1 to 8 ;
if i(c) =' ' then continue ;
id = i(c) ;
output ;
end ;
list;cards;
AB6772-CD5885-ABCD6194-XY7199---- 09/14/2010 London
PQ5928-SP352-BZ5752----- 09/16/2010 New York
;[/pre]
great things available in data step INFILE and INPUT parsing

enjoy
peterC

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 4806 views
  • 0 likes
  • 4 in conversation