Hi!
Sorry if this was discussed but I couldn't find my particular problem:
I have a file that contains several thousand cases and only one variable ( I called it 'dummy') like this
Name Person 1
Address Person 1
Address 2 Person 1
Name Person 2
Address Person 2
Address 2 Person 2
Name Person 3
Address Person 3
Address 2 Person 3
and I would like a 'normal' data set with the variables Name, Address1 and Address2.
I am very grateful for any help.
Andreas
FWIW, that is exactly what my suggested code did, but with only one datastep and only requiring one pass through the data.
Art
I think you can use Data step to deal with it.
data dummy;
length name address1 address2 $ 20;
input name $ address1 $ address2 $ @@;
cards;
Name_Person_1
Address_Person_1
Address 2_Person_1
Name_Person_2
Address_Person_2
Address_2 Person_2
Name_Person_3
Address_Person_3
Address_2_Person_3
;
run;
You might be looking for something like:
data want (drop=dummy);
length Name Address Address2 $30;
set have;
retain Name Address;
if mod(_n_,3) eq 1 then Name=dummy;
else if mod(_n_,3) eq 2 then Address=dummy;
else do;
Address2=dummy;
output;
end;
run;
HTH,
Art
Hi. Art.T
But If one person has more than two address?
and the number of address is vary?
Ksharp
Ksharp,
I don't think it is a question of how complex one could make the problem but, rather, what the OP was really trying to solve. Only he or she can say whether what any of us propose actually can serve as a potential solution. Given what little information was provided, I would (and will) wait until they write back before losing any sleep over potential complexities.
Art
OK.:)
It is some complicated.
data temp; input dummy $50.; cards; Name Person1 Address Person1 Address2 Person1 Name Person2 Address Person2 Address2 Person2 Name Person3 Address Person3 Address2 Person3 ; run; data temp; set temp; if dummy eq: 'Name' then count+1; run; data _null_; set temp end=last; by count notsorted; if _n_ eq 1 then call execute('data want;'); if first.count then n=0; if dummy ne: 'Name' then n+1; if dummy eq: 'Name' then call execute('name="'||dummy||'";'); else call execute('address'||strip(n)||'="'||dummy||'";'); if last.count then call execute('output;'); if last then call execute('run;'); run;
Ksharp
Provided your bput is always spread over three lines (name, address1, address2) then this is how it could be done:
data addresses;
input name $ / address1 $ / address2;
...
The example is far from complete but; it just illustrates the use of the slash as a way to move the input to the next line.
Good luck,
- Jan.
Thanks a lot all of you. Ksharp's example works fine (I have to find out why and how). The only small thing is that since the string 'Name' is not in my dummy variable, I end up with three variables called address1-address3 but I can rename variables. I had cleaned up the data before, removing phone and fax numbers (there were different numbers of them but luckily we don't need them at the moment).
I have different data files in different exotic formats so this may help me in figuring them out myself, the principle is the same. Thanks again to all of you for your fast and helpful responses!
Now I feel stupid, I don't know if I only tried with the test data earlier but it doesn't work, since the data doesn't have the text 'Name' in it, I guess.
I need to create a 'count' variable that is 1 for the first 3 observations, 2 for the next 3 and so on, then the code would work.
Hmm, I had an idea and the result looks ok, I will check.
Not elegant (I needed two steps to create the ID) but it works:
data tmp2;
set tmp1;
count= MOD(_N_,3);
run;
data tmp3;
set tmp2;
if count=0 then count=3;
run;
data want (drop=dummy);
length Name Address Address2 $30;
set tmp3;
retain Name Address Address2;
if count=1 then Name=dummy;
else if count=2 then Address=dummy;
else do;
Address2=dummy;
output;
end;
run;
At first it didn't work but it just showed me some data I hadn't cleaned up.
Thanks all of you!
FWIW, that is exactly what my suggested code did, but with only one datastep and only requiring one pass through the data.
Art
Yes, thank you, you can see I used your code. It didn't work at first, probably because 'address2' was not retained.
It is easy and I can understand how it works. I had actually created the count variable myself (but as I saw afterwards using the same function you used) and at the time I thought I needed it to be 1, 2, 3 but of course 1, 2, 0 works just as well.
I see now that I could have just used your code after adding 'address2' after retain.
You don't have to retain Address2 as the record is being output as soon as you assign Address2. Variables only have to be retained if you need to keep them across iterations.
Art
You're right again. My problems with running your code were caused entirely by the erroneous cases that were still in my data set but I didn't know it at the time.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.