DATA Step, Macro, Functions and more

help with transposing data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

help with transposing data

Hi!

Sorry if this was discussed but I couldn't find my particular problem:

I have a file that contains several thousand cases and only one variable ( I called it 'dummy') like this

Name Person 1

Address Person 1

Address 2 Person 1

Name Person 2

Address Person 2

Address 2 Person 2

Name Person 3

Address Person 3

Address 2 Person 3

and I would like a 'normal' data set with the variables Name, Address1 and Address2.

I am very grateful for any help.

Andreas


Accepted Solutions
Solution
‎07-05-2011 01:27 PM
PROC Star
Posts: 7,363

Re: help with transposing data

FWIW, that is exactly what my suggested code did, but with only one datastep and only requiring one pass through the data.

Art

View solution in original post


All Replies
Contributor QLi
Contributor
Posts: 57

help with transposing data

I think you can use Data step to deal with it.

data dummy;

length name address1 address2 $ 20;

input name $ address1 $ address2 $ @@;

cards;

Name_Person_1

Address_Person_1

Address 2_Person_1

Name_Person_2

Address_Person_2

Address_2 Person_2

Name_Person_3

Address_Person_3

Address_2_Person_3

;

run;

PROC Star
Posts: 7,363

help with transposing data

You might be looking for something like:

data want (drop=dummy);

  length Name Address Address2 $30;

  set have;

  retain Name Address;

  if mod(_n_,3) eq 1 then Name=dummy;

  else if mod(_n_,3) eq 2 then Address=dummy;

  else do;

    Address2=dummy;

    output;

  end;

run;

HTH,

Art

Super User
Posts: 9,681

help with transposing data

Hi. Art.T

But If one person has more than two address?

and the number of address is vary?

Ksharp

PROC Star
Posts: 7,363

help with transposing data

Ksharp,

I don't think it is a question of how complex one could make the problem but, rather, what the OP was really trying to solve.  Only he or she can say whether what any of us propose actually can serve as a potential solution.  Given what little information was provided, I would (and will) wait until they write back before losing any sleep over potential complexities.

Art

Super User
Posts: 9,681

help with transposing data

OK.Smiley Happy

Super User
Posts: 9,681

help with transposing data

It is some complicated.

data temp;
 input dummy $50.;
cards;
Name Person1
Address Person1
Address2 Person1
Name Person2
Address Person2
Address2 Person2
Name Person3
Address Person3
Address2 Person3
;
run;
data temp;
 set temp;
 if dummy eq: 'Name' then count+1;
run;
data _null_;
 set temp end=last;
 by count notsorted;
 if _n_ eq 1 then call execute('data want;');
 if first.count then n=0;
 if dummy ne: 'Name' then  n+1;
 if dummy eq: 'Name' then call execute('name="'||dummy||'";');
  else  call execute('address'||strip(n)||'="'||dummy||'";');
 if last.count then call execute('output;');
 if last then call execute('run;');
run;



Ksharp

Super Contributor
Posts: 408

help with transposing data

Provided your bput is always spread over three lines (name, address1, address2) then this is how it could be done:

data addresses;

     input name $ / address1 $ / address2;

     ...

The example is far from complete but; it just illustrates the use of the slash as a way to move the input to the next line.

Good luck,

- Jan.

Occasional Contributor
Posts: 7

help with transposing data

Thanks a lot all of you. Ksharp's example works fine (I have to find out why and how). The only small thing is that since the string 'Name' is not in my dummy variable, I end up with three variables called address1-address3 but I can rename variables. I had cleaned up the data before, removing phone and fax numbers (there were different numbers of them but luckily we don't need them at the moment).

I have different data files in different exotic formats so this may help me in figuring them out myself, the principle is the same. Thanks again to all of you for your fast and helpful responses!

Occasional Contributor
Posts: 7

Re: help with transposing data

Now I feel stupid, I don't know if I only tried with the test data earlier but it doesn't work, since the data doesn't have the text 'Name' in it, I guess.

I need to create a 'count' variable that is 1 for the first 3 observations, 2 for the next 3 and so on, then the code would work.

Hmm, I had an idea and the result looks ok, I will check.

Occasional Contributor
Posts: 7

Re: help with transposing data

Not elegant (I needed two steps to create the ID) but it works:

data tmp2;

set tmp1;

count= MOD(_N_,3);

run;

data tmp3;

set tmp2;

if count=0 then count=3;

run;

data want (drop=dummy);

  length Name Address Address2 $30;

  set tmp3;

  retain Name Address Address2;

  if count=1 then Name=dummy;

  else if count=2 then Address=dummy;

  else do;

    Address2=dummy;

    output;

  end;

run;

At first it didn't work but it just showed me some data I hadn't cleaned up.

Thanks all of you!

Solution
‎07-05-2011 01:27 PM
PROC Star
Posts: 7,363

Re: help with transposing data

FWIW, that is exactly what my suggested code did, but with only one datastep and only requiring one pass through the data.

Art

Occasional Contributor
Posts: 7

Re: help with transposing data

Yes, thank you, you can see I used your code. It didn't work at first, probably because 'address2' was not retained.

It is easy and I can understand how it works. I had actually created the count variable myself (but as I saw afterwards using the same function you used) and at the time I thought I needed it to be 1, 2, 3 but of course 1, 2, 0 works just as well.

I see now that I could have just used your code after adding 'address2' after retain.

PROC Star
Posts: 7,363

Re: help with transposing data

You don't have to retain Address2 as the record is being output as soon as you assign Address2.  Variables only have to be retained if you need to keep them across iterations.

Art

Occasional Contributor
Posts: 7

Re: help with transposing data

You're right again. My problems with running your code were caused entirely by the erroneous cases that were still in my data set but I didn't know it at the time.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 264 views
  • 0 likes
  • 5 in conversation