Solved: How to format this data

superbug · Posted 06-25-2020 12:02 PM

I posted my question earlier, but maybe I didn't state my question in a neat way, so I post it again.

The data I have is like the following,

data have;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp5555  1
;

I want each item_id have four rows, the reformatted data I want should be look like the following

data want;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp1111
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp4444  
pp4444  
pp5555  1
pp5555  
pp5555  
pp5555  
;

Any help would be much appreciated!

novinosrin · Posted 06-25-2020 12:11 PM

Hi @superbug



data have;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp5555  1
;

data want;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("resp") ;
  h.definedone () ;
 end;
 do until(last.item_id);
  set have;
  by item_id;
  if resp then h.add();
 end;
 do _n_=1 to 4;
  if h.find(key:_n_) then call missing(resp);
  output;
 end;
 h.clear();
run;

View solution in original post

novinosrin · Posted 06-25-2020 12:11 PM

Hi @superbug



data have;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp5555  1
;

data want;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("resp") ;
  h.definedone () ;
 end;
 do until(last.item_id);
  set have;
  by item_id;
  if resp then h.add();
 end;
 do _n_=1 to 4;
  if h.find(key:_n_) then call missing(resp);
  output;
 end;
 h.clear();
run;

superbug · Posted 06-25-2020 01:26 PM

@novinosrin

It seems I was happy too early. In my data I also have "freq" and"pct" columns, using your code, it turned out, for each item, the "freq" and "pct" columns picked the value in last row of each item. I tried to modify your code, but there is error. Could you please modify your code to fit my purpose?

my data have is like the following

data have;
input item_id $ resp freq pct;
datalines;
pp1111  1    2     0.3
pp1111  2    5     0.5
pp1111  3    6     0.7
pp2222  1    3     0.4
pp2222  2    2     0.2
pp2222  3    4     0.5
pp2222  4    3     0.4
pp3333  1    6     0.8
pp3333  3    4     0.5
pp3333  4    10    0.7
pp4444  1    12    0.6
pp4444  2    9     0.5
pp5555  1    7     0.3
;

I want it to be like this

data want;
input item_id $ resp freq pct;
datalines;
pp1111  1    2     0.3
pp1111  2    5     0.5
pp1111  3    6     0.7
pp1111
pp2222  1    3     0.4
pp2222  2    2     0.2
pp2222  3    4     0.5
pp2222  4    3     0.4
pp3333  1    6     0.8
pp3333
pp3333  3    4     0.5
pp3333  4    10    0.7
pp4444  1    12    0.6
pp4444  2    9     0.5
pp4444
pp4444
pp5555  1    7     0.3
pp5555
pp5555
pp5555
;

thanks a bunch!

novinosrin · Posted 06-25-2020 01:38 PM

Hi @superbug Just a minor tweak. Please see if this helps-

data have;
input item_id $ resp freq pct;
datalines;
pp1111  1    2     0.3
pp1111  2    5     0.5
pp1111  3    6     0.7
pp2222  1    3     0.4
pp2222  2    2     0.2
pp2222  3    4     0.5
pp2222  4    3     0.4
pp3333  1    6     0.8
pp3333  3    4     0.5
pp3333  4    10    0.7
pp4444  1    12    0.6
pp4444  2    9     0.5
pp5555  1    7     0.3
;
data want;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("resp") ;
  h.definedata("resp","freq","pct");
  h.definedone () ;
 end;
 do until(last.item_id);
  set have;
  by item_id;
  if resp then h.add();
 end;
 do _n_=1 to 4;
  if h.find(key:_n_) then call missing(resp,freq,pct);
  output;
 end;
 h.clear();
run;

superbug · Posted 06-25-2020 02:30 PM

@novinosrin

Yes, it worked! Any difficulty problem for me is just a piece of cake for you.

As always, thank you so much for the help!

Reeza · Posted 06-25-2020 03:35 PM

You do realize that if you're generating this data from PROC FREQ, which it does look like it comes from, the SPARSE option automatically adds in the missing levels for you?

Reeza · Posted 06-25-2020 12:13 PM

proc freq data=have noprint;
table item_id * resp / sparse out=temp;
run;

data want;
set temp;
*removes resp from records that have a 0 count;
if count = 0 then call missing(resp);

drop count percent;
run;

Another common way is to create a skeleton table with all your possible combinations and merge your input data.


/*SQL approach*/
*build skeleton table;
proc sql;
create table skeletonTable as
select distinct t1.item_id, t2.resp 
from have as t1, have as t2;
quit;

*merge with table (prior step combined);
proc sql;
create table want as
select t4.item_id, t3.resp
from have as t3 right join 
(select distinct t1.item_id, t2.resp 
from have as t1, have as t2) as t4
on t3.item_id=t4.item_id and t3.resp=t4.resp;
quit;

ballardw · Posted 06-25-2020 12:16 PM

@superbug wrote:

I posted my question earlier, but maybe I didn't state my question in a neat way, so I post it again.

The data I have is like the following,
data have;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp5555  1
;
I want each item_id have four rows, the reformatted data I want should be look like the following
data want;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp1111
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp4444  
pp4444  
pp5555  1
pp5555  
pp5555  
pp5555  
;
Any help would be much appreciated!

One way that assumes your data is sorted by Item_id and resp. If the data is not sorted by item_id but only grouped then you would add the NOTSORTED option to the BY item_id statement.

data want;
   set have;
   by item_id;
   output;
   if last.item_id and resp< 4 then do;
      counter=resp;
      resp=.;
      do until (counter=4) ;
         output;
         counter=counter+1;
      end;
   end;
   drop counter;
run;

You should consider changing the use of the word "format" to discuss data set contents or structure. In SAS a Format specifically refers to the display characteristics for a variable and can lead to some confusion when used in some contexts.

superbug · Posted 06-25-2020 12:47 PM

@ballardw @Reeza @novinosrin

Thank you all fantastic SAS experts! You saved my day. A big thumb up to you all!

How to format this data

Re: How to format this data

Re: How to format this data

Re: How to format this data

Re: How to format this data

Re: How to format this data

Re: How to format this data

Re: How to format this data

Re: How to format this data

Re: How to format this data

Registration is open

Registration is open

SAS Training: Just a Click Away