BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
superbug
Quartz | Level 8

I posted my question earlier, but maybe I didn't state my question in a neat way, so I post it again. 

The data I have is like the following,

data have;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp5555  1
;

I want each item_id have four rows, the reformatted data I want should be look like the following

data want;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp1111
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp4444  
pp4444  
pp5555  1
pp5555  
pp5555  
pp5555  
;

Any help would be much appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @superbug  



data have;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp5555  1
;

data want;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("resp") ;
  h.definedone () ;
 end;
 do until(last.item_id);
  set have;
  by item_id;
  if resp then h.add();
 end;
 do _n_=1 to 4;
  if h.find(key:_n_) then call missing(resp);
  output;
 end;
 h.clear();
run;

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20

Hi @superbug  



data have;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp5555  1
;

data want;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("resp") ;
  h.definedone () ;
 end;
 do until(last.item_id);
  set have;
  by item_id;
  if resp then h.add();
 end;
 do _n_=1 to 4;
  if h.find(key:_n_) then call missing(resp);
  output;
 end;
 h.clear();
run;
superbug
Quartz | Level 8

@novinosrin 

It seems I was happy too early. In my data I also have "freq" and"pct" columns, using your code, it turned out, for each item, the "freq" and "pct" columns picked the value in last row of each item. I tried to modify your code, but there is error. Could you please modify your code to fit my purpose?

my data have is like the following

data have;
input item_id $ resp freq pct;
datalines;
pp1111  1    2     0.3
pp1111  2    5     0.5
pp1111  3    6     0.7
pp2222  1    3     0.4
pp2222  2    2     0.2
pp2222  3    4     0.5
pp2222  4    3     0.4
pp3333  1    6     0.8
pp3333  3    4     0.5
pp3333  4    10    0.7
pp4444  1    12    0.6
pp4444  2    9     0.5
pp5555  1    7     0.3
;

I want it to be like this

data want;
input item_id $ resp freq pct;
datalines;
pp1111  1    2     0.3
pp1111  2    5     0.5
pp1111  3    6     0.7
pp1111
pp2222  1    3     0.4
pp2222  2    2     0.2
pp2222  3    4     0.5
pp2222  4    3     0.4
pp3333  1    6     0.8
pp3333
pp3333  3    4     0.5
pp3333  4    10    0.7
pp4444  1    12    0.6
pp4444  2    9     0.5
pp4444
pp4444
pp5555  1    7     0.3
pp5555
pp5555
pp5555
;

thanks a bunch!

 

novinosrin
Tourmaline | Level 20

Hi @superbug  Just a minor tweak. Please see if this helps-

data have;
input item_id $ resp freq pct;
datalines;
pp1111  1    2     0.3
pp1111  2    5     0.5
pp1111  3    6     0.7
pp2222  1    3     0.4
pp2222  2    2     0.2
pp2222  3    4     0.5
pp2222  4    3     0.4
pp3333  1    6     0.8
pp3333  3    4     0.5
pp3333  4    10    0.7
pp4444  1    12    0.6
pp4444  2    9     0.5
pp5555  1    7     0.3
;
data want;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("resp") ;
  h.definedata("resp","freq","pct");
  h.definedone () ;
 end;
 do until(last.item_id);
  set have;
  by item_id;
  if resp then h.add();
 end;
 do _n_=1 to 4;
  if h.find(key:_n_) then call missing(resp,freq,pct);
  output;
 end;
 h.clear();
run;
superbug
Quartz | Level 8

@novinosrin 

Yes, it worked! Any difficulty problem for me is just a piece of cake for you. 

As always, thank you so much for the help!

Reeza
Super User
You do realize that if you're generating this data from PROC FREQ, which it does look like it comes from, the SPARSE option automatically adds in the missing levels for you?
Reeza
Super User
proc freq data=have noprint;
table item_id * resp / sparse out=temp;
run;

data want;
set temp;
*removes resp from records that have a 0 count;
if count = 0 then call missing(resp);

drop count percent;
run;

Another common way is to create a skeleton table with all your possible combinations and merge your input data.


/*SQL approach*/
*build skeleton table;
proc sql;
create table skeletonTable as
select distinct t1.item_id, t2.resp 
from have as t1, have as t2;
quit;

*merge with table (prior step combined);
proc sql;
create table want as
select t4.item_id, t3.resp
from have as t3 right join 
(select distinct t1.item_id, t2.resp 
from have as t1, have as t2) as t4
on t3.item_id=t4.item_id and t3.resp=t4.resp;
quit;

 

ballardw
Super User

@superbug wrote:

I posted my question earlier, but maybe I didn't state my question in a neat way, so I post it again. 

The data I have is like the following,

data have;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp5555  1
;

I want each item_id have four rows, the reformatted data I want should be look like the following

data want;
input item_id $ resp;
datalines;
pp1111  1
pp1111  2
pp1111  3
pp1111
pp2222  1
pp2222  2
pp2222  3
pp2222  4
pp3333  1
pp3333  
pp3333  3
pp3333  4
pp4444  1
pp4444  2
pp4444  
pp4444  
pp5555  1
pp5555  
pp5555  
pp5555  
;

Any help would be much appreciated!


One way that assumes your data is sorted by Item_id and resp. If the data is not sorted by item_id but only grouped then you would add the NOTSORTED option to the BY item_id statement.

data want;
   set have;
   by item_id;
   output;
   if last.item_id and resp< 4 then do;
      counter=resp;
      resp=.;
      do until (counter=4) ;
         output;
         counter=counter+1;
      end;
   end;
   drop counter;
run;

You should consider changing the use of the word "format" to discuss data set contents or structure. In SAS a Format specifically refers to the display characteristics for a variable and can lead to some confusion when used in some contexts.

superbug
Quartz | Level 8

@ballardw @Reeza @novinosrin 

Thank you all fantastic SAS experts! You saved my day. A big thumb up to you all!

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1970 views
  • 0 likes
  • 4 in conversation