I posted my question earlier, but maybe I didn't state my question in a neat way, so I post it again.
The data I have is like the following,
data have;
input item_id $ resp;
datalines;
pp1111 1
pp1111 2
pp1111 3
pp2222 1
pp2222 2
pp2222 3
pp2222 4
pp3333 1
pp3333 3
pp3333 4
pp4444 1
pp4444 2
pp5555 1
;
I want each item_id have four rows, the reformatted data I want should be look like the following
data want;
input item_id $ resp;
datalines;
pp1111 1
pp1111 2
pp1111 3
pp1111
pp2222 1
pp2222 2
pp2222 3
pp2222 4
pp3333 1
pp3333
pp3333 3
pp3333 4
pp4444 1
pp4444 2
pp4444
pp4444
pp5555 1
pp5555
pp5555
pp5555
;
Any help would be much appreciated!
Hi @superbug
data have;
input item_id $ resp;
datalines;
pp1111 1
pp1111 2
pp1111 3
pp2222 1
pp2222 2
pp2222 3
pp2222 4
pp3333 1
pp3333 3
pp3333 4
pp4444 1
pp4444 2
pp5555 1
;
data want;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("resp") ;
h.definedone () ;
end;
do until(last.item_id);
set have;
by item_id;
if resp then h.add();
end;
do _n_=1 to 4;
if h.find(key:_n_) then call missing(resp);
output;
end;
h.clear();
run;
Hi @superbug
data have;
input item_id $ resp;
datalines;
pp1111 1
pp1111 2
pp1111 3
pp2222 1
pp2222 2
pp2222 3
pp2222 4
pp3333 1
pp3333 3
pp3333 4
pp4444 1
pp4444 2
pp5555 1
;
data want;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("resp") ;
h.definedone () ;
end;
do until(last.item_id);
set have;
by item_id;
if resp then h.add();
end;
do _n_=1 to 4;
if h.find(key:_n_) then call missing(resp);
output;
end;
h.clear();
run;
It seems I was happy too early. In my data I also have "freq" and"pct" columns, using your code, it turned out, for each item, the "freq" and "pct" columns picked the value in last row of each item. I tried to modify your code, but there is error. Could you please modify your code to fit my purpose?
my data have is like the following
data have;
input item_id $ resp freq pct;
datalines;
pp1111 1 2 0.3
pp1111 2 5 0.5
pp1111 3 6 0.7
pp2222 1 3 0.4
pp2222 2 2 0.2
pp2222 3 4 0.5
pp2222 4 3 0.4
pp3333 1 6 0.8
pp3333 3 4 0.5
pp3333 4 10 0.7
pp4444 1 12 0.6
pp4444 2 9 0.5
pp5555 1 7 0.3
;
I want it to be like this
data want;
input item_id $ resp freq pct;
datalines;
pp1111 1 2 0.3
pp1111 2 5 0.5
pp1111 3 6 0.7
pp1111
pp2222 1 3 0.4
pp2222 2 2 0.2
pp2222 3 4 0.5
pp2222 4 3 0.4
pp3333 1 6 0.8
pp3333
pp3333 3 4 0.5
pp3333 4 10 0.7
pp4444 1 12 0.6
pp4444 2 9 0.5
pp4444
pp4444
pp5555 1 7 0.3
pp5555
pp5555
pp5555
;
thanks a bunch!
Hi @superbug Just a minor tweak. Please see if this helps-
data have;
input item_id $ resp freq pct;
datalines;
pp1111 1 2 0.3
pp1111 2 5 0.5
pp1111 3 6 0.7
pp2222 1 3 0.4
pp2222 2 2 0.2
pp2222 3 4 0.5
pp2222 4 3 0.4
pp3333 1 6 0.8
pp3333 3 4 0.5
pp3333 4 10 0.7
pp4444 1 12 0.6
pp4444 2 9 0.5
pp5555 1 7 0.3
;
data want;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("resp") ;
h.definedata("resp","freq","pct");
h.definedone () ;
end;
do until(last.item_id);
set have;
by item_id;
if resp then h.add();
end;
do _n_=1 to 4;
if h.find(key:_n_) then call missing(resp,freq,pct);
output;
end;
h.clear();
run;
Yes, it worked! Any difficulty problem for me is just a piece of cake for you.
As always, thank you so much for the help!
proc freq data=have noprint;
table item_id * resp / sparse out=temp;
run;
data want;
set temp;
*removes resp from records that have a 0 count;
if count = 0 then call missing(resp);
drop count percent;
run;
Another common way is to create a skeleton table with all your possible combinations and merge your input data.
/*SQL approach*/
*build skeleton table;
proc sql;
create table skeletonTable as
select distinct t1.item_id, t2.resp
from have as t1, have as t2;
quit;
*merge with table (prior step combined);
proc sql;
create table want as
select t4.item_id, t3.resp
from have as t3 right join
(select distinct t1.item_id, t2.resp
from have as t1, have as t2) as t4
on t3.item_id=t4.item_id and t3.resp=t4.resp;
quit;
@superbug wrote:
I posted my question earlier, but maybe I didn't state my question in a neat way, so I post it again.
The data I have is like the following,
data have; input item_id $ resp; datalines; pp1111 1 pp1111 2 pp1111 3 pp2222 1 pp2222 2 pp2222 3 pp2222 4 pp3333 1 pp3333 3 pp3333 4 pp4444 1 pp4444 2 pp5555 1 ;
I want each item_id have four rows, the reformatted data I want should be look like the following
data want; input item_id $ resp; datalines; pp1111 1 pp1111 2 pp1111 3 pp1111 pp2222 1 pp2222 2 pp2222 3 pp2222 4 pp3333 1 pp3333 pp3333 3 pp3333 4 pp4444 1 pp4444 2 pp4444 pp4444 pp5555 1 pp5555 pp5555 pp5555 ;
Any help would be much appreciated!
One way that assumes your data is sorted by Item_id and resp. If the data is not sorted by item_id but only grouped then you would add the NOTSORTED option to the BY item_id statement.
data want; set have; by item_id; output; if last.item_id and resp< 4 then do; counter=resp; resp=.; do until (counter=4) ; output; counter=counter+1; end; end; drop counter; run;
You should consider changing the use of the word "format" to discuss data set contents or structure. In SAS a Format specifically refers to the display characteristics for a variable and can lead to some confusion when used in some contexts.
Thank you all fantastic SAS experts! You saved my day. A big thumb up to you all!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.