BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
superbug
Quartz | Level 8

My data  is as attached. The left two columns is the data format I have, the right two columns is the data format I want. The rows highlighted in red are the rows I want to insert. 

Each "item_id" is supposed to have four rows, but some "item_id" only have three, two, or one rows. How to reformat the data I have to the data I want?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Is it always 4 observations per id?  Does the value of the other variable matter?

The classic way to do this is to build a template dataset and then overlay your actual values.   The CALL MISSING() handles side effects of many-to-many merge.

data template ;
  set have ;
  by id ;
  if first.id then do i=1 to 4;
    output;
  end;
  keep id;
run;
data want;
  merge template have ;
  by id ;
  output;
  call missing(of _all_);
run;

If your extra variable is really a sequence counter then it is easier as you don't have to deal with many-to-many merge.  Plus it will properly insert empty rows that are not the last in the group, but are instead somewhere in the middle.

data have;
  input id seq value ;
cards;
1 1 23
1  2 123
2  1 45
2  3  23
;
data template;
   set have ;
   by id ;
   if first.id then do seq=1 to 4;
       output;
   end;
  keep id seq;
run;
data want ;
  merge template have;
  by id seq ;
run;

 

View solution in original post

9 REPLIES 9
mkeintz
PROC Star

Did you know that, instead of attaching a spreadsheet that each of your potential helpers would need to download and open, you can paste its contents directly into a table in your topic statement?  Then we could see the values instantly and focus our effort on helping to find a solution instead of figuring out the question.  Encourage us.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
andreas_lds
Jade | Level 19

Or, even better than posting data as table, post it as a data-step using datalines, so that we actually have data, not a spreadsheet.

Kurt_Bremser
Super User

Please post example data in self-contained data steps, as you have been shown repeatedly in your previous threads. It's not rocket science, and it helps us greatly in helping you.

superbug
Quartz | Level 8

@Kurt_Bremser @smantha @andreas_lds @mkeintz 

Thanks for the suggestion of how to post effective questions! I am sorry for any inconveniences of all my previous posts. The picture below is the data I am having and the format I want. I hope the picture can be self-explaining.

The data format I am having is like the A and B columns. Basically, each individual "item_id" is supposed to have four rows. If the rows of each individual "item_id" is less than four,  I want to make them to be four rows, like columns D and E. In column D, highlights in red is the rows that I want to insert.

Thanks for the help!

 

 

data photo.PNG

 

mkeintz
PROC Star

@superbug 

 

Just to be clear.  You posted a picture, not a table, or (gold standard per @Kurt_Bremser) a data step reading raw data.  The png is a nice start, but don't stop there.  After all, the proposed solution showed a generic program structure, but somebody will have to modify it to use the actual data.

 

Help us help you.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Is it always 4 observations per id?  Does the value of the other variable matter?

The classic way to do this is to build a template dataset and then overlay your actual values.   The CALL MISSING() handles side effects of many-to-many merge.

data template ;
  set have ;
  by id ;
  if first.id then do i=1 to 4;
    output;
  end;
  keep id;
run;
data want;
  merge template have ;
  by id ;
  output;
  call missing(of _all_);
run;

If your extra variable is really a sequence counter then it is easier as you don't have to deal with many-to-many merge.  Plus it will properly insert empty rows that are not the last in the group, but are instead somewhere in the middle.

data have;
  input id seq value ;
cards;
1 1 23
1  2 123
2  1 45
2  3  23
;
data template;
   set have ;
   by id ;
   if first.id then do seq=1 to 4;
       output;
   end;
  keep id seq;
run;
data want ;
  merge template have;
  by id seq ;
run;

 

superbug
Quartz | Level 8

@Tom 

Thank you so much, your code is neat and easily understand.

@mklangley  

Thank you as well!

smantha
Lapis Lazuli | Level 10
Did you try
Data want(
Set a b;
By item_id;
Run;
In sas terminology this is interleaving.
mklangley
Lapis Lazuli | Level 10

@superbug  Try this. It works with the data you provided.

 

data have;
    input item_id $ resp;
    datalines;
    PP1111 1
    PP1111 2
    PP1111 3
    pp2222 1
    pp2222 2
    pp2222 3
    pp2222 4
    pp3333 1
    pp3333 3
    pp3333 4
    pp4444 1
    pp4444 2
    pp5555 1
    ;
run;

proc sql noprint;
  select substr(max(item_id),3,1) into: max from have;
quit;

data arr (keep=a: i);
    array arr{&max.,4};
    set have;
    i = input(substr(item_id,3,1), 1.);
    arr{i,resp} = resp;
run;

proc transpose data=arr out=arr_trans name=arr_num;
    by i;
run;

data want (keep=item_id resp);
    set arr_trans;
    var = coalesce(col1, col2, col3, col4);
    item_id = cat('pp', put(i,1.), put(i,1.), put(i,1.), put(i,1.));
    resp = var;
    where (arr_num = "arr"||strip(put(((i-1)*4+1),2.))
        or arr_num = "arr"||strip(put(((i-1)*4+2),2.))
        or arr_num = "arr"||strip(put(((i-1)*4+3),2.))
        or arr_num = "arr"||strip(put(((i-1)*4+4),2.)));
run;

This code could certainly be cleaned up a lot. This is not a very dynamic approach. I have no doubt there is a cleaner solution.

Edited to remove unnecessary code and reduce hard-coding.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1935 views
  • 0 likes
  • 7 in conversation