BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
superbug
Quartz | Level 8

My data  is as attached. The left two columns is the data format I have, the right two columns is the data format I want. The rows highlighted in red are the rows I want to insert. 

Each "item_id" is supposed to have four rows, but some "item_id" only have three, two, or one rows. How to reformat the data I have to the data I want?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Is it always 4 observations per id?  Does the value of the other variable matter?

The classic way to do this is to build a template dataset and then overlay your actual values.   The CALL MISSING() handles side effects of many-to-many merge.

data template ;
  set have ;
  by id ;
  if first.id then do i=1 to 4;
    output;
  end;
  keep id;
run;
data want;
  merge template have ;
  by id ;
  output;
  call missing(of _all_);
run;

If your extra variable is really a sequence counter then it is easier as you don't have to deal with many-to-many merge.  Plus it will properly insert empty rows that are not the last in the group, but are instead somewhere in the middle.

data have;
  input id seq value ;
cards;
1 1 23
1  2 123
2  1 45
2  3  23
;
data template;
   set have ;
   by id ;
   if first.id then do seq=1 to 4;
       output;
   end;
  keep id seq;
run;
data want ;
  merge template have;
  by id seq ;
run;

 

View solution in original post

9 REPLIES 9
mkeintz
PROC Star

Did you know that, instead of attaching a spreadsheet that each of your potential helpers would need to download and open, you can paste its contents directly into a table in your topic statement?  Then we could see the values instantly and focus our effort on helping to find a solution instead of figuring out the question.  Encourage us.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
andreas_lds
Jade | Level 19

Or, even better than posting data as table, post it as a data-step using datalines, so that we actually have data, not a spreadsheet.

Kurt_Bremser
Super User

Please post example data in self-contained data steps, as you have been shown repeatedly in your previous threads. It's not rocket science, and it helps us greatly in helping you.

superbug
Quartz | Level 8

@Kurt_Bremser @smantha @andreas_lds @mkeintz 

Thanks for the suggestion of how to post effective questions! I am sorry for any inconveniences of all my previous posts. The picture below is the data I am having and the format I want. I hope the picture can be self-explaining.

The data format I am having is like the A and B columns. Basically, each individual "item_id" is supposed to have four rows. If the rows of each individual "item_id" is less than four,  I want to make them to be four rows, like columns D and E. In column D, highlights in red is the rows that I want to insert.

Thanks for the help!

 

 

data photo.PNG

 

mkeintz
PROC Star

@superbug 

 

Just to be clear.  You posted a picture, not a table, or (gold standard per @Kurt_Bremser) a data step reading raw data.  The png is a nice start, but don't stop there.  After all, the proposed solution showed a generic program structure, but somebody will have to modify it to use the actual data.

 

Help us help you.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Is it always 4 observations per id?  Does the value of the other variable matter?

The classic way to do this is to build a template dataset and then overlay your actual values.   The CALL MISSING() handles side effects of many-to-many merge.

data template ;
  set have ;
  by id ;
  if first.id then do i=1 to 4;
    output;
  end;
  keep id;
run;
data want;
  merge template have ;
  by id ;
  output;
  call missing(of _all_);
run;

If your extra variable is really a sequence counter then it is easier as you don't have to deal with many-to-many merge.  Plus it will properly insert empty rows that are not the last in the group, but are instead somewhere in the middle.

data have;
  input id seq value ;
cards;
1 1 23
1  2 123
2  1 45
2  3  23
;
data template;
   set have ;
   by id ;
   if first.id then do seq=1 to 4;
       output;
   end;
  keep id seq;
run;
data want ;
  merge template have;
  by id seq ;
run;

 

superbug
Quartz | Level 8

@Tom 

Thank you so much, your code is neat and easily understand.

@mklangley  

Thank you as well!

smantha
Lapis Lazuli | Level 10
Did you try
Data want(
Set a b;
By item_id;
Run;
In sas terminology this is interleaving.
mklangley
Lapis Lazuli | Level 10

@superbug  Try this. It works with the data you provided.

 

data have;
    input item_id $ resp;
    datalines;
    PP1111 1
    PP1111 2
    PP1111 3
    pp2222 1
    pp2222 2
    pp2222 3
    pp2222 4
    pp3333 1
    pp3333 3
    pp3333 4
    pp4444 1
    pp4444 2
    pp5555 1
    ;
run;

proc sql noprint;
  select substr(max(item_id),3,1) into: max from have;
quit;

data arr (keep=a: i);
    array arr{&max.,4};
    set have;
    i = input(substr(item_id,3,1), 1.);
    arr{i,resp} = resp;
run;

proc transpose data=arr out=arr_trans name=arr_num;
    by i;
run;

data want (keep=item_id resp);
    set arr_trans;
    var = coalesce(col1, col2, col3, col4);
    item_id = cat('pp', put(i,1.), put(i,1.), put(i,1.), put(i,1.));
    resp = var;
    where (arr_num = "arr"||strip(put(((i-1)*4+1),2.))
        or arr_num = "arr"||strip(put(((i-1)*4+2),2.))
        or arr_num = "arr"||strip(put(((i-1)*4+3),2.))
        or arr_num = "arr"||strip(put(((i-1)*4+4),2.)));
run;

This code could certainly be cleaned up a lot. This is not a very dynamic approach. I have no doubt there is a cleaner solution.

Edited to remove unnecessary code and reduce hard-coding.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2635 views
  • 0 likes
  • 7 in conversation