- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
i have a dataset
yearmonth areacode barcode tot_qty tot_value
where year months are like : 201401
201402
201403
i need a output like
areacode barcode _201401_tot_qty _201401_tot_value and so on for all months
unique is [yearmonth+ areacode +barcode]
is there any possibilities to do it with proc transpose..
thanks in advance
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, yes you could do that by using id and idlabel:
proc transpose data =xyz;
by abc;
id year;
idlabel year;
run;
I wouldn't recommend that approach however. Its something I have seen a lot here on the forums, putting data in the variable name. Its not good practice for several reasons:
- You lose the direct arraying functionality, ee.g col1 col2 col3 etc. can be referenced as array col{*}.
- Also not having the number at the end means no list approaches, e.g. col1-col10.
- Your code will become big and unweildy very quickly as you try to remember all the variables, and your table will stretch out to unreadable lengths.
I would advise you re-assess what you are trying to do, if its for a report, then will anyone really look at a table with more than xx columns? I personally wouldn't be scrolling across to review in Excel for instance. If it is for further coding then there are better ways of doing it, i.e. keeping the normalised structure and using by group processing. If for something else, consider re-doing the output or structure to better fit the purpose, maybe a yearly based report so you can fix 12 months. Etc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, yes you could do that by using id and idlabel:
proc transpose data =xyz;
by abc;
id year;
idlabel year;
run;
I wouldn't recommend that approach however. Its something I have seen a lot here on the forums, putting data in the variable name. Its not good practice for several reasons:
- You lose the direct arraying functionality, ee.g col1 col2 col3 etc. can be referenced as array col{*}.
- Also not having the number at the end means no list approaches, e.g. col1-col10.
- Your code will become big and unweildy very quickly as you try to remember all the variables, and your table will stretch out to unreadable lengths.
I would advise you re-assess what you are trying to do, if its for a report, then will anyone really look at a table with more than xx columns? I personally wouldn't be scrolling across to review in Excel for instance. If it is for further coding then there are better ways of doing it, i.e. keeping the normalised structure and using by group processing. If for something else, consider re-doing the output or structure to better fit the purpose, maybe a yearly based report so you can fix 12 months. Etc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Its not working as my date are like:
yearmonth areacode barcode tot_qty tot_value
201402 AR01 1101 67 123
201402 AR01 1201 67 123
201402 AR01 1301 67 123
201402 AR01 1401 67 123
201402 AR01 1501 67 123
201402 AR01 1601 67 123
201402 AR01 1701 67 123
201403 AR01 1101 67 123
201404 AR01 1201 67 123
201405 AR01 1301 67 123
201406 AR01 1401 67 123
201407 AR01 1501 67 123
201408 AR01 1601 67 123
201409 AR01 1701 67 123
my Desired output like :
areacode barcode _201402_tot_qty _201402_tot_value ... so on for year
I have done it by array,,
is there any possiblities in proc transpose to do the same???
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't understand your desired structure because the way you describe it cannot work since the first 8 records all have the same values for YEARMONTH variable.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Original data:(input data)
yearmonth areacode barcode tot_qty tot_value
201402 AR01 1101 67 123
201402 AR01 1201 67 123
201402 AR01 1301 67 123
201402 AR01 1401 67 123
201402 AR01 1501 67 123
201402 AR01 1601 67 123
201402 AR01 1701 67 123
201403 AR01 1101 67 123
201403 AR01 1201 67 123
201403 AR01 1301 67 123
201403 AR01 1401 67 123
201403 AR01 1501 67 123
201403 AR01 1601 67 123
201403 AR01 1701 67 123
....
till 201501
Desired ouput:
areacode barcode _201402_tot_qty _201402_tot_value _201403_tot_qty _201403_tot_value and so on .. till _201501_tot_qty _201501_tot_
AR01 1101 67 123 67 123
AR01 1201 67 123 67 123
AR01 1301 67 123 67 123
AR01 1401 67 123 67 123
AR01 1501 67 123 67 123
AR01 1601 67 123 67 123
AR01 1701 67 123 67 123
... many area and many barcode..
* number are not fixed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you are only asking how to do something like:
proc sort data=have out=need;
by areacode barcode yearmonth;
run;
proc transpose data=need out=need2 (drop=_:) prefix=tot_qty_;
by areacode barcode;
var tot_qty;
id yearmonth;
run;
proc transpose data=need out=need3 (drop=_:) prefix=tot_value_;
by areacode barcode;
var tot_value;
id yearmonth;
run;
data want;
merge need2 need3;
by areacode barcode;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is what i have done, but its showing error that: The ID value "_tot_qty_201402" occurs twice in the same by group
Too many bad by groups
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Then you have duplicates in your BY groups.
What do you want to happen when you have multiple records for an areacode/barcode combination?
Data duplicates;
set have;
by areacod barcode;
if not (first.barcode and last.barcode);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
plz have a look on data:
Original data:(input data)
yearmonth areacode barcode tot_qty tot_value
201402 AR01 1101 67 123
201402 AR01 1201 67 123
201402 AR01 1301 67 123
201402 AR01 1401 67 123
201402 AR01 1501 67 123
201402 AR01 1601 67 123
201402 AR01 1701 67 123
201403 AR01 1101 67 123
201403 AR01 1201 67 123
201403 AR01 1301 67 123
201403 AR01 1401 67 123
201403 AR01 1501 67 123
201403 AR01 1601 67 123
201403 AR01 1701 67 123
....
till 201501
Desired ouput:
areacode barcode _201402_tot_qty _201402_tot_value _201403_tot_qty _201403_tot_value and so on .. till _201501_tot_qty _201501_tot_
AR01 1101 67 123 67 123
AR01 1201 67 123 67 123
AR01 1301 67 123 67 123
AR01 1401 67 123 67 123
AR01 1501 67 123 67 123
AR01 1601 67 123 67 123
AR01 1701 67 123 67 123
... many area and many barcode..
* number are not fixed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this. But if you have duplicates you might need to make adjustments.
data have ;
input yearmonth areacode $ barcode $ tot_qty tot_value ;
cards;
201402 AR01 1101 67 123
201402 AR01 1201 67 123
201402 AR01 1301 67 123
201402 AR01 1401 67 123
201402 AR01 1501 67 123
201402 AR01 1601 67 123
201402 AR01 1701 67 123
201403 AR01 1101 67 123
201403 AR01 1201 67 123
201403 AR01 1301 67 123
201403 AR01 1401 67 123
201403 AR01 1501 67 123
201403 AR01 1601 67 123
201403 AR01 1701 67 123
;;;;
proc transpose data=have out=middle ;
by areacode barcode yearmonth notsorted;
var tot_qty tot_value ;
run;
proc sort ;
by areacode barcode yearmonth _name_ ;
run;
proc transpose data=middle out=want(drop=_name_) delim=_;
by areacode barcode ;
id yearmonth _name_;
var col1 ;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
here will miss col2 values (in case).
code still not working with my data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Because your sample data does not match your actual data.
Please run the code I posted earlier. Post how many records are in that table.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
near about 20m. , its just a sample of actual data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, missed most of this conversation. You are going to tranpose a table of 20million records, using year month as an identifier * 2 per year month, you do realise even if only one year is done that is 24 variables, have you considered how you are going to work with this further down the line?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is what your data actually looks like according to the errors you're mentioning.
How do you want to deal with the records in RED that have duplicate areacode/barcode combinations?
I agree with @RW9 though, this is a very bad way to store data.
See the Tidy Data paper from Hadley Wickam, though it references R, the concepts are the same.
http://vita.had.co.nz/papers/tidy-data.pdf
data have ;
input yearmonth areacode $ barcode $ tot_qty tot_value ;
cards;
201402 AR01 1101 67 123
201402 AR01 1201 67 123
201402 AR01 1301 67 123
201402 AR01 1401 67 123
201402 AR01 1501 67 123
201402 AR01 1501 67 123
201402 AR01 1601 67 123
201402 AR01 1701 67 123
201403 AR01 1101 67 123
201403 AR01 1201 67 123
201403 AR01 1301 67 123
201403 AR01 1401 67 123
201403 AR01 1501 67 123
201403 AR01 1601 67 123
201403 AR01 1701 67 123
201403 AR01 1701 67 123
;;;;