I am reading in data to create X dataset. The raw data comes as a seperate observation/line for each ItemCode. I am wanting to combine the IDs into one Observation and have multiple ItemCodes strung out horizontally instread of verticle. I have tried a proc transpose but that does not combine IDs into one observation.
proc sort data=Raw nodup;
by ID ItemCode;
run;
data Raw;
set Raw;
counter=_N_;
_N_=1;
run;
PROC SORT DATA=Raw;
BY ID ItemCode;
RUN;
DATA NewRaw1;
ARRAY ??? (Unknown number of observations)
RETAIN ?
BY ID;
IF FIRST.ID THEN
DO I = 1 TO 1;
DO J = 1 TO 2;
CS[I,J] = .;
END;
END;
CS[ID, ItemCode] = ResponseCode;
IF LAST.ID THEN OUTPUT;
KEEP ID CS1-CS;
RUN;
Try the following and see if that works. I added an ID for testing.
data have;
input ID Item Response;
datalines;
1 200 10000
1 220 500
1 510 600000
1 400 750000
1 230 34000
2 330 25
2 400 47
2 600 62
2 788 1000
2 900 500000
2 9910 10010
;
proc transpose data = have out = want prefix = item_;
by id;
var response;
id item;
run;
Can you provide a small sample of your data? Based on your description, proc transpose should do what want. That is:
Id code
1 A
1 B
1 C
Could easily become:
Id Code1 Code2 Code3
1 A B C
Simple example:
data have;
do id = 1 to 3;
do item_code = 'A', 'B', 'C';
output;
end;
end;
run;
proc transpose data = have out = want prefix = code;
by id;
var item_code;
run;
ID | Item | Response |
1 | 200 | 10000 |
1 | 220 | 500 |
1 | 510 | 600000 |
1 | 400 | 750000 |
1 | 230 | 34000 |
1 | 330 | 25 |
1 | 400 | 47 |
1 | 600 | 62 |
1 | 788 | 1000 |
1 | 900 | 500000 |
1 | 9910 | 10010 |
I have something like this but I need a new dataset in this format:
ID | 200 | 220 | 510 | 400 | 230 | 330 | 400 | 600 | 788 | 900 | 9910 |
1 | 10000 | 500 | 600000 | 750000 | 34000 | 25 | 47 | 62 | 1000 | 500000 | 10010 |
2 | 11000 | 550 | 660000 | 825000 | 37400 | 28 | 52 | 68 | 1100 | 550000 | 11011 |
3 | 12100 | 605 | 726000 | 907500 | 41140 | 30 | 57 | 75 | 1210 | 605000 | 12112 |
4 | 13310 | 666 | 798600 | 998250 | 45254 | 33 | 63 | 83 | 1331 | 665500 | 13323 |
5 | 14641 | 732 | 878460 | 1098075 | 49779 | 37 | 69 | 91 | 1464 | 732050 | 14656 |
6 | 16105 | 805 | 966306 | 1207883 | 54757 | 40 | 76 | 100 | 1611 | 805255 | 16121 |
7 | 17716 | 886 | 1062937 | 1328671 | 60233 | 44 | 83 | 110 | 1772 | 885781 | 17733 |
8 | 19487 | 974 | 1169230 | 1461538 | 66256 | 49 | 92 | 121 | 1949 | 974359 | 19507 |
9 | 21436 | 1072 | 1286153 | 1607692 | 72882 | 54 | 101 | 133 | 2144 | 1071794 | 21457 |
10 | 23579 | 1179 | 1414769 | 1768461 | 80170 | 59 | 111 | 146 | 2358 | 1178974 | 23603 |
One trouble is that ItemCodes vary by ID. There are 135 possible ItemCodes from 220-9912. Sometimes when I read in the datafile there maybe 5 IDS next time maybe 60.
Try the following and see if that works. I added an ID for testing.
data have;
input ID Item Response;
datalines;
1 200 10000
1 220 500
1 510 600000
1 400 750000
1 230 34000
2 330 25
2 400 47
2 600 62
2 788 1000
2 900 500000
2 9910 10010
;
proc transpose data = have out = want prefix = item_;
by id;
var response;
id item;
run;
Note that you cannot (should not?) have just the numbers as the variable names in the transpose, so in the example I've added "item_" as a prefix.
The long format is better for storing the data. How are you going to handle the multiple records, ie ID 1 Item 400 is repeated twice in this list. Do you need to summarize this first somehow?
Now that I've said that, this is a straight PROC TRANSPOSE procedure if you want a SAS dataset. Especially if the data is dynamic.
You may need to sort the dataset by ID first.
proc transpose data=have out=want prefix=Item_;
by ID;
id Item;
var response;
run;
if you want a report - HTML, PDF or EXCEL PROC TABULATE is actually probably a better procedure, especially if you also need to summarize, since it will do that in one step.
What is the count of distinct item code values? That is how many columns you'll end up with in the transposed data.
Variation by ID is not an issue as long at there are not duplicates within an ID.
Thank you. The Proc Transpose worked. I do not need to summarize the ItemCode 400 twice was a mistake in my example. There will never be duplicate itemcodes.
@overky Please mark @collinelliot solution as correct then 🙂
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.