I am reading in data to create X dataset. The raw data comes as a seperate observation/line for each ItemCode. I am wanting to combine the IDs into one Observation and have multiple ItemCodes strung out horizontally instread of verticle. I have tried a proc transpose but that does not combine IDs into one observation.
proc sort data=Raw nodup;
by ID ItemCode;
run;
data Raw;
set Raw;
counter=_N_;
_N_=1;
run;
PROC SORT DATA=Raw;
BY ID ItemCode;
RUN;
DATA NewRaw1;
ARRAY ??? (Unknown number of observations)
RETAIN ?
BY ID;
IF FIRST.ID THEN
DO I = 1 TO 1;
DO J = 1 TO 2;
CS[I,J] = .;
END;
END;
CS[ID, ItemCode] = ResponseCode;
IF LAST.ID THEN OUTPUT;
KEEP ID CS1-CS;
RUN;
Try the following and see if that works. I added an ID for testing.
data have;
input ID Item Response;
datalines;
1 200 10000
1 220 500
1 510 600000
1 400 750000
1 230 34000
2 330 25
2 400 47
2 600 62
2 788 1000
2 900 500000
2 9910 10010
;
proc transpose data = have out = want prefix = item_;
by id;
var response;
id item;
run;
Can you provide a small sample of your data? Based on your description, proc transpose should do what want. That is:
Id code
1 A
1 B
1 C
Could easily become:
Id Code1 Code2 Code3
1 A B C
Simple example:
data have;
do id = 1 to 3;
do item_code = 'A', 'B', 'C';
output;
end;
end;
run;
proc transpose data = have out = want prefix = code;
by id;
var item_code;
run;
ID | Item | Response |
1 | 200 | 10000 |
1 | 220 | 500 |
1 | 510 | 600000 |
1 | 400 | 750000 |
1 | 230 | 34000 |
1 | 330 | 25 |
1 | 400 | 47 |
1 | 600 | 62 |
1 | 788 | 1000 |
1 | 900 | 500000 |
1 | 9910 | 10010 |
I have something like this but I need a new dataset in this format:
ID | 200 | 220 | 510 | 400 | 230 | 330 | 400 | 600 | 788 | 900 | 9910 |
1 | 10000 | 500 | 600000 | 750000 | 34000 | 25 | 47 | 62 | 1000 | 500000 | 10010 |
2 | 11000 | 550 | 660000 | 825000 | 37400 | 28 | 52 | 68 | 1100 | 550000 | 11011 |
3 | 12100 | 605 | 726000 | 907500 | 41140 | 30 | 57 | 75 | 1210 | 605000 | 12112 |
4 | 13310 | 666 | 798600 | 998250 | 45254 | 33 | 63 | 83 | 1331 | 665500 | 13323 |
5 | 14641 | 732 | 878460 | 1098075 | 49779 | 37 | 69 | 91 | 1464 | 732050 | 14656 |
6 | 16105 | 805 | 966306 | 1207883 | 54757 | 40 | 76 | 100 | 1611 | 805255 | 16121 |
7 | 17716 | 886 | 1062937 | 1328671 | 60233 | 44 | 83 | 110 | 1772 | 885781 | 17733 |
8 | 19487 | 974 | 1169230 | 1461538 | 66256 | 49 | 92 | 121 | 1949 | 974359 | 19507 |
9 | 21436 | 1072 | 1286153 | 1607692 | 72882 | 54 | 101 | 133 | 2144 | 1071794 | 21457 |
10 | 23579 | 1179 | 1414769 | 1768461 | 80170 | 59 | 111 | 146 | 2358 | 1178974 | 23603 |
One trouble is that ItemCodes vary by ID. There are 135 possible ItemCodes from 220-9912. Sometimes when I read in the datafile there maybe 5 IDS next time maybe 60.
Try the following and see if that works. I added an ID for testing.
data have;
input ID Item Response;
datalines;
1 200 10000
1 220 500
1 510 600000
1 400 750000
1 230 34000
2 330 25
2 400 47
2 600 62
2 788 1000
2 900 500000
2 9910 10010
;
proc transpose data = have out = want prefix = item_;
by id;
var response;
id item;
run;
Note that you cannot (should not?) have just the numbers as the variable names in the transpose, so in the example I've added "item_" as a prefix.
The long format is better for storing the data. How are you going to handle the multiple records, ie ID 1 Item 400 is repeated twice in this list. Do you need to summarize this first somehow?
Now that I've said that, this is a straight PROC TRANSPOSE procedure if you want a SAS dataset. Especially if the data is dynamic.
You may need to sort the dataset by ID first.
proc transpose data=have out=want prefix=Item_;
by ID;
id Item;
var response;
run;
if you want a report - HTML, PDF or EXCEL PROC TABULATE is actually probably a better procedure, especially if you also need to summarize, since it will do that in one step.
What is the count of distinct item code values? That is how many columns you'll end up with in the transposed data.
Variation by ID is not an issue as long at there are not duplicates within an ID.
Thank you. The Proc Transpose worked. I do not need to summarize the ItemCode 400 twice was a mistake in my example. There will never be duplicate itemcodes.
@overky Please mark @collinelliot solution as correct then 🙂
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.