Solved: Re: Using arrays

overky · Posted 04-04-2017 06:31 PM

I am reading in data to create X dataset. The raw data comes as a seperate observation/line for each ItemCode. I am wanting to combine the IDs into one Observation and have multiple ItemCodes strung out horizontally instread of verticle. I have tried a proc transpose but that does not combine IDs into one observation.

proc sort data=Raw nodup;
by ID ItemCode;
run;

data Raw;
set Raw;
counter=_N_;
_N_=1;
run;

PROC SORT DATA=Raw;
BY ID ItemCode;
RUN;
DATA NewRaw1;
ARRAY ??? (Unknown number of observations)
RETAIN ?
BY ID;
IF FIRST.ID THEN
DO I = 1 TO 1;
DO J = 1 TO 2;
CS[I,J] = .;
END;
END;
CS[ID, ItemCode] = ResponseCode;
IF LAST.ID THEN OUTPUT;
KEEP ID CS1-CS;
RUN;

collinelliot · Posted 04-04-2017 07:09 PM

Try the following and see if that works. I added an ID for testing.

data have;
input ID Item Response;
datalines;
1 200 10000
1 220 500
1 510 600000
1 400 750000
1 230 34000
2 330 25
2 400 47
2 600 62
2 788 1000
2 900 500000
2 9910 10010
;

proc transpose data = have out = want prefix = item_;
    by id;
    var response;
    id item;
run;

View solution in original post

collinelliot · Posted 04-04-2017 06:38 PM

Can you provide a small sample of your data? Based on your description, proc transpose should do what want. That is:

Id code

1 A

1 B

1 C

Could easily become:

Id Code1 Code2 Code3

1 A B C

collinelliot · Posted 04-04-2017 06:40 PM

Simple example:

data have;
    do id = 1 to 3;
        do item_code = 'A', 'B', 'C';
        output;
        end;
    end;
run;

proc transpose data = have out = want prefix = code;
    by id;
    var item_code;
run;

overky · Posted 04-04-2017 06:56 PM

ID	Item	Response
1	200	10000
1	220	500
1	510	600000
1	400	750000
1	230	34000
1	330	25
1	400	47
1	600	62
1	788	1000
1	900	500000
1	9910	10010

I have something like this but I need a new dataset in this format:

ID	200	220	510	400	230	330	400	600	788	900	9910
1	10000	500	600000	750000	34000	25	47	62	1000	500000	10010
2	11000	550	660000	825000	37400	28	52	68	1100	550000	11011
3	12100	605	726000	907500	41140	30	57	75	1210	605000	12112
4	13310	666	798600	998250	45254	33	63	83	1331	665500	13323
5	14641	732	878460	1098075	49779	37	69	91	1464	732050	14656
6	16105	805	966306	1207883	54757	40	76	100	1611	805255	16121
7	17716	886	1062937	1328671	60233	44	83	110	1772	885781	17733
8	19487	974	1169230	1461538	66256	49	92	121	1949	974359	19507
9	21436	1072	1286153	1607692	72882	54	101	133	2144	1071794	21457
10	23579	1179	1414769	1768461	80170	59	111	146	2358	1178974	23603

One trouble is that ItemCodes vary by ID. There are 135 possible ItemCodes from 220-9912. Sometimes when I read in the datafile there maybe 5 IDS next time maybe 60.

collinelliot · Posted 04-04-2017 07:09 PM

Try the following and see if that works. I added an ID for testing.

data have;
input ID Item Response;
datalines;
1 200 10000
1 220 500
1 510 600000
1 400 750000
1 230 34000
2 330 25
2 400 47
2 600 62
2 788 1000
2 900 500000
2 9910 10010
;

proc transpose data = have out = want prefix = item_;
    by id;
    var response;
    id item;
run;

collinelliot · Posted 04-04-2017 07:10 PM

Note that you cannot (should not?) have just the numbers as the variable names in the transpose, so in the example I've added "item_" as a prefix.

Reeza · Posted 04-04-2017 07:16 PM

The long format is better for storing the data. How are you going to handle the multiple records, ie ID 1 Item 400 is repeated twice in this list. Do you need to summarize this first somehow?

Now that I've said that, this is a straight PROC TRANSPOSE procedure if you want a SAS dataset. Especially if the data is dynamic.

You may need to sort the dataset by ID first.

proc transpose data=have out=want prefix=Item_;
by ID;
id Item;
var response;
run;

if you want a report - HTML, PDF or EXCEL PROC TABULATE is actually probably a better procedure, especially if you also need to summarize, since it will do that in one step.

collinelliot · Posted 04-04-2017 07:26 PM

What is the count of distinct item code values? That is how many columns you'll end up with in the transposed data.

Variation by ID is not an issue as long at there are not duplicates within an ID.

overky · Posted 04-04-2017 07:32 PM

Thank you. The Proc Transpose worked. I do not need to summarize the ItemCode 400 twice was a mistake in my example. There will never be duplicate itemcodes.

Reeza · Posted 04-04-2017 08:10 PM

@overky Please mark @collinelliot solution as correct then 🙂

Classroom Training Available!