Using arrays

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 19
Accepted Solution

Using arrays

I am reading in data to create X dataset.  The raw data comes as a seperate observation/line for each ItemCode.  I am wanting to combine the IDs into one Observation and have multiple ItemCodes strung out horizontally instread of verticle.  I have tried a proc transpose but that does not combine IDs into one observation.

 

proc sort data=Raw nodup;
by ID ItemCode;
run;

data Raw;
set Raw;
counter=_N_;
_N_=1;
run;

PROC SORT DATA=Raw;
BY ID ItemCode;
RUN;
DATA NewRaw1;
ARRAY ??? (Unknown number of observations)
RETAIN ?
BY ID;
IF FIRST.ID THEN
DO I = 1 TO 1;
DO J = 1 TO 2;
CS[I,J] = .;
END;
END;
CS[ID, ItemCode] = ResponseCode;
IF LAST.ID THEN OUTPUT;
KEEP ID CS1-CS;
RUN;


Accepted Solutions
Solution
‎04-10-2017 10:40 AM
PROC Star
Posts: 295

Re: Using arrays

Try the following and see if that works. I added an ID for testing.

 

data have;
input ID Item Response;
datalines;
1 200 10000
1 220 500
1 510 600000
1 400 750000
1 230 34000
2 330 25
2 400 47
2 600 62
2 788 1000
2 900 500000
2 9910 10010
;

proc transpose data = have out = want prefix = item_;
    by id;
    var response;
    id item;
run;

View solution in original post


All Replies
PROC Star
Posts: 295

Re: Using arrays

Can you provide a small sample of your data? Based on your description, proc transpose should do what want. That is:

 

Id code

1 A

1 B

1 C

 

Could easily become:

Id Code1 Code2 Code3

1 A         B           C

 

 

PROC Star
Posts: 295

Re: Using arrays

Simple example:

 

data have;
    do id = 1 to 3;
        do item_code = 'A', 'B', 'C';
        output;
        end;
    end;
run;

proc transpose data = have out = want prefix = code;
    by id;
    var item_code;
run;
Occasional Contributor
Posts: 19

Re: Using arrays

IDItemResponse
120010000
1220500
1510600000
1400750000
123034000
133025
140047
160062
17881000
1900500000
1991010010

 

I have something like this but I need a new dataset in this format:

 

ID2002205104002303304006007889009910
11000050060000075000034000254762100050000010010
21100055066000082500037400285268110055000011011
31210060572600090750041140305775121060500012112
41331066679860099825045254336383133166550013323
514641732878460109807549779376991146473205014656
6161058059663061207883547574076100161180525516121
71771688610629371328671602334483110177288578117733
81948797411692301461538662564992121194997435919507
92143610721286153160769272882541011332144107179421457
102357911791414769176846180170591111462358117897423603

 

One trouble is that ItemCodes vary by ID.  There are 135 possible ItemCodes from 220-9912.  Sometimes when I read in the datafile there maybe 5 IDS next time maybe 60.

 

 

Solution
‎04-10-2017 10:40 AM
PROC Star
Posts: 295

Re: Using arrays

Try the following and see if that works. I added an ID for testing.

 

data have;
input ID Item Response;
datalines;
1 200 10000
1 220 500
1 510 600000
1 400 750000
1 230 34000
2 330 25
2 400 47
2 600 62
2 788 1000
2 900 500000
2 9910 10010
;

proc transpose data = have out = want prefix = item_;
    by id;
    var response;
    id item;
run;
PROC Star
Posts: 295

Re: Using arrays

Note that you cannot (should not?) have just the numbers as the variable names in the transpose, so in the example I've added "item_" as a prefix. 

Super User
Posts: 19,194

Re: Using arrays

The long format is better for storing the data. How are you going to handle the multiple records, ie ID 1 Item 400 is repeated twice in this list. Do you need to summarize this first somehow?

 

Now that I've said that, this is a straight PROC TRANSPOSE procedure if you want a SAS dataset. Especially if the data is dynamic. 

You may need to sort the dataset by ID first. 

 

 

proc transpose data=have out=want prefix=Item_;
by ID;
id Item;
var response;
run;

if you want a report - HTML, PDF or EXCEL PROC TABULATE is actually probably a better procedure, especially if you also need to summarize, since it will do that in one step.

PROC Star
Posts: 295

Re: Using arrays

What is the count of distinct item code values? That is how many columns you'll end up with in the transposed data.

 

Variation by ID is not an issue as long at there are not duplicates within an ID.

Occasional Contributor
Posts: 19

Re: Using arrays

Thank you.  The Proc Transpose worked.  I do not need to summarize the ItemCode 400 twice was a mistake in my example.  There will never be duplicate itemcodes.

Super User
Posts: 19,194

Re: Using arrays

@overky Please mark @collinelliot solution as correct then Smiley Happy

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 187 views
  • 1 like
  • 3 in conversation