BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
reini
Calcite | Level 5

Hello everybody,

 

I'm struggeling with some awfully structured data set that I would like to reorder in a reasonable way.

The data I got looks somewhat like this

 

data unstructured;                                                                         
    input subject $3. dep_code_1 $3. quantity_1 dep_code_2 $3. quantity_2 dep_code_3 $3. quantity_3;                          
    datalines;                                                                          
    s1 a1 12 a3 63 a2 17
    s2 a3 2 a1 11    .
    s3 a1 14 a3 7    .
    ;
run;

 

Now here I have some departments which are internally coded as a1 a2 and a3 and some subjects s1-s3. Now for each quantity there is a line where first you see the department code followed by the quantity associated. Now in each line there are only those departments listed that are concerned by the given subject.

(In addition there are further information about each subject per line, which I left out here).

 

As you can see this structure is kind of cumbersome and I's like to reformat it so somehwat like this

 

data structured;                                                                         
     input subject $3. quantity_a1 quantity_a2 quantity_a3;                          
    datalines;                                                                          
    s1 12 17 63
    s2 11 2 .
    s3 14 . 7
    ;
run;

 

Now I have the problem that e.g. I would need to read the value of dep_code_1 and of quantity_1 to do soemthing like

   cats("quantity_a",dep_code_1) = quantity_1

now of course this doesn't work, since cats("quantity_a",dep_code_1) does not define a proper variable name.

 

I also tried to use a do loop and symput like this:

 

data structured;

   set unstructured;

   do i = 1 to 3;

      call symput("category_name", "quantity_a"||dep_code_&i);

     &category_name = quantity_&i;

   end;

run;

 

but this again didn't work since the macro variable category_name seems not to be accesible while inside this data step.

 

Does anybody have an idea about how this can be done?

 

Thanks a lot in advance

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Step 1 for me would be, as always, to get away from the Excel thinking.  Get the data normalised:

data unstructured;                                                                         
  input subject $3. dep_code_1 $3. quantity_1 dep_code_2 $3. quantity_2 dep_code_3 $3. quantity_3;                          
datalines;                                                                          
s1 a1 12 a3 63 a2 17
s2 a3 2 a1 11    .
s3 a1 14 a3 7    .
;
run;

data want (keep=subject department value);
  set unstructured;
  length department $10;
  array d{*} dep_:;
  array q{*} quantity_:;
  do i=1 to dim(d);
    department=d{i};
    value=q{i};
    if department ne "" then output;
  end;
run;

From there you can manipulate the data as its no longer part of the structure, but in the data part where it is supposed to be.

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Step 1 for me would be, as always, to get away from the Excel thinking.  Get the data normalised:

data unstructured;                                                                         
  input subject $3. dep_code_1 $3. quantity_1 dep_code_2 $3. quantity_2 dep_code_3 $3. quantity_3;                          
datalines;                                                                          
s1 a1 12 a3 63 a2 17
s2 a3 2 a1 11    .
s3 a1 14 a3 7    .
;
run;

data want (keep=subject department value);
  set unstructured;
  length department $10;
  array d{*} dep_:;
  array q{*} quantity_:;
  do i=1 to dim(d);
    department=d{i};
    value=q{i};
    if department ne "" then output;
  end;
run;

From there you can manipulate the data as its no longer part of the structure, but in the data part where it is supposed to be.

reini
Calcite | Level 5

Well, what you said about Excel is quite a good advice I think, yet it's hard to change your own thought processes. Especially when they're established for a long time.

Anyway your code seems to work excellent and though it's not what I had in mind when I posed my question, I actually should be able to do any subsequent assignments with the data structured that way. So thanks a lot 🙂

Kurt_Bremser
Super User

Or use two transposes:

proc transpose data=unstructured out=int1 (drop=_name_ rename=(col1=department));
by subject;
var dep_code:;
run;

proc transpose data=unstructured out=int2 (drop=_name_ subject rename=(col1=quantity));
by subject;
var quantity:;
run;

data want;
merge
  int1
  int2
;
if department ne '';
run;
reini
Calcite | Level 5
Thanks a lot for this alternative approach 🙂

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 628 views
  • 1 like
  • 3 in conversation