DATA Step, Macro, Functions and more

Creating user defined in-formats - in a two way lookup.

Accepted Solution Solved
Reply
Contributor
Posts: 65
Accepted Solution

Creating user defined in-formats - in a two way lookup.

Hi Everyone,

 

Below are the codes for numeric type datalines.

 

data exposure;
	retain Type 'I' Hlo 'U';

	do Year=1944 to 1949;
		Fmtname=cats('exp', Year, 'fmt');

		do Start='A', 'B', 'C', 'D', 'E';
			End=Start;
			input Label : $3. @;
			output;
		end;
	end;
	drop Year;
datalines;
220 180 210 110 90
202 170 208 100 85
150 110 150 60 50
105 56 88 40 30
60 30 40 20 10
45 22 22 10 8
;

title "Creating the Exposure InFormat";
proc format cntlin=exposure fmtlib;
run;

data read_exp;
	input Worker $ Year JobCode $;
	Exposure = inputn(JobCode,cats('exp',Year,'fmt8.'));
datalines;
001 1944 B
002 1948 E
003 1947 C
005 1945 A
006 1948 d
;

Can anyone assist me if I have datalines a mixture of both numeric and char: (use the following datalines).

 

datalines;
Greater than 210 180 210 110 90
202 170 208 100 .
150 110 greater than 210 60 50
105 . 88 40 Less than 10
60 30 40 20 Less than 10
45 22 22 . Less than 10
;

 

Where, Greater than 210 is one value (character).  For simplicity if required you can replace it to Greater_210

similarly, for less than 10.

And, . denotes numeric missing.

 

However, it might be easy if we proceed by considering all values (in datalines) as character.

But, my concern is can we do for mixture of both type values.  If yes, then please give me codes.

 

Regards,
AG_Stats

Accepted Solutions
Solution
‎05-16-2016 07:58 AM
Trusted Advisor
Posts: 1,117

Re: Creating user defined in-formats - in a two way lookup.

Thanks for the image, which is not a huge help, though, because it essentially reflects the information contained in your existing dataset EXPOSURE. More interesting would be the corresponding table of what you want to achieve.

 

Currently, your informats assign 'A', 'B', ..., 'E' to certain numbers. So, it would make sense indeed to add an "other" category (HLO='O') in order to assign "unknown" input values such as 'F' or missing to something like "Not found." However, as mentioned in my first reply, a label like "Not found" would change the type of the informat from numeric to character. To preserve the type of the existing informats you could assign a special missing value, e.g. .N, to indicate the same. The text label "Not found" could then be assigned through a format.

 

So, the question is: Would you like to use the numeric EXPOSURE values (220, 180, etc.) obtained with your informats as numbers (e.g. perform calculations with them)? If so, I suggest to insert the following four lines of code between the two END statements in your first data step:

  hlo='O';
  label='.N';
  output;
  hlo='U';

I'm not quite sure how to interpret your suggestions involving HLO='H' and HLO='L', though. With HLO='H' you could define a label for an alphabetical range of letters such as 'F', 'G', ... But it seems odd to me that all these should be assigned the label "Greater_220." Similarly, HLO='L' could be used to define a label for characters which precede 'A' in the ASCII collating sequence (including digits or a blank). But why should, e.g., '9' be labeled 'Less_8'?

View solution in original post


All Replies
Super User
Posts: 11,343

Re: Creating user defined in-formats - in a two way lookup.

Please be more specific. If you are attempting to use a data set as shown below the posted code to build a Cntlin data set please show the equivalent Format value or invalue bits you would explect.

It is not obvious what your "greater than" or "less than" are to be compared to.

For example you have

Greater than 210 180 210 110 90

 do you want something like this?

210< - high ='A' 

180 = 'B'

210 = 'C'

110 = 'D'

90 = 'E'

 

Since you apparently have an abitrary location for the range indicator your input data set will have to be a bit more complicated so that

each "value" also has a space for a rule. It may be doable if you have something like:

>, 210,=, 180,=, 210,=,110,=, 90 and read the "rule" as character and then use If statements to assign the correct start, end and comparisons

 

And are you ever going to want a range such as 20 - 30 = 'A'?

 

 

Trusted Advisor
Posts: 1,117

Re: Creating user defined in-formats - in a two way lookup.

Hi @AG_Stats,

 

I take it that you just want to create a new set of informats which read single letters (A - E) as values such as 202 or strings like "Greater than 210". Of course, this is possible. (Reading such datalines would be facilitated by separating the different values by two blanks or non-blank delimiters.) However, these new informats would be character informats, not numeric informats as the others. As a consequence, variable EXPOSURE would become a character variable, which might not be ideal for your purposes. To avoid this, one could replace the "Greater than ..." and "Less than ..." labels by numbers (or special missing values if this was more appropriate) and then apply a format to display these particular numbers (or special missing values, resp.) as "Greater than ..." etc.

Contributor
Posts: 65

Re: Creating user defined in-formats - in a two way lookup.

[ Edited ]

Hi,

 

Please find an image below which might help you: 

 

Untitled.png

 

The codes that I have given in my first post (question) are working well.

Now, again consider my question as I explained below (forgot my previous datalines given below codes - where I have mestioned greater than 200 etc.):

 

Only I want to add some more categories:-

 

Start = '  ';
End= Start;
Hlo = 'o'; /* Or you can change it if required & can add type= I or C also */
Label = 'Not Found';

 

Start = '  ';
End= Start;
Hlo = 'H'; /* Or you can change it if required & can add type= I or C also */
Label = 'Greater_220';

 

Start = '  ';
End= Start;
Hlo = 'L'; /* Or you can change it if required & can add type= I or C also */
Label = 'Less_8';

 

Regards,
AG_Stats
Solution
‎05-16-2016 07:58 AM
Trusted Advisor
Posts: 1,117

Re: Creating user defined in-formats - in a two way lookup.

Thanks for the image, which is not a huge help, though, because it essentially reflects the information contained in your existing dataset EXPOSURE. More interesting would be the corresponding table of what you want to achieve.

 

Currently, your informats assign 'A', 'B', ..., 'E' to certain numbers. So, it would make sense indeed to add an "other" category (HLO='O') in order to assign "unknown" input values such as 'F' or missing to something like "Not found." However, as mentioned in my first reply, a label like "Not found" would change the type of the informat from numeric to character. To preserve the type of the existing informats you could assign a special missing value, e.g. .N, to indicate the same. The text label "Not found" could then be assigned through a format.

 

So, the question is: Would you like to use the numeric EXPOSURE values (220, 180, etc.) obtained with your informats as numbers (e.g. perform calculations with them)? If so, I suggest to insert the following four lines of code between the two END statements in your first data step:

  hlo='O';
  label='.N';
  output;
  hlo='U';

I'm not quite sure how to interpret your suggestions involving HLO='H' and HLO='L', though. With HLO='H' you could define a label for an alphabetical range of letters such as 'F', 'G', ... But it seems odd to me that all these should be assigned the label "Greater_220." Similarly, HLO='L' could be used to define a label for characters which precede 'A' in the ASCII collating sequence (including digits or a blank). But why should, e.g., '9' be labeled 'Less_8'?

Contributor
Posts: 65

Re: Creating user defined in-formats - in a two way lookup.

Posted in reply to FreelanceReinhard

Quite useful, Thanks!

Regards,
AG_Stats
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 258 views
  • 3 likes
  • 3 in conversation