Address Parsing
Objectives
The Address Parsing file is used to standardize the many ways in which street addresses may be written. The file is structured so that you can easily customize it to suit your own needs. This is useful when the addresses you are geocoding contain nonstandard abbreviations or an unusual component order. If, for example, you find that the house number often appears after the street name instead of before it (e.g. Main St. 123 instead of 123 Main St.), you can instruct the Atlas Geocoder to assume that address order instead.
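The kind of reordering described above can be sketched in Python as follows, assuming every address in the data follows the street-then-number convention. The function name number_first and its regular expression are hypothetical and only illustrate the idea; within the Atlas Geocoder, the order is controlled by the Section 1 setting in ADDPARSE.TRN, not by code like this.

```python
import re

# Hypothetical pattern (illustration only): a street part ending in a
# non-digit, whitespace, then a trailing house number such as "123" or "12B".
TRAILING_NUMBER = re.compile(r"^(?P<street>.*\D)\s+(?P<number>\d+[A-Za-z]?)$")


def number_first(address: str) -> str:
    """Rewrite 'Main St. 123' as '123 Main St.'; return other strings unchanged."""
    text = " ".join(address.split())  # collapse repeated whitespace
    match = TRAILING_NUMBER.match(text)
    if match is None:
        return text  # no trailing house number found
    return f"{match.group('number')} {match.group('street')}"


print(number_first("Main St. 123"))  # 123 Main St.
print(number_first("123 Main St."))  # 123 Main St.
```

A pattern this simple would also turn Highway 101 into 101 Highway, so it should only be applied to data known to use the street-then-number order.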
Where it is located
The address parsing file is named ADDPARSE.TRN and is located in the \AtlasGIS\Geocode folder on your hard drive.

How to edit the translation file
The ADDPARSE.TRN file can be edited with any text editor or word processor. To avoid inadvertently inserting formatting characters into the file, a plain text editor such as Windows Notepad is preferred. After opening the file, find the section containing the setting you want to change. Each section includes comments, marked with a leading slash ("/") character, that give instructions for that section.

The ADDPARSE.TRN file contains six sections, each of which controls a different address parameter. The sections are as follows:

Section 1 - Address component ordering
Section 1 sets the order in which the address components most often appear. There are seven predefined options, as follows:
Where the format codes are defined as:

To modify the address order setting, replace the value in the appropriate section of the file with a value from "0" through "6", depending on how the addresses in your file are formatted.

Section 2 - Plural character for the street type in street intersections
This section lets you define which character at the end of a street type indicates a plural. The default value is "s". This is most commonly used when street intersection addresses are written as in the following example: 1st and Main Streets, where the plural street type Streets applies to both 1st and Main.

Section 3 - Numeric street names
This section controls when the geocoder interprets a number in the address as part of a street name rather than as a house number. Intuitively, if a "3" is followed by the characters "rd", the reference is probably to 3rd St. rather than to a house number. The values listed in this section are the suffixes that, when they follow a number, identify it as a numeric street name rather than a house number. The default values are:
st (example: 1st)
rd (example: 3rd)
nd (example: 2nd)
th (example: 5th)
To modify this section, add the appropriate characters on blank lines following the last value. This section is most likely to need changes when a database consistently or systematically uses a nonstandard suffix (e.g. 3d instead of 3rd).

Section 4 - Intersection conjunctions
This section specifies how addresses consisting of street intersections are interpreted. Its values are the characters that can separate the two street names making up an intersection (e.g. 1st and Main, 1st & Main, 1st/Main). To modify this section, add the appropriate characters on blank lines following the last value. This section is most likely to need changes when a database consistently or systematically uses a separator that is not already listed.
The existing default values in the file include: @

Section 5 - Pre-strip tokens
Pre-strip tokens are strings that are removed from the address together with everything that follows them. For example, if the string Apt. is specified as a pre-strip token, the characters Apt. and anything following them (such as #115) will be removed. Modify this section if your addresses often contain such strings that are not listed below. Remember that pre-strip tokens are removed together with whatever follows them. If you want a string removed but the characters after it kept, list it in the Post-strip tokens section instead. The default values consist of the following:
Apartment #

Section 6 - Post-strip tokens
Post-strip tokens are strings that are removed on their own; any text following them is left in place. These could include items such as "floor", "department", etc. The default values included in the file consist of:
Floor
This section should be modified if your addresses often contain strings not listed above.
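The parsing rules in Sections 2 through 6 can be illustrated with the following minimal Python sketch. It is not the Atlas Geocoder's algorithm, and it does not read ADDPARSE.TRN. The token lists are hypothetical, seeded only with values this page mentions. STREET_TYPES is an invented subset used for the plural example, the function names are made up for illustration, and case-insensitive, whole-word matching is an assumption the page does not state.

```python
import re

# Hypothetical token lists seeded from values mentioned on this page;
# the real ADDPARSE.TRN may contain different or additional entries.
PLURAL_CHAR = "s"                                  # Section 2
NUMERIC_SUFFIXES = ["st", "nd", "rd", "th"]        # Section 3
CONJUNCTIONS = ["and", "&", "/", "@"]              # Section 4
PRE_STRIP = ["Apt.", "Apartment"]                  # Section 5
POST_STRIP = ["Floor", "Department"]               # Section 6
STREET_TYPES = {"street", "avenue", "st", "ave"}   # assumed subset, not from the file


def strip_tokens(address: str) -> str:
    """Pre-strip: drop token and everything after it. Post-strip: drop token only."""
    for token in PRE_STRIP:
        match = re.search(rf"\b{re.escape(token)}", address, flags=re.IGNORECASE)
        if match:
            address = address[:match.start()]  # cut token plus trailing text
    for token in POST_STRIP:
        # Remove the token as a whole word, keeping the surrounding text.
        address = re.sub(rf"\b{re.escape(token)}\b", " ", address, flags=re.IGNORECASE)
    return " ".join(address.split())


def split_intersection(address: str):
    """Return (street1, street2) if a conjunction joins two names, else None."""
    # Word conjunctions need surrounding spaces so "Anderson" is not split;
    # symbols may appear with or without spaces (1st/Main, 1st & Main).
    alternatives = [
        rf"\s+{re.escape(c)}\s+" if c.isalpha() else rf"\s*{re.escape(c)}\s*"
        for c in CONJUNCTIONS
    ]
    pieces = re.split("|".join(alternatives), address, maxsplit=1, flags=re.IGNORECASE)
    if len(pieces) == 2 and pieces[0].strip() and pieces[1].strip():
        return pieces[0].strip(), pieces[1].strip()
    return None


def expand_plural_type(first: str, second: str):
    """Apply a plural street type ('Main Streets') to both intersecting streets."""
    words = second.split()
    if len(words) < 2 or not words[-1].lower().endswith(PLURAL_CHAR):
        return first, second
    singular = words[-1][: -len(PLURAL_CHAR)]
    if singular.lower() not in STREET_TYPES:
        return first, second
    return f"{first} {singular}", " ".join(words[:-1] + [singular])


def is_numeric_street_name(token: str) -> bool:
    """True for tokens like '1st' or '3rd': digits followed by a listed suffix."""
    match = re.fullmatch(r"(\d+)([A-Za-z]+)", token)
    return bool(match) and match.group(2).lower() in NUMERIC_SUFFIXES


street1, street2 = split_intersection(strip_tokens("1st and Main Streets Apt. 4"))
print(expand_plural_type(street1, street2))  # ('1st Street', 'Main Street')
```

In this sketch, adding "d" to NUMERIC_SUFFIXES would make is_numeric_street_name("3d") return True, which mirrors the Section 3 advice about databases that write 3d instead of 3rd.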