ANSIOEM: Character set conversion or marking in ANSI (Windows), OEM (DOS) or UTF-8 files

Direct access to online help: ANSIOEM

Access the application from the menu: "Tools | File maintenance | Convert or mark ANSI/OEM/UTF-8"

Presentation and options	Dialog box of the application
Examples	Syntax

Presentation and options

Because of the great diversity of characters in the world's languages, the need to write special symbols, and the history of computing itself, different character sets have been used in different countries and under different operating systems. Consequently, it is common to find strange characters, for example when a plain text editor in a Windows environment (such as Notepad) is used to read a text created with a plain text editor from another operating system, such as the old MS-DOS EDIT editor.

This problem also arises in databases, whose text fields may have been written using any of several available character sets. In particular, DBF format databases contain an internal mark that, when activated (value greater than zero), indicates which character set the text was written in, so that it can be interpreted correctly.
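The internal mark described above can be inspected with a few lines of code. The following is a minimal sketch, assuming the mark is the language driver ID byte stored at offset 29 of the standard 32-byte DBF header; the function name `read_dbf_mark` is hypothetical and is not part of ANSIOEM or MiraMon.

```python
def read_dbf_mark(path: str) -> int:
    """Return the character-set mark of a DBF file (0 means the file is unmarked).

    Assumption: the mark is the language driver ID byte at offset 29
    of the fixed 32-byte DBF header.
    """
    with open(path, "rb") as dbf:
        header = dbf.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a DBF header")
    return header[29]
```

Under that assumption, a table marked as 'WEurope ANSI' would return 88 and an unmarked table would return 0.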

This program offers two options:

ANSI-OEM-UTF8 conversion

ANSI-OEM-UTF8 Mark

Many characters can be converted between the different sets; for example, the OEM-850 set from MS-DOS, the ANSI 1252 set from Windows and UTF-8 all contain the upper-case and lower-case accented characters. However, some characters exist in only one of the character sets; for example, the trademark symbol (TM), which in the ANSI 1252 set can be represented as a single character, has no representation in the OEM-850 set. In such cases conversion is not possible and an arbitrary substitute character is needed. The substitute character marks the positions of those characters that have no representation in the output character set.
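The role of the substitute character can be illustrated with Python's standard `cp1252` and `cp850` codecs. This is only a sketch of the general idea: it does not reproduce the Borland conversion or the X. Pons adaptations used by the program, and the function name `convert_with_substitute` is hypothetical.

```python
def convert_with_substitute(data: bytes, source: str, target: str,
                            substitute: str = "_") -> bytes:
    """Re-encode bytes from one code page to another.

    Characters with no representation in the target code page are
    replaced by the substitute character.
    """
    text = data.decode(source)
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(target)
        except UnicodeEncodeError:
            # No equivalent in the target set: emit the substitute instead.
            out += substitute.encode(target)
    return bytes(out)


# "café" followed by a space and the trademark symbol, stored as ANSI 1252 bytes.
ansi_bytes = "caf\u00e9 \u2122".encode("cp1252")
print(convert_with_substitute(ansi_bytes, "cp1252", "cp850"))  # b'caf\x82 _'
```

The accented letter is converted (0xE9 in ANSI 1252 becomes 0x82 in OEM-850), while the trademark symbol, which OEM-850 lacks, is replaced by the substitute.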

Warning: Do not convert a file that is not a plain text file or a DBF file; otherwise its contents will be corrupted. Examples of plain text files are MiraMon's REL and MMM files, TXT files, BAT files, etc.

Notes:

ANSI sets: The program uses the Windows-1252 character set, which corresponds to Western Europe and the United States of America. For DBF files, the program marks them with the value 88 (0x58), which in dBASE corresponds to 'WEurope ANSI'.
Some possible marks for ANSI in DBF tables are:
```
         'ascii   ANSI'  87  (0x57)   (dBASE)
         'WEurope ANSI'  88  (0x58)   (dBASE)  [program default]
         'Spanish ANSI'  89  (0x59)   (dBASE)
         'FoxPro  ANSI'   3  (0x03)   (FoxPro)
```
To mark a table with one of these sets, specify its decimal value (for example, 3 for FoxPro ANSI). MiraMon and MiraDades support all these marks.

OEM sets: The program uses the DOS OEM-850 character set, also called "Latin 1" and widely used in many countries because of its rich set of accented characters. For DBF files, the program marks them with the value 20 (0x14), which in dBASE corresponds to 'dbSPANISH2 dBASEESPcp850'.

Some possible marks for OEM-850 DBF tables are:

         'dbDUTCH2      dBASENLDcp850'  10  (0x0A)   (dBASE)
         'dbFRENCH2     dBASEFRAcp850'  14  (0x0E)   (dBASE)
         'dbFRENCHCAN2  dBASEFRCcp850'  29  (0x1D)   (dBASE)
         'dbGERMAN2     dBASEDEUcp850'  16  (0x10)   (dBASE)
         'dbITALIAN2    dBASEITAcp850'  18  (0x12)   (dBASE)
         'dbPORTUGUESE2 dBASEPTBcp850'  37  (0x25)   (dBASE)
         'dbSPANISH2    dBASEESPcp850'  20  (0x14)   (dBASE)  [default]
         'dbSWEDISH2    dBASESVEcp850'  22  (0x16)   (dBASE)
         'dbUK2         dBASEENGcp850'  26  (0x1A)   (dBASE)
         'dbUS2         dBASEENUcp850'  55  (0x37)   (dBASE)
         'FoxPro  OEM-850'               2  (0x02)   (FoxPro)

To mark a table with one of these sets, specify its decimal value (for example, 14 for FRENCH2). MiraMon and MiraDades support all these marks.
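The mark values listed in the ANSI and OEM-850 tables can be gathered into a small lookup. The sketch below copies the decimal values from those tables; the mapping to the Python codec names `cp1252` and `cp850` is an assumption made for illustration (in particular for marks 87 and 89), and `codec_for_mark` is a hypothetical helper.

```python
from typing import Optional

# Decimal mark values copied from the tables in this section.
ANSI_MARKS = {87, 88, 89, 3}
OEM850_MARKS = {10, 14, 29, 16, 18, 37, 20, 22, 26, 55, 2}


def codec_for_mark(mark: int) -> Optional[str]:
    """Return a Python codec name for a DBF mark, or None if it is not listed.

    Assumption: every ANSI mark is read as Windows-1252 and every
    OEM-850 mark as code page 850.
    """
    if mark in ANSI_MARKS:
        return "cp1252"
    if mark in OEM850_MARKS:
        return "cp850"
    return None
```

For instance, `codec_for_mark(88)` returns `"cp1252"`, `codec_for_mark(20)` returns `"cp850"`, and an unmarked table (value 0) yields `None`.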

UTF-8 character set: The program uses the UTF-8 character set (8-bit Unicode Transformation Format), widely used around the world because it can represent any Unicode character (Unicode defines each character or symbol by a name and a numeric identifier). For DBF files, the program generates an attached file stating that the DBF is encoded in this character set.
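As an illustration of how UTF-8 represents characters that ANSI 1252 stores in a single byte, the following sketch re-encodes ANSI bytes with Python's standard codecs. It is not the program's own implementation of the ANSI to UTF-8 conversion, and the function name `ansi_to_utf8` is hypothetical.

```python
def ansi_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1252 (ANSI) bytes as UTF-8."""
    return data.decode("cp1252").encode("utf-8")


# One ANSI byte each: 'A' (0x41), e-acute (0xE9), trademark symbol (0x99).
for ansi_byte in (b"\x41", b"\xe9", b"\x99"):
    utf8 = ansi_to_utf8(ansi_byte)
    print(ansi_byte.hex(), "->", utf8.hex(" "))
# 41 -> 41
# e9 -> c3 a9
# 99 -> e2 84 a2
```

Plain ASCII characters keep their single byte, while accented letters and symbols take two or three bytes in UTF-8, so a converted file can be larger than the original.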
When a file is marked with a new character set, the program reports the previous value.
To find out the value of the mark in a DBF file, use the Information menu of MiraDades.

Dialog box of the application

ANSIOEM dialog box.

Examples

   ANSIOEM   C:\COPIES\README.TXT   D:\README.TXT   1
   ANSIOEM   C:\BASES\COUNTRY.DBF   A:\COUNTRY      3
   ANSIOEM   C:\BASES\MUNICIP       ANSI
   ANSIOEM   C:\BASES\MUNICIP       5
   ANSIOEM   C:\BASES\COMAR.DBF     A:\COMAR_EN_UTF8.DBF 5

Examples of ANSIOEM.

Syntax

Syntax:

ANSIOEM InputFile OutputFile Mode [Substitute]
ANSIOEM DBFFile CharSet

Parameters:

InputFile (Input File - Input parameter): Name of the input file, including the extension. In DBF files, only 'C' (character) fields are translated. Files with other extensions are treated as text files and are fully translated.
OutputFile (Output File - Output parameter): Name of the output file. The program adds the extension if it is omitted.
Mode (Input parameter):
- 0 ANSI -> OEM-850 (Borland conversion).
- 1 ANSI -> OEM-850 (Semi-Strict adaptation X. Pons - Recommended).
- 2 ANSI -> OEM-850 (Strict adaptation X. Pons).
- 3 OEM-850 -> ANSI (Borland conversion).
- 4 ANSI -> UTF-8.
- 5 OEM-850 (Borland conversion) -> UTF-8.
- 6 UTF-8 -> ANSI.
- 7 UTF-8 -> OEM-850 (Borland conversion).
Substitute (Substitute - Input parameter): Character used to replace non-convertible characters (e.g. "X", "_"). Only valid in modes 1 and 2. If it is not specified, the program asks for it.

DBFFile (DBF file - Input parameter): Name of the DBF file. Only 'C' (character) fields are translated. Files with other extensions are treated as text files and are fully translated.
CharSet (Character Set - Input parameter): Character set with which you want to mark the file:
- ANSI
- OEM-850
- UTF-8
- any decimal value [0,255]
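When ANSIOEM is called from a script, the argument rules above can be checked before the program is launched. The sketch below only builds the argument list for the two documented syntaxes; the helper names are hypothetical, and it assumes the executable can be invoked as `ANSIOEM`.

```python
from typing import List, Optional, Union


def build_conversion_command(input_file: str, output_file: str, mode: int,
                             substitute: Optional[str] = None) -> List[str]:
    """Build 'ANSIOEM InputFile OutputFile Mode [Substitute]'."""
    if mode not in range(8):
        raise ValueError("Mode must be an integer from 0 to 7")
    command = ["ANSIOEM", input_file, output_file, str(mode)]
    if substitute is not None:
        # The documentation states Substitute is only valid in modes 1 and 2.
        if mode not in (1, 2):
            raise ValueError("Substitute is only valid in modes 1 and 2")
        if len(substitute) != 1:
            raise ValueError("Substitute must be a single character")
        command.append(substitute)
    return command


def build_mark_command(dbf_file: str, charset: Union[str, int]) -> List[str]:
    """Build 'ANSIOEM DBFFile CharSet'."""
    if isinstance(charset, int):
        if not 0 <= charset <= 255:
            raise ValueError("A numeric CharSet must be in [0,255]")
    elif charset not in ("ANSI", "OEM-850", "UTF-8"):
        raise ValueError("CharSet must be ANSI, OEM-850, UTF-8 or a decimal value")
    return ["ANSIOEM", dbf_file, str(charset)]
```

The resulting list could then be passed to `subprocess.run(command, check=True)`; supplying the substitute character in modes 1 and 2 avoids the interactive request mentioned above.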

Borland, dBASE, DOS, FoxPro and Windows are registered trademarks that belong to their respective owners.