ANSIOEM: Character set conversion or marking in ANSI (Windows), OEM (DOS) or UTF-8 files
Presentation and options
Because the world's languages use a great diversity of characters, because special symbols need to be written, and because of the history of computing itself, different character sets have been used in different countries and under different operating systems. Consequently, strange characters often appear, for example, when a plain text editor under Windows (such as Notepad) is used to read a text created with a plain text editor from another operating system, such as the old MS-DOS EDIT editor.
This problem also arises in databases, whose text fields may have been written with any of several character sets. In particular, DBF format databases carry an internal mark that, when set (value greater than zero), indicates which character set the text was written with, so that it can be interpreted correctly.
This program has some options:
ANSI-OEM-UTF8 conversion:
Converts a plain text or DBF file between the ANSI (Windows), OEM-850 (DOS) and UTF-8 character sets, so that it can be viewed and edited correctly with either Windows editors (such as Notepad) or MS-DOS editors (such as EDIT).
ANSI-OEM-UTF8 Mark:
For DBF text fields, the program can mark the file as OEM-850 (DOS), ANSI, UTF-8 or another character set. This is especially useful for files with no specific mark (that is, marked as 0), which is common in files created with older dBASE versions or with software that does not specify any particular character set. Keep in mind, however, that marking a DBF table as belonging to a particular character set does NOT translate it. The mark is only a hint about how the contents have to be interpreted.
Many characters can be converted between sets; for example, the OEM-850 set from MS-DOS, the ANSI 1252 set from Windows and UTF-8 all contain the upper-case and lower-case accented characters. However, some characters exist in only one of the sets; for example, the trademark symbol (TM), which is a single character in the ANSI 1252 set, has no representation in the OEM-850 set. In such cases conversion is not possible and an arbitrary substitute character is needed. In the output, the substitute shows which characters have no representation in the output character set.
Warning: Do not convert a file that is not plain text or a DBF; otherwise its contents will be corrupted. Examples of plain text files are MiraMon's REL and MMM files, TXT files, BAT files, etc.
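The substitution behaviour described above can be illustrated with a minimal Python sketch. It is not the program's own algorithm: it assumes that Python's standard cp1252 and cp850 codecs are close enough to the ANSI and OEM-850 sets for illustration, and the function name convert_with_substitute is hypothetical. The Borland, Semi-Strict and Strict conversions offered by the program may map some characters differently.

```python
def convert_with_substitute(data, src, dst, substitute="_"):
    """Re-encode bytes from codec `src` to codec `dst`.

    Characters with no representation in `dst` are replaced by
    `substitute` (hypothetical helper, not part of ANSIOEM).
    """
    text = data.decode(src)
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(dst)
        except UnicodeEncodeError:
            # No equivalent in the output set: write the substitute instead.
            out += substitute.encode(dst)
    return bytes(out)


# "Café" followed by the trademark symbol, as stored in an ANSI (cp1252) file.
ansi_bytes = "Caf\u00e9\u2122".encode("cp1252")

# The accented letter converts; the trademark symbol has no cp850 equivalent.
oem_bytes = convert_with_substitute(ansi_bytes, "cp1252", "cp850", "_")

# UTF-8 can represent both characters, so nothing is substituted.
utf8_bytes = convert_with_substitute(ansi_bytes, "cp1252", "utf-8")
```

In this sketch the accented letter changes byte value between the two 8-bit sets, while the trademark symbol is the only character that ends up as the substitute.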
Notes:
- ANSI sets: The program uses the Windows-1252 character set, which corresponds to Western Europe and the United States of America. For DBF files, the program marks them with the value 88 (0x58), which in dBASE corresponds to 'WEurope ANSI'.
Some possible marks for ANSI in DBF tables are:
'ascii ANSI' 87 (0x57) (dBASE)
'WEurope ANSI' 88 (0x58) (dBASE) [program default]
'Spanish ANSI' 89 (0x59) (dBASE)
'FoxPro ANSI' 3 (0x03) (FoxPro)
To mark a table with one of these sets, give its decimal value (for example, 3 for FoxPro ANSI). MiraMon and MiraDades support all these marks.
- OEM sets: The program uses the DOS OEM-850 character set, also called "Latin 1" and widely used in many countries because of its wealth of accented characters. For DBF files, the program marks them with the value 20 (0x14), which in dBASE corresponds to 'dbSPANISH2 dBASEESPcp850'.
Some possible marks for OEM-850 DBF tables are:
'dbDUTCH2 dBASENLDcp850' 10 (0x0A) (dBASE)
'dbFRENCH2 dBASEFRAcp850' 14 (0x0E) (dBASE)
'dbFRENCHCAN2 dBASEFRCcp850' 29 (0x1D) (dBASE)
'dbGERMAN2 dBASEDEUcp850' 16 (0x10) (dBASE)
'dbITALIAN2 dBASEITAcp850' 18 (0x12) (dBASE)
'dbPORTUGUESE2 dBASEPTBcp850' 37 (0x25) (dBASE)
'dbSPANISH2 dBASEESPcp850' 20 (0x14) (dBASE) [default]
'dbSWEDISH2 dBASESVEcp850' 22 (0x16) (dBASE)
'dbUK2 dBASEENGcp850' 26 (0x1A) (dBASE)
'dbUS2 dBASEENUcp850' 55 (0x37) (dBASE)
'FoxPro OEM-850' 2 (0x02) (FoxPro)
To mark a table with one of these sets, give its decimal value (for example, 14 for FRENCH2). MiraMon and MiraDades support all these marks.
- UTF-8 character set: The program uses the UTF-8 character set (8-bit Unicode Transformation Format), widely used around the world because it can represent any Unicode character (Unicode defines each character or symbol by a name and a numeric identifier). For DBF files, the program generates an attached file stating that the DBF is encoded in this character set.
- When a file is marked with a new character set, the program reports the previous value.
- To find out the value of the mark in a DBF file, use the Information menu of MiraDades.
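As an illustration of what the mark is, the following Python sketch reads and rewrites it directly. It rests on an assumption that this help text does not state: that the mark is the single "language driver ID" byte at offset 29 of the 32-byte DBF header, as in the common dBASE and FoxPro layouts. The function names are hypothetical. The sketch does not cover UTF-8, which the program records in an attached file, and like the program's marking option it changes only the mark, never the text itself.

```python
MARK_OFFSET = 29   # assumed position of the character set mark in the header
HEADER_SIZE = 32   # assumed size of the fixed part of the DBF header

# A few of the decimal values listed in the notes above.
KNOWN_MARKS = {
    "WEurope ANSI": 88,
    "FoxPro ANSI": 3,
    "dbSPANISH2 dBASEESPcp850": 20,
    "FoxPro OEM-850": 2,
}


def read_dbf_mark(path):
    """Return the character set mark (0 means 'not marked')."""
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        raise ValueError("file is too short to be a DBF table")
    return header[MARK_OFFSET]


def set_dbf_mark(path, value):
    """Overwrite the mark and return the previous value.

    Only the mark byte changes; field contents are not translated.
    """
    if not 0 <= value <= 255:
        raise ValueError("mark must be a decimal value in [0,255]")
    previous = read_dbf_mark(path)
    with open(path, "r+b") as f:
        f.seek(MARK_OFFSET)
        f.write(bytes([value]))
    return previous
```

Returning the previous value mirrors the program's behaviour of reporting the old mark when a new one is written. Work on a copy of the table when experimenting.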
Dialog box of the application
Examples
ANSIOEM C:\COPIES\README.TXT D:\README.TXT 1
ANSIOEM C:\BASES\COUNTRY.DBF A:\COUNTRY 3
ANSIOEM C:\BASES\MUNICIP ANSI
ANSIOEM C:\BASES\MUNICIP 5
ANSIOEM C:\BASES\COMAR.DBF A:\COMAR_EN_UTF8.DBF 5
Examples of ANSIOEM.
Syntax
Syntax:
- ANSIOEM InputFile OutputFile Mode [Substitute]
- ANSIOEM DBFFile CharSet
Parameters:
- InputFile (Input File - Input parameter): Name of the input file, including its extension. In DBF files, only 'C' fields are translated. Files with other extensions are treated as text files and are fully translated.
- OutputFile (Output File - Output parameter): Name of the output file. The program adds the extension if it is omitted.
- Mode (Input parameter):
- 0 ANSI -> OEM-850 (Borland conversion).
- 1 ANSI -> OEM-850 (Semi-Strict adaptation X. Pons - Recommended).
- 2 ANSI -> OEM-850 (Strict adaptation X. Pons).
- 3 OEM-850 -> ANSI (Borland conversion).
- 4 ANSI -> UTF-8.
- 5 OEM-850 (Borland conversion) -> UTF-8.
- 6 UTF-8 -> ANSI.
- 7 UTF-8 -> OEM-850 (Borland conversion).
- Substitute (Substitute - Input parameter): Character used to replace non-convertible characters (e.g. "X", "_"). Only valid in modes 1 and 2. If it is not specified, the program asks for it.
- DBFFile (DBF file - Input parameter): Name of the DBF file. Only 'C' fields are translated. Files with other extensions are treated as text files and are fully translated.
- CharSet (Character Set - Input parameter): Character set with which you want to mark the file:
- ANSI
- OEM-850
- UTF-8
- any decimal value [0,255]
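For scripted use, the two syntaxes above can be assembled programmatically. The following Python sketch only builds and validates the argument lists according to the rules stated in this section; the function names are hypothetical, and it assumes the executable can be invoked as ANSIOEM. The resulting list could then be handed to something like subprocess.run.

```python
def build_convert_args(input_file, output_file, mode, substitute=None):
    """Arguments for: ANSIOEM InputFile OutputFile Mode [Substitute]."""
    if mode not in range(8):
        raise ValueError("Mode must be an integer from 0 to 7")
    args = ["ANSIOEM", input_file, output_file, str(mode)]
    if substitute is not None:
        # The substitute character is documented as valid only in modes 1 and 2.
        if mode not in (1, 2):
            raise ValueError("Substitute is only valid in modes 1 and 2")
        args.append(substitute)
    return args


def build_mark_args(dbf_file, charset):
    """Arguments for: ANSIOEM DBFFile CharSet."""
    if isinstance(charset, int):
        if not 0 <= charset <= 255:
            raise ValueError("decimal CharSet must be in [0,255]")
        charset = str(charset)
    elif charset not in ("ANSI", "OEM-850", "UTF-8"):
        raise ValueError("CharSet must be ANSI, OEM-850, UTF-8 or a decimal value")
    return ["ANSIOEM", dbf_file, charset]


convert_cmd = build_convert_args(r"C:\COPIES\README.TXT", r"D:\README.TXT", 1, "_")
mark_cmd = build_mark_args(r"C:\BASES\MUNICIP", "ANSI")
```

Passing the substitute explicitly in modes 1 and 2 avoids the interactive request that the program would otherwise make.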
Borland, dBASE, DOS, FoxPro and Windows are registered trademarks that belong to their respective owners.