Why am I getting X. in my column names when reading a data frame?

read.csv() is a wrapper around the more general read.table() function. The latter has an argument, check.names, which is documented as:

check.names: logical.  If ‘TRUE’ then the names of the variables in the
         data frame are checked to ensure that they are syntactically
         valid variable names.  If necessary they are adjusted (by
         ‘make.names’) so that they are, and also to ensure that there
         are no duplicates.

If your header contains labels that are not syntactically valid, then make.names() will replace each of them with a valid name based upon the invalid one, translating invalid characters to a dot and possibly prepending X:

R> make.names("$Foo")
[1] "X.Foo"

This is documented in ?make.names:

Details:

    A syntactically valid name consists of letters, numbers and the
    dot or underline characters and starts with a letter or the dot
    not followed by a number.  Names such as ‘".2way"’ are not valid,
    and neither are the reserved words.

    The definition of a _letter_ depends on the current locale, but
    only ASCII digits are considered to be digits.

    The character ‘"X"’ is prepended if necessary.  All invalid
    characters are translated to ‘"."’.  A missing value is translated
    to ‘"NA"’.  Names which match R keywords have a dot appended to
    them.  Duplicated values are altered by ‘make.unique’.
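
To see the effect of check.names on a data frame, here is a small sketch. The CSV text and its column labels are made up for illustration; the text argument is only used so that the example does not need a file on disk.

```r
# A tiny in-memory CSV whose header labels are not valid R names
# (the labels are invented for this example)
csv_text <- "$Foo,2way,my col\n1,2,3\n"

# Default: check.names = TRUE, so the header is passed through make.names()
fixed <- read.csv(text = csv_text)
names(fixed)

# check.names = FALSE keeps the header labels exactly as written in the file
raw <- read.csv(text = csv_text, check.names = FALSE)
names(raw)

# Columns with non-syntactic names then need backticks or [[ ]] to access
raw[["$Foo"]]
```

With check.names = FALSE you keep the original labels, at the cost of having to quote them whenever you refer to the columns.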

The behaviour you are seeing is entirely consistent with the documented way read.table() loads your data, which suggests that you have syntactically invalid labels in the header row of your CSV file. Note the point above from ?make.names that what counts as a letter depends on the locale of your system. The CSV file might include a character that your text editor displays without complaint, but if R is not running in the same locale, that character may not be valid there.
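
If you want to check the locale side of this, something along the following lines may help. It is only a sketch: the label "naïve" is a made-up example, and what make.names() returns for it can differ between systems, which is the point of running it on yours.

```r
# Which character-type locale is this R session running in?
ctype <- Sys.getlocale("LC_CTYPE")
ctype

# l10n_info() reports, among other things, whether the locale is UTF-8
info <- l10n_info()
info[["UTF-8"]]

# A made-up label containing a non-ASCII letter (i with diaeresis);
# whether that letter survives depends on the current locale
label <- "na\u00efve"
result <- make.names(label)
result
```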

I would look at the CSV file and identify any non-ASCII characters in the header line; there may also be non-visible characters (or escape sequences such as \t) in the header row. A lot can happen between reading in the file with the non-valid names and displaying it in the console, and that may mask the non-valid characters, so don’t take the fact that nothing looks wrong with check.names = FALSE as indicating that the file is OK.
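
One way to inspect the header without relying on how the console renders it is to look at the raw bytes. The sketch below writes a throwaway file with a deliberately awkward header (a UTF-8 byte order mark and an embedded tab, both chosen for illustration) and then flags every byte that is not printable ASCII; substitute the path to your own file for the temporary one.

```r
# Build a throwaway CSV for the demonstration: a UTF-8 BOM (EF BB BF)
# followed by a header that hides a tab inside one of the labels
path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("id,a\tb,ok\n1,2,3\n")), path)

# Read only the header line and convert it to its underlying bytes
header <- readLines(path, n = 1, warn = FALSE)
header_bytes <- as.integer(charToRaw(header))

# Flag bytes outside printable ASCII: control characters (< 32)
# and anything with the high bit set (> 127)
suspicious <- which(header_bytes < 32L | header_bytes > 127L)
suspicious

# Show the offending bytes in hex next to their positions
data.frame(position = suspicious,
           hex = sprintf("%02X", header_bytes[suspicious]))
```

The tools package also provides showNonASCIIfile(), which prints the lines of a file that contain non-ASCII characters, although it will not point out control characters such as tabs.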

Posting the output of sessionInfo() would also be useful.
