Windows batch script to parse CSV file and output a text file

Important update – I don’t think Windows batch is a good option for your needs because a single FOR /F cannot parse more than 31 tokens. See the bottom of the Addendum below for an explanation.

However, it is possible to do what you want with batch. This ugly code will give you access to all 64 tokens.

for /f "usebackq tokens=1-29* delims=," %%A in ("%filename%") do (
  for /f "tokens=1-26* delims=," %%a in ("%%^") do (
    for /f "tokens=1-9 delims=," %%1 in ("%%{") do (
      rem Tokens 1-26 are in variables %%A - %%Z
      rem Token  27 is in %%[
      rem Token  28 is in %%\
      rem Token  29 is in %%]
      rem Tokens 30-55 are in %%a - %%z
      rem Tokens 56-64 are in %%1 - %%9
    )
  )
)

The addendum provides important info on how the above works.

If you only need a few of the tokens spread out amongst the 64 on the line, then the solution is marginally easier in that you might be able to avoid using crazy characters as FOR variables. But there is still careful bookkeeping to be done.

For example, the following will give you access to tokens 5, 27, 46 and 64:

for /f "usebackq tokens=5,27,30* delims=," %%A in ("%filename%") do (
  for /f "tokens=16,30* delims=," %%E in ("%%D") do (
    for /f "tokens=4 delims=," %%H in ("%%G") do (
      rem Token  5 is in %%A
      rem Token 27 is in %%B
      rem Token 46 is in %%E
      rem Token 64 is in %%H
    )
  )
)
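The bookkeeping is easy to get wrong, so it can help to test it against a line where every token's value is its own position. The following is only a sketch: the 64-token sample string is made up for the test and stands in for a line of your file. If the token numbers are right, each label in the output should match the value next to it.

```batch
@echo off
setlocal enableDelayedExpansion
rem Build the sample line 1,2,...,64 so that token N has the value N.
set "str=1"
for /l %%n in (2 1 64) do set "str=!str!,%%n"
rem Outer loop: %%A is token 5, %%B is token 27, %%C is token 30, %%D is the rest.
for /f "tokens=5,27,30* delims=," %%A in ("!str!") do (
  rem Middle loop works on tokens 31 and up: %%E is its 16th token, %%G is the rest.
  for /f "tokens=16,30* delims=," %%E in ("%%D") do (
    rem Inner loop works on tokens 61 and up: %%H is its 4th token.
    for /f "tokens=4 delims=," %%H in ("%%G") do (
      echo token5=%%A token27=%%B token46=%%E token64=%%H
    )
  )
)
```

The offsets follow from where each remainder starts: the middle loop sees tokens 31 onward, so its 16th token is overall token 46, and the inner loop sees tokens 61 onward, so its 4th token is overall token 64.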

April 2016 Update – Based on investigative work by DosTips users Aacini, penpen, and aGerman, I have developed a relatively easy method to simultaneously access thousands of tokens using FOR /F. The work is part of a DosTips thread, and the actual code can be found in three posts there.

Original Answer
FOR variables are limited to a single character, so your %%BL strategy can’t work. The variables are case sensitive. According to Microsoft you are limited to capturing 26 tokens within one FOR statement, but it is possible to get more if you use more than just alpha characters. It’s a pain because you need an ASCII table to figure out which characters go where. FOR does not allow just any character, however, and the maximum number of tokens that a single FOR /F can assign is 31, plus 1 more for the remainder of the line (the * option). Any attempt to parse and assign more than 31 will quietly fail, as you have discovered.

Thankfully, I don’t think you need that many tokens. You simply specify which tokens you want with the TOKENS option.

for /f "usebackq tokens=7,12,15,18 delims=," %%A in ("%filename%") do echo %%A,%%B,%%C,%%D

will give you your 7th, 12th, 15th and 18th tokens.
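To get those tokens into a text file instead of onto the console, one option is to redirect the whole loop. This is a minimal sketch: the file names input.csv and output.txt are hypothetical, and it assumes the CSV has no empty fields and no quoted fields containing commas. FOR /F treats consecutive delimiters as one, so an empty field would shift the token numbers.

```batch
@echo off
setlocal
rem Hypothetical file names, change them to match your own files.
set "filename=input.csv"
set "outfile=output.txt"

rem Redirect the whole block once so the output file is opened a single time.
>"%outfile%" (
  for /f "usebackq tokens=7,12,15,18 delims=," %%A in ("%filename%") do (
    echo %%A,%%B,%%C,%%D
  )
)
```

Redirecting once around the block, rather than appending with >> on every iteration, avoids reopening the file for each line. Note also that FOR /F skips lines that begin with a semicolon unless the EOL option is changed.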

Addendum

April 2016 Update – A couple weeks ago I learned that the following rules (written 6 years ago) are code page dependent. The data below has been verified for code pages 437 and 850. More importantly, the FOR variable sequence of extended ASCII characters 128-254 does not match the byte code value, and varies tremendously by code page. It turns out the FOR /F variable mapping is based on the underlying UTF-(16?) code point. So the extended ASCII characters are of limited use when used with FOR /F. See the thread at http://www.dostips.com/forum/viewtopic.php?f=3&t=7703 for more information.

I performed some tests, and can report the following (updated in response to jeb’s comment):

Most characters can be used as a FOR variable, including extended ASCII 128-254. Some characters cannot be used to define a variable in the first part of a FOR statement, yet can still be accessed in the DO clause. A few can’t be used for either. Some have no restrictions, but require special syntax.

The following is a summary of characters that have restrictions or require special syntax. Note that text within angle brackets like <space> represents a single character.

Dec  Hex   Character   Define     Access
  0  0x00  <nul>       No         No
  9  0x09  <tab>       No         %%^<tab>  or  "%%<tab>"
 10  0x0A  <LF>        No         %%^<CR><LF><CR><LF>  or  %%^<LF><LF>
 11  0x0B  <VT>        No         %%<VT>
 12  0x0C  <FF>        No         %%<FF>
 13  0x0D  <CR>        No         No
 26  0x1A  <SUB>       %%%VAR%    %%%VAR% (%VAR% must be defined as <SUB>)
 32  0x20  <space>     No         %%^<space>  or  "%%<space>"
 34  0x22  "           %%^"       %%"  or  %%^"
 36  0x24  $           %%$        %%$ works, but %%~$ does not
 37  0x25  %           %%%%       %%~%%
 38  0x26  &           %%^&       %%^&  or  "%%&"
 41  0x29  )           %%^)       %%^)  or  "%%)"
 44  0x2C  ,           No         %%^,  or  "%%,"
 59  0x3B  ;           No         %%^;  or  "%%;"
 60  0x3C  <           %%^<       %%^<  or  "%%<"
 61  0x3D  =           No         %%^=  or  "%%="
 62  0x3E  >           %%^>       %%^>  or  "%%>"
 94  0x5E  ^           %%^^       %%^^  or  "%%^"
124  0x7C  |           %%^|       %%^|  or  "%%|"
126  0x7E  ~           %%~        %%~~ (%%~ may crash CMD.EXE if at end of line)
255  0xFF  <NB space>  No         No

Special characters like ^ < > | & must be either escaped or quoted. For example, the following works:

for /f %%^< in ("OK") do echo "%%<" %%^<

Some characters cannot be used to define a FOR variable. For example, the following gives a syntax error:

for /f %%^= in ("No can do") do echo anything

But %%= can be implicitly defined by using the TOKENS option, and the value accessed in the DO clause like so:

for /f "tokens=1-3" %%^< in ("A B C") do echo %%^< %%^= %%^>

The % is odd – you can define a FOR variable using %%%%, but the value cannot be accessed unless you use the ~ modifier. This means enclosing quotes cannot be preserved.

for /f "usebackq tokens=1,2" %%%% in ('"A"') do echo %%%% %%~%%

The above yields %% A

The ~ is a potentially dangerous FOR variable. If you attempt to access the variable using %%~ at the end of a line, you can get unpredictable results, and may even crash CMD.EXE! The only reliable way to access it without restrictions is to use %%~~, which of course strips any enclosing quotes.

for /f %%~ in ("A") do echo This can crash because its the end of line: %%~

for /f %%~ in ("A") do echo But this (%%~) should be safe

for /f %%~ in ("A") do echo This works even at end of line: %%~~

The <SUB> (0x1A) character is special because <SUB> literals embedded within batch scripts are read as linefeeds (<LF>). In order to use <SUB> as a FOR variable, the value must be somehow stored within an environment variable, and then %%%VAR% will work for both definition and access.

As already stated, a single FOR /F can parse and assign a maximum of 31 tokens. For example:

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1-31" %%A in ("!str!") do echo A=%%A _=%%_

The above yields A=1 _=31

Note – tokens 2-30 work just fine; I just wanted a small example.

Any attempt to parse and assign more than 31 tokens will silently fail without setting ERRORLEVEL.

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1-32" %%A in ("!str!") do echo this example fails entirely

You can parse and assign up to 31 tokens and assign the remainder to another token as follows:

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1-31*" %%@ in ("!str!") do echo @=%%@  ^^=%%^^  _=%%_

The above yields @=1 ^=31 _=32 33 34 35

And now for the really bad news. A single FOR /F can never parse more than 31 tokens, as I learned when I looked at the question “Number of tokens limit in a FOR command in a Windows batch script”.

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1,31,32" %%A in ("!str!") do echo A=%%A  B=%%B  C=%%C

The very unfortunate output is A=1 B=31 C=%C
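This is why the code at the top of the answer chains FOR /F loops. As a small sketch of that workaround, using the same 35-token sample string, the outer loop can hand everything after token 31 to an inner loop through the * option, and the inner loop then picks out token 32. The token32 label is just for illustration.

```batch
@echo off
setlocal enableDelayedExpansion
rem Same 35-token sample string as in the examples above.
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
rem Outer loop: %%A is token 1, %%B is token 31, %%C is everything after token 31.
for /f "tokens=1,31*" %%A in ("!str!") do (
  rem Inner loop: the first token of the remainder is overall token 32.
  for /f "tokens=1" %%D in ("%%C") do echo A=%%A  B=%%B  token32=%%D
)
```

The outer loop stays within the limit because it parses only up to token 31. The remainder is passed along unparsed, so the inner loop starts counting again from 1.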
