Import the most recent CSV file to SQL Server in SSIS

The code from @garry Vass, or one like it, is going to be needed even if you’re using SSIS as your import tool.

Within SSIS, you will need to update the connection string of your flat file connection manager to point to the new file. Ergo, you need to determine which file is the most recent.

Finding the most recent file

Whether you do it by file attributes (Garry’s code) or by slicing and dicing file names depends on your business rules. Is it always the most recently modified file (an attribute), or must the choice be based on interpreting the file name as a sequence? This matters if test_01112012_120122.csv had a mistake in it and its contents are later corrected. The modified date will change but the file name will not, so a rule based on the file name would not treat the file as new and those corrections wouldn’t get ported back into the database.
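
If the rule is "latest by file name" rather than "latest by modified date", the comparison could look roughly like the sketch below. It is only an illustration under stated assumptions: it assumes names end in a ddMMyyyy_HHmmss stamp (the post does not say whether 01112012 is day-first or month-first), and the names FileNameSequence, TryGetStamp and GetLatestByName are made up for this example.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class FileNameSequence
{
    // Assumed layout: <prefix>_<ddMMyyyy>_<HHmmss>.csv, e.g. test_01112012_120122.csv
    private const string StampFormat = "ddMMyyyy_HHmmss";
    private const int StampLength = 15; // 8 date digits + underscore + 6 time digits

    // Extracts the timestamp embedded at the end of a file name.
    // Returns false when the name does not end in a valid stamp.
    public static bool TryGetStamp(string fileName, out DateTime stamp)
    {
        stamp = DateTime.MinValue;
        string name = Path.GetFileNameWithoutExtension(fileName);
        if (name == null || name.Length < StampLength)
        {
            return false;
        }

        string tail = name.Substring(name.Length - StampLength);
        return DateTime.TryParseExact(tail, StampFormat, CultureInfo.InvariantCulture,
            DateTimeStyles.None, out stamp);
    }

    // Returns the name carrying the latest embedded stamp, or null if none parse.
    public static string GetLatestByName(string[] fileNames)
    {
        string latest = null;
        DateTime latestStamp = DateTime.MinValue;

        foreach (string candidate in fileNames)
        {
            DateTime stamp;
            if (TryGetStamp(candidate, out stamp) && (latest == null || stamp > latestStamp))
            {
                latest = candidate;
                latestStamp = stamp;
            }
        }

        return latest;
    }
}
```

Feeding it the result of System.IO.Directory.GetFiles(rootFolder, "*.csv") would give the full path of the file with the latest name-based stamp. Files whose names do not match the assumed pattern are skipped rather than treated as errors, which may or may not suit your business rules.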

I would suggest you create two package-scoped variables of type String named RootFolder and CurrentFile. Optionally, you can create one called FileMask if you are restricting to a particular type like *.csv. RootFolder is the base folder you expect to find files in, for example C:\ssisdata\MyProject. CurrentFile will be assigned, from a script, the fully qualified path to the most recently modified file. I find it helpful at this point to assign a design-time value to CurrentFile, usually the oldest file in the collection, so the flat file connection manager has a real file to validate against while you design the package.

Drag a Script Task onto the Control Flow and set User::RootFolder (optionally User::FileMask) as its ReadOnlyVariables. Its ReadWriteVariables would be User::CurrentFile.
Edit Script

This script goes inside the braces of the public partial class ScriptMain: ... declaration

    /// <summary>
    /// This verbose script identifies the most recently modified file of type fileMask
    /// living in RootFolder and assigns that to a DTS level variable.
    /// </summary>
    public void Main()
    {
        string fileMask = "*.csv";
        string mostRecentFile = string.Empty;
        string rootFolder = string.Empty;

        // Assign values from the DTS variables collection.
        // This is case sensitive. User:: is not required
        // but you must convert it from the Object type to a strong type
        rootFolder = Dts.Variables["User::RootFolder"].Value.ToString();

        // Repeat the above pattern to assign a value to fileMask if you wish
        // to make it a more flexible approach

        // Determine the most recent file, this could be null
        System.IO.FileInfo candidate = ScriptMain.GetLatestFile(rootFolder, fileMask);

        if (candidate != null)
        {
            mostRecentFile = candidate.FullName;
        }

        // Push the results back onto the variable
        Dts.Variables["CurrentFile"].Value = mostRecentFile;

        Dts.TaskResult = (int)ScriptResults.Success;
    }

    /// <summary>
    /// Find the most recent file matching a pattern
    /// </summary>
    /// <param name="directoryName">Folder to begin searching in</param>
    /// <param name="fileExtension">File mask to search for, e.g. *.csv</param>
    /// <returns>The most recently modified matching file, or null if no file matches</returns>
    private static System.IO.FileInfo GetLatestFile(string directoryName, string fileExtension)
    {
        System.IO.DirectoryInfo directoryInfo = new System.IO.DirectoryInfo(directoryName);

        System.IO.FileInfo mostRecent = null;

        // Change the SearchOption to AllDirectories if you need to search subfolders
        System.IO.FileInfo[] legacyArray = directoryInfo.GetFiles(fileExtension, System.IO.SearchOption.TopDirectoryOnly);
        foreach (System.IO.FileInfo current in legacyArray)
        {
            if (mostRecent == null)
            {
                mostRecent = current;
            }

            if (current.LastWriteTimeUtc >= mostRecent.LastWriteTimeUtc)
            {
                mostRecent = current;
            }
        }

        return mostRecent;

        // To make the code below work, edit the properties of the script project,
        // change the TargetFramework to 3.5 or later and add "using System.Linq;".
        // OrderByDescending is a LINQ extension method and doesn't exist in the 2.0 framework
        //return directoryInfo.GetFiles(fileExtension)
        //     .OrderByDescending(q => q.LastWriteTimeUtc)
        //     .FirstOrDefault();
    }

    #region ScriptResults declaration
    /// <summary>
    /// This enum provides a convenient shorthand within the scope of this class for setting the
    /// result of the script.
    /// 
    /// This code was generated automatically.
    /// </summary>
    enum ScriptResults
    {
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };
    #endregion

}
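
For a script project that does target .NET 3.5 or later, the commented-out LINQ alternative could be written out along these lines. This is a minimal standalone sketch, not part of the generated ScriptMain class, and the class name LatestFileFinder is made up for illustration.

```csharp
using System.IO;
using System.Linq;

public static class LatestFileFinder
{
    // Returns the most recently modified file matching fileMask in directoryName,
    // or null when no file matches. Subfolders are not searched.
    public static FileInfo GetLatestFile(string directoryName, string fileMask)
    {
        DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);

        return directoryInfo.GetFiles(fileMask, SearchOption.TopDirectoryOnly)
            .OrderByDescending(f => f.LastWriteTimeUtc)
            .FirstOrDefault();
    }
}
```

Like the loop version, this returns null for an empty folder, so the null check in Main still applies. GetFiles throws a DirectoryNotFoundException if the folder itself does not exist, in either version.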

Updating a Connection Manager

At this point, our script has assigned a value to the CurrentFile variable. The next step is to tell SSIS to use that file. In the Connection Manager for your CSV, you will need to set an Expression (F4, or right-click and select Properties) on the ConnectionString property. The value you want to assign is our CurrentFile variable, which is expressed as @[User::CurrentFile]

Assign connection string

Finally, these screenshots are based on the upcoming release of SQL Server 2012, so the icons may appear different, but the functionality remains the same.
