Wednesday, 5 December 2012

Creating SSIS package from C# application

Unusual one this.  Had a need to truncate about 70 Raw Files after a large ETL process.  There is no specific task in SSIS, but you can set a DataFlow up that passes no rows to the destination and this will sort the problem out. 

Rather than making 70+ DataFlows I initially used a loop, go through each raw file and truncate it - but this changed the meta-data each time (as RawFiles don't use connections) and caused a lot of validation errors.  I could have delayed validation on the tasks, but this would have taken a very long time to go through 10+ packages looking for the use of the raw files.

Instead I chose to create a class in C# that would loop through all the RawFiles in a folder and create a bespoke DataFlow for them within a new package.

The code creates a new package, then uses a class to return a new DataFlow task in a loop.

The DataFlow consists of a RawFile Source followed by a Conditional Split and a RawFile Destination.
The conditional split creates an output with the condition "1 == 0"; as 1 will never equal 0 it will never pass a row so will truncate the RawFile without affecting the meta-data.

This is my class:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using wrap = Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using System.Xml;
using System.Data.Common;
using System.Configuration;
using System.Collections;
using System.IO;

namespace SpaceApp_v0._1.Classes
{
    public class RawFileCleanUp
    {

        public TaskHost SSISPackage(Package package, Executable dataFlowTask, string DataFlowTaskName, string RawFileLocation)
        {
            //// Add DataFlow
            TaskHost taskHost = dataFlowTask as TaskHost;
            taskHost.Name = DataFlowTaskName;

            MainPipe pipeline = taskHost.InnerObject as MainPipe;

            //// Add RawFile Source
            IDTSComponentMetaData90 rawSource = pipeline.ComponentMetaDataCollection.New();
            rawSource.ComponentClassID = "DTSAdapter.RawSource.1";
            rawSource.ValidateExternalMetadata = true;
            IDTSDesigntimeComponent90 rawSourceDesignTimeComponent = rawSource.Instantiate();
            rawSourceDesignTimeComponent.ProvideComponentProperties();
            rawSource.Name = "RawFile Source";

            rawSourceDesignTimeComponent.SetComponentProperty("AccessMode", 0);
            rawSourceDesignTimeComponent.SetComponentProperty("FileName", RawFileLocation);

            rawSourceDesignTimeComponent.AcquireConnections(null);
            rawSourceDesignTimeComponent.ReinitializeMetaData();
            rawSourceDesignTimeComponent.ReleaseConnections();
            

            //// Add Conditional Split to drop ALL rows
            IDTSComponentMetaData90 conSplit = pipeline.ComponentMetaDataCollection.New();
            conSplit.ComponentClassID = "DTSTransform.ConditionalSplit";
            conSplit.Name = "Conditional Split";
            conSplit.Description = "Conditional Split Transform";

            CManagedComponentWrapper splitWrapper = conSplit.Instantiate();
            splitWrapper.ProvideComponentProperties();
            splitWrapper.AcquireConnections(null);
            splitWrapper.ReinitializeMetaData();
            splitWrapper.ReleaseConnections();
            

            //// Connect RawFile Source to Split
            IDTSPath90 path = pipeline.PathCollection.New();
            path.AttachPathAndPropagateNotifications(rawSource.OutputCollection[0], conSplit.InputCollection[0]);

            //// Configure split
            IDTSOutput90 splitOutput = splitWrapper.InsertOutput(DTSInsertPlacement.IP_BEFORE, conSplit.OutputCollection[0].ID);
            splitOutput.Name = "Split Output 1";
            splitOutput.Description = "Passes zero rows out";
            splitOutput.IsErrorOut = false;

            IDTSInput90 splitInput = conSplit.InputCollection[0];
            IDTSInputColumnCollection90 splitInputColumns = splitInput.InputColumnCollection;
            IDTSVirtualInput90 splitVirtualInput = splitInput.GetVirtualInput();
            IDTSVirtualInputColumnCollection90 splitVirtualInputColumns = splitVirtualInput.VirtualInputColumnCollection;

            string ConColumn = "";
            int columnCount = splitVirtualInputColumns.Count;
            for (int i = 0; i < columnCount; i++)
            {
                splitWrapper.SetUsageType(splitInput.ID,splitVirtualInput,splitVirtualInputColumns[i].LineageID,DTSUsageType.UT_READONLY);
                if (i == 0)
                {
                    ConColumn = splitVirtualInputColumns[i].Name;
                }
            }

            splitWrapper.SetOutputProperty(splitOutput.ID, "EvaluationOrder",0);
            splitWrapper.SetOutputProperty(splitOutput.ID, "FriendlyExpression", "1 == 0");


            //// Add RawFile Destination
            IDTSComponentMetaData90 rawDestination = pipeline.ComponentMetaDataCollection.New();
            rawDestination.ComponentClassID = "DTSAdapter.RawDestination.1";
            rawDestination.ValidateExternalMetadata = true;
            IDTSDesigntimeComponent90 rawDestinationDeignTimeComponent = rawDestination.Instantiate();
            rawDestinationDeignTimeComponent.ProvideComponentProperties();
            rawDestination.Name = "RawFile Destination";

            rawDestinationDeignTimeComponent.SetComponentProperty("AccessMode", 0);
            rawDestinationDeignTimeComponent.SetComponentProperty("FileName", RawFileLocation);

            rawDestinationDeignTimeComponent.AcquireConnections(null);
            rawDestinationDeignTimeComponent.ReinitializeMetaData();
            rawDestinationDeignTimeComponent.ReleaseConnections();


            IDTSPath90 DestPath = pipeline.PathCollection.New();
            DestPath.AttachPathAndPropagateNotifications(conSplit.OutputCollection[0], rawDestination.InputCollection[0]);

            //// Connect Split to Destination
            IDTSInput90 rawInput = rawDestination.InputCollection[0];
            IDTSVirtualInput90 rawVInput = rawInput.GetVirtualInput();

            foreach (IDTSVirtualInputColumn90 vColumn in rawVInput.VirtualInputColumnCollection)
            {
                rawDestinationDeignTimeComponent.SetUsageType(rawInput.ID, rawVInput, vColumn.LineageID, DTSUsageType.UT_READONLY);
            }

            ////Repeat connections and validate to ensure package will run first time with no warnings and proper metadata.
            rawSourceDesignTimeComponent.AcquireConnections(null);
            rawSourceDesignTimeComponent.ReinitializeMetaData();
            rawSourceDesignTimeComponent.ReleaseConnections();

            splitWrapper.AcquireConnections(null);
            splitWrapper.ReinitializeMetaData();
            splitWrapper.ReleaseConnections();
            
            rawDestinationDeignTimeComponent.AcquireConnections(null);
            rawDestinationDeignTimeComponent.ReinitializeMetaData();
            rawDestinationDeignTimeComponent.ReleaseConnections();

            rawSourceDesignTimeComponent.Validate();
            splitWrapper.Validate();
            rawDestinationDeignTimeComponent.Validate();


            return taskHost;



        }

        


    }

}

You may notice that I call the AcquireConnections, ReinitialiseMetaData and ReleaseConnections more than once.  This seem to relieve an issue I had with validation warnings after the package was saved.  With the code as it is you can run the package the instant it has been created.

This is the code to call the class:

You pick the folder that holds the RawFiles and let it run.  I would recommend changing the NewFileName or setting it to a variable like a text box, or even the original FolderBrowserDialog plus a new file name.

string Folder = "";

            FolderBrowserDialog folderBrowser = new FolderBrowserDialog();
            folderBrowser.RootFolder = Environment.SpecialFolder.Desktop;

            DialogResult result = folderBrowser.ShowDialog();
            if (result == DialogResult.OK)
            {
                Folder = folderBrowser.SelectedPath;
            }

            string[] files = Directory.GetFiles(Folder);

            Microsoft.SqlServer.Dts.Runtime.Application app = new Microsoft.SqlServer.Dts.Runtime.Application();

            Package package = new Package();

            string RawFile = "";

            foreach (string file in files)
            {
                RawFile = file.Substring(Folder.Length, file.Length - Folder.Length);
                RawFile = RawFile.Replace("\\", "");

                RawFileCleanUp rawFileClean = new RawFileCleanUp();

                Executable dataFlowTask = package.Executables.Add("STOCK:PipelineTask");

                TaskHost taskHost = rawFileClean.SSISPackage(package, dataFlowTask, RawFile, file);


            }

            package.Validate(null, null, null, null);

            string NewFileName = Folder + "\\RawFileCleanUp.dtsx";

            XmlDocument myPkgDocument = new XmlDocument();
            package.SaveToXML(ref myPkgDocument, null, null);

            app.SaveToXml(NewFileName, package, null);

No comments:

Post a Comment