While working in a limited-memory environment I needed to be frugal with the amount of RAM a package would take up. I had a Slowly Changing Dimension containing about a million records which, when running on a 32-bit machine, would run out of memory and crash part-way through.
This also applied to large transformations where, although I didn't run out of memory, running close to capacity greatly slowed the process down.
What I found to ease these troubles is the Modulo Algorithm. By using it to split the data, or in the case of the dimension to process smaller chunks, both performance and in fact the ability to complete the processing at all were greatly improved!
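As a minimal sketch of the idea (the table and column names here are just placeholders): the remainder of ID % N is always between 0 and N-1, so each remainder value selects one of N roughly equal chunks:
-- Split dbo.SomeTable into 4 roughly equal chunks by the remainder of ID
SELECT * FROM dbo.SomeTable WHERE (ID % 4) = 0
SELECT * FROM dbo.SomeTable WHERE (ID % 4) = 1
SELECT * FROM dbo.SomeTable WHERE (ID % 4) = 2
SELECT * FROM dbo.SomeTable WHERE (ID % 4) = 3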
Here's an example of how to implement it in a fairly simple Slowly Changing Dimension.
Let's start with the setup: a large dimension of changing size (new records can be added at any time, especially with a customer, client or address dimension), limited RAM (2GB per transaction), and an optimal, easily controlled batch of 10,000 records processed with each iteration.
I will use a For Loop Container to loop through each iteration. To work out the number of iterations I will divide the number of records in the dimension by 10,000 and round this UP to the nearest integer.
We want to create a table in our database that will hold each iteration's records - I create this table on the fly each time, so I can use it for multiple packages and tables.
Drag an Execute SQL Task onto the Control Flow and set the ResultSet to 'None'. Set the SQL to the following:
-- Recreate the staging table every run: drop it if it exists, then create it fresh
IF (EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'dbo'
AND TABLE_NAME = 'Modulo_Temp'))
BEGIN
DROP TABLE dbo.Modulo_Temp
END

CREATE TABLE dbo.Modulo_Temp
(Field1 int, Field2 varchar(100))
The CREATE TABLE part of the query needs to match the structure of the source table being matched against the existing dimension.
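As an aside, if you're on SQL Server 2016 or later the same drop-and-recreate can be written more compactly (same placeholder table definition as above):
DROP TABLE IF EXISTS dbo.Modulo_Temp
CREATE TABLE dbo.Modulo_Temp
(Field1 int, Field2 varchar(100))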
Create two integer variables called 'Iterations' and 'CurrentIteration'.
Drop an Execute SQL Task into your Control Flow, add in the below SQL statement, set the ResultSet to 'Single row' and map the result to the 'Iterations' variable.
-- Number of chunks: the + 1 keeps each chunk at or under roughly 10,000 rows,
-- and stops the result being zero (a divide-by-zero in the modulo) for small tables
select (count(ID) / 10000) + 1
from dbo.TableName
Next drop a For Loop Container into the Control Flow and configure its expressions as follows:
InitExpression : @CurrentIteration = 0
EvalExpression : @CurrentIteration < @Iterations
AssignExpression : @CurrentIteration = @CurrentIteration + 1
The chunks are zero-based: the modulo remainder runs from 0 to Iterations - 1, so the loop starts at 0 and stops before it reaches Iterations. With Iterations = 100 the loop runs for CurrentIteration = 0, 1, ... 99.
Next add an Execute SQL Task into the Loop and set the SQL to:
TRUNCATE TABLE dbo.Modulo_Temp
Now here's the fun part! We will now use the Modulo Algorithm to split the source table into chunks of around 10k records. We will treat each 10k chunk as its own dimension.
You can use the Slowly Changing Dimension task provided with Integration Services, but I prefer the Kimball method one available on CodePlex. It allows a lot more customisation, is faster and works well with this task as you can set your own dimension source using SQL, rather than just pointing to a table in the database. This facet is key as we need to restrict the dimension records to the ones contained within our 10k chunks of each iteration.
Drag a DataFlow component into the ForLoop.
In the DataFlow add an OLE DB Source. Add a variable called 'SourceSQL' with a data type of String and, in the variable's properties, set 'EvaluateAsExpression' to True. In the expression write the select statement for your source table, adding in the Modulo part as below. Then set the OLE DB Source's data access mode to 'SQL command from variable' and point it at SourceSQL:
"SELECT
Field1,
Field2...
FROM dbo.SourceTable
WHERE (pID % " + (DT_WSTR,10)@[User::Iterations] + ") = " + (DT_WSTR,10)@[User::CurrentIteration] + "
ORDER BY ID
OPTION(MAXDOP 1)"
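To make that concrete, with @[User::Iterations] evaluating to 100 and @[User::CurrentIteration] to 3, SourceSQL would contain:
SELECT
Field1,
Field2...
FROM dbo.SourceTable
WHERE (ID % 100) = 3
ORDER BY ID
OPTION(MAXDOP 1)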
Next we need to set our Existing Dimension source. Remember that dbo.Modulo_Temp must be populated inside the DataFlow with the business keys of the current chunk - for example by adding a Multicast after the OLE DB Source and writing the keys out through an OLE DB Destination - as the query below depends on it. Simply modify the code below to meet your requirements:
select
Field1,
Field2...
from dbo.ExistingDimension
where YourBusinessKey in (select ID from dbo.Modulo_Temp)
This code will restrict the records in the dimension to only those 10k you pulled through from the source table. Attach your Slowly Changing Dimension task to these two inputs as you normally would and BINGO!!
N.B. Do not use the Delete/Missing outputs from the Slowly Changing Dimension task, as this will not work. I'll leave you to work out why!
Round-up of Modulo:
Add (IDField % 5) to split a table into 5 chunks. On top of this, add which part of the table you want, like so: (IDField % 5) = 0. The chunk index is zero-based, so valid values run from 0 to 4.
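To see the zero-based chunk numbers and check the rows split evenly, a quick query like this works (table and column names are placeholders):
SELECT (IDField % 5) AS ChunkNo, COUNT(*) AS RowsInChunk
FROM dbo.YourTable
GROUP BY (IDField % 5)
ORDER BY ChunkNo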
I have also seen this technique used to greatly improve the speed of large imports from SQL. Create 3 OLE DB Sources in one DataFlow, each one taking a third of the table to be imported, and union them together - much faster than one source. For example:
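Each of the three sources gets one of these statements (table and column names assumed), and a Union All combines the outputs:
-- Source 1
SELECT * FROM dbo.BigTable WHERE (ID % 3) = 0
-- Source 2
SELECT * FROM dbo.BigTable WHERE (ID % 3) = 1
-- Source 3
SELECT * FROM dbo.BigTable WHERE (ID % 3) = 2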
I know this post is quite brief and skips a few things, but it's meant for basic guidance rather than full instructions. Always experiment and play around.