21/06/2008 10:18:42 PM
Dean Sutcliffe is going to take us to the cleaners, covering:
• The difference between Data Cleansing & Data Profiling
– When to do which
– Why use tools instead of ad-hoc code
– What to look for when profiling
– Choosing what to address first, and why
• Data Cleansing Processes (Closed Feedback Loop vs Continuous)
– E.g. do you continually cleanse within the ETL process, or do you use the errors to go back and fix the data in the source?
• Different Cleaning Approaches
– Cleaning via algorithms: parsing and patterns, e.g. phone numbers have 8 digits
– Cleaning via reference data: ABNs, valid street numbers within specific suburbs
– Cleaning via probability: data mining and fuzzy lookups, i.e. building reference data from your other data
– Survivorship of data: more than blending duplicate records to fill in NULLs
• Assistance available within SQL Server
– Integration Services: Fuzzy Lookup, Fuzzy Grouping, Lookup, etc.
– SQL Server 2008: SSIS Data Profiler
– SQL Server 2005: SSAS in BIDS (the ability to understand your data when designing a cube)
– Other approaches to use within SQL Server
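As a rough illustration of profiling before cleansing, and of cleaning via algorithms, here is a minimal Python sketch. It assumes the agenda's example rule that a valid phone number has exactly 8 digits; the function names (pattern_of, profile_patterns, clean_phone) are hypothetical and are not part of SSIS or any SQL Server API.

```python
import re
from collections import Counter


def pattern_of(value: str) -> str:
    """Reduce a value to its 'shape': digits become 9, letters become A,
    punctuation and spaces are kept as-is."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )


def profile_patterns(values):
    """A simple column profile: count how often each shape occurs."""
    return Counter(pattern_of(v) for v in values)


def clean_phone(raw: str):
    """Strip every non-digit character, then apply the (assumed) rule that a
    valid phone number has exactly 8 digits. Returns None if the rule fails."""
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 8 else None


if __name__ == "__main__":
    phones = ["9267 1234", "(02) 9267 1234", "92671234", "9267-123"]
    print(profile_patterns(phones))
    print([clean_phone(p) for p in phones])  # ['92671234', None, '92671234', None]
```

Profiling the patterns first shows which formats actually occur. Note that a number written with its area code, such as "(02) 9267 1234", fails the 8-digit rule: that is exactly the kind of finding profiling should surface before a cleansing rule is written.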
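Cleaning via probability often comes down to a fuzzy lookup: matching a dirty value against reference data by similarity rather than exact equality. The sketch below uses Python's difflib.SequenceMatcher as a stand-in similarity measure. It is not the algorithm SSIS Fuzzy Lookup uses, and the 0.85 threshold and the suburb list are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def normalise(s: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(s.lower().split())


def fuzzy_lookup(value, reference, threshold=0.85):
    """Return (best_match, score) from the reference list, or
    (None, best_score) if nothing reaches the (assumed) threshold."""
    key = normalise(value)
    best, best_score = None, 0.0
    for candidate in reference:
        # ratio() is 2*M/T: matched characters over total characters, 0.0 to 1.0
        score = SequenceMatcher(None, key, normalise(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return (best, best_score) if best_score >= threshold else (None, best_score)


if __name__ == "__main__":
    suburbs = ["Parramatta", "Paddington", "Pyrmont"]  # illustrative reference list
    print(fuzzy_lookup("Parramata", suburbs))
    print(fuzzy_lookup("Bondi", suburbs))
```

Values that fall below the threshold are better routed to a review queue than forced onto the nearest match.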
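Survivorship decides which value survives for each field when duplicate records are merged, rather than simply filling NULLs from whichever record happens to have a value. A minimal sketch, assuming hypothetical source systems ("crm", "billing") and hypothetical per-field trust rules: the most trusted source wins, then the most recent update within that source, and a NULL never survives.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRecord:
    source: str            # hypothetical source system name, e.g. "crm"
    updated: date          # when this source last updated the record
    email: Optional[str] = None
    phone: Optional[str] = None


# Hypothetical survivorship rules: source trust order, per field.
SOURCE_PRIORITY = {
    "email": ["crm", "billing"],
    "phone": ["billing", "crm"],
}


def survive(records, field):
    """Pick the surviving value for one field across a group of duplicates."""
    priority = SOURCE_PRIORITY[field]
    candidates = [r for r in records if getattr(r, field) is not None]
    if not candidates:
        return None

    def rank(r):
        # Lower is better: trusted source first, then newest update.
        src_rank = priority.index(r.source) if r.source in priority else len(priority)
        return (src_rank, -r.updated.toordinal())

    return getattr(min(candidates, key=rank), field)


def golden_record(records):
    """Build one merged record by applying survive() to every ruled field."""
    return {field: survive(records, field) for field in SOURCE_PRIORITY}


if __name__ == "__main__":
    dupes = [
        CustomerRecord("crm", date(2008, 5, 1), email="a@example.com", phone="92671234"),
        CustomerRecord("billing", date(2008, 6, 1), email="b@example.com", phone="93001111"),
        CustomerRecord("crm", date(2008, 6, 15), phone="94445555"),
    ]
    print(golden_record(dupes))
```

Note that the newest record does not automatically win: its NULL email is skipped, and for phone the billing system's value beats the newer CRM value because of the trust order.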
Venue: Mechanics' School of Arts, Level 1, 280 Pitt St, Sydney
Time: 5:30pm, share ideas and network with your peers (with pizza & drinks); session ends 8pm
On: Tuesday, 12 August 2008