2015-04-02
It has been suggested that consistently formatted code has a lower defect density (bugs per lines of code). Though SSIS is a visual designer, I hardly think this excludes it from consideration. I have no empirical evidence, but anecdotally, I can say I have typically found more problems with messy packages than well groomed ones.
So why is it so few developers bother to format their packages? Well, for one, SSIS does not auto format code the same way that modern IDEs do, or Visio for that matter. It's extra work to do the formatting, even with the format feature in the tool (which often has unexpected results). Devs must not think it matters too much or perhaps they just have a really bad eye for neatness.
So here are a few rules to live by while developing in SSIS:
- Format every control and dataflow. Come on, people! Format > Auto Layout > Diagram
- Only use autosize on containers (For and Foreach loops included). Inconsistently formatted components break up the flow and place unnecessary emphasis on larger components. As if more words made something more important.
- For component sizing, use Make Same Size. It's an unfortunate feature that was added to 2012, but the Auto Layout now does autosize, as well, and this should be corrected.
- Always flow down. Sources on top and destinations on the bottom. Think of the added performance you'll get with the help of gravity. Don't get funny going sideways and, god help you, don't go up. If your package looks like a wiring schematic for a toaster, delete and start over.
- Put components in the same place each time. Decide with your team, but if inserts are on the left, always make them on the left. If updates go right (even to a staging destination), make them always go right. If things are always where you expect them to be, this makes maintenance a much easier job.
- Don't use annotations. They add clutter and they never land in the right place after your auto format, so it makes it an extra chore every time a package is refactored. The naming convention in the package should explain what the package is doing. And if you are doing something that cryptic, find a simpler solution.
- One path per dataflow. If you have multiple sets of sources and targets, put them in multiple dataflows. It is what is expected and there is no added benefit to jamming it all into one data flow.
- Make datafows and control flows short and sweet. One task or set of tasks per package. It is much easier to read and troubleshoot a short package, then a long one. So it follows that it is always easier to stage the data (even multiple times) from source to destination, then to create one monster long package that no one will ever want to touch. I'll follow on this in another post about separation of concerns.
- Keep T-SQL to no more than one or two lines. For example, execute SQL tasks should have Truncate table or Drop Index, but not long merge or update statements. These should be converted to procs. One reason is because the editor stinks and does not support intellisense and syntax highlighting. It is complete misery to troubleshoot any lengthy bit of code.. Complex SQL tasks should be coded in more robust tools like SSDT.
Will these simple rules reduce defects in your code? Maybe. It will at least demonstrate pride in your work and show that you care what the next person is walking into when they inherit it. And you will thank yourself when this becomes an obsessive habit.
Mark Wojciechowicz
Labels: SSIS