T-SQL: NVARCHAR(255) vs. NVARCHAR(MAX) Disk Space Comparison


In some scenarios like ETL or load data into a DB, we need to insert the data into a staging table, where all the columns are String data type and on the next step to cleanse and manipulate the data. I heard a discussion about the data type of the columns in the staging table – NVARCHAR(255) or NVARCHAR(MAX). Some of the arguments were:

  • Statement 1NVARCHAR(MAX) will return error message if we have more than 20 columns
  • Statement 2 NVARCHAR(MAX) takes much more disk space

I decided to have a little fun and reproduce the scenario when we insert random data in both tables – NVARCHAR(255) and NVARCHAR(MAX)

The steps are:

  • Create databases – DB_255 and DB_MAX
  • Create tables with 100 columns in each DB – Table_255 and Table_MAX
  • Populate the tables with 1000 random rows
  • Duplicate the rows until we have 32,000 rows in each table
  • Check the size of the .mdf for each DB

Create databases

Create test objects

Check test objects

Populate test tables

Check test tables columns and rows count

Check test tables columns and rows count

Check file size

Check file size

Delete databases

Conclusion:

  • Both statements failed
  • There is a hint in the random function – we can’t use RAND() inside a function. The runaround is to create a view that selects RAND() and use the view inside the function
  • You can use the function string as inspiration to create your own one

Keep it simple :-)

Share

Leave a comment

Your email address will not be published. Required fields are marked *