# BUG-002: Spark Date Type Cannot Convert to DateTime
## Summary

When reading CSV files with date columns, Spark returns its native `Date` type, which cannot be converted to .NET `DateTime`. The `ObjectMaterializer` fails because Spark's `Date` type doesn't implement `IConvertible`.
## Error Message

```
Cannot convert value '2024-01-01' (type: Date) to DateTime
```
## Affected Components

- `DataFlow.Framework.ObjectMaterializer` (OSS)
- `DataFlow.Spark` v1.2.0
- Not affected: Snowflake (uses strings for dates)
## Root Cause

Location: `MemberMaterializationPlan.cs:463` in `DataFlow.Framework.ObjectMaterializer`

When Spark reads CSV files, date columns are inferred as Spark's native `Date` type. This Java-based type is wrapped by Microsoft.Spark but doesn't implement .NET's `IConvertible` interface, so `Convert.ChangeType()` throws an `InvalidCastException`.
```csharp
// MemberMaterializationPlan.cs - simplified
var value = row[columnIndex]; // Returns Spark Date object
var converted = Convert.ChangeType(value, typeof(DateTime)); // FAILS!
```
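The failure mode is general: `Convert.ChangeType` only succeeds when the source value implements `IConvertible`, and throws `InvalidCastException` otherwise. A minimal sketch of the behavior (the `SparkDateStub` type below is a hypothetical stand-in, so real Spark isn't needed to reproduce the exception):

```csharp
using System;

// Hypothetical stand-in for the wrapped Java date type: like
// Microsoft.Spark.Sql.Types.Date, it does not implement IConvertible.
public class SparkDateStub
{
    public int Year => 2024;
    public int Month => 1;
    public int Day => 15;
}

public static class Demo
{
    public static void Main()
    {
        object value = new SparkDateStub();
        try
        {
            // Same call the materializer makes at MemberMaterializationPlan.cs:463.
            Convert.ChangeType(value, typeof(DateTime));
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException: SparkDateStub is not IConvertible");
        }
    }
}
```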
## Reproduction Steps

### Step 1: Create CSV with date column

```csv
id,order_date,amount
1,2024-01-15,500.00
2,2024-01-16,750.00
```
### Step 2: Define model with DateTime property

```csharp
public class Order
{
    public int Id { get; set; }
    public DateTime OrderDate { get; set; } // DateTime property
    public double Amount { get; set; }
}
```
### Step 3: Read and materialize

```csharp
var context = Spark.Connect();
var orders = context.Read.Csv<Order>("path/to/orders.csv");

// This FAILS:
var results = orders.ToList(); // Throws: Cannot convert Date to DateTime
```
## Failing Tests

| Project | Test |
| --- | --- |
| (None currently) | Tests avoid date columns as workaround |
## Current Workarounds

### Workaround 1: Use Parquet format

```csharp
// Parquet preserves .NET types correctly
var orders = context.Read.Parquet<Order>("path/to/orders.parquet");
var results = orders.ToList(); // Works!
```
### Workaround 2: Store dates as strings

```csharp
public class Order
{
    public int Id { get; set; }
    public string OrderDate { get; set; } // String instead of DateTime
    public double Amount { get; set; }
}

// Parse after materialization
var results = orders.ToList();
var parsedDates = results.Select(o => DateTime.Parse(o.OrderDate));
```
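Since `DateTime.Parse` is culture-sensitive, a slightly more robust variant of the post-materialization parse pins the format and culture (a sketch, assuming the `yyyy-MM-dd` layout from the repro CSV):

```csharp
using System;
using System.Globalization;

public static class DateParsingDemo
{
    public static void Main()
    {
        // Parse the CSV's ISO-style date deterministically, regardless of
        // the machine's current culture settings.
        DateTime d = DateTime.ParseExact(
            "2024-01-15",
            "yyyy-MM-dd",
            CultureInfo.InvariantCulture);

        Console.WriteLine(d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```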
### Workaround 3: Avoid CSV date columns
Remove date columns from test data and models entirely.
## Proposed Fix

Add special handling for Spark's `Date` type in the materializer:

```csharp
// In MemberMaterializationPlan.cs
if (value is Microsoft.Spark.Sql.Types.Date sparkDate)
{
    // Extract year, month, day and construct DateTime
    return new DateTime(sparkDate.Year, sparkDate.Month, sparkDate.Day);
}
```

Alternatively, use Spark's `cast()` function to convert the column to string before pulling rows into .NET.
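The `cast()` route could look roughly like the following with the .NET for Apache Spark API (a sketch of the alternative, not the implemented fix; it assumes the raw `DataFrame` is reachable before materialization, and the `order_date` column name comes from the repro CSV):

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

// Sketch: cast the date column to string on the Spark side, so the
// materializer only ever sees strings (which it already handles).
SparkSession spark = SparkSession.Builder().GetOrCreate();

DataFrame df = spark.Read()
    .Option("header", "true")
    .Option("inferSchema", "true")
    .Csv("path/to/orders.csv");

DataFrame safe = df.WithColumn("order_date", Col("order_date").Cast("string"));
```

This keeps the conversion inside Spark's own type system, at the cost of moving date parsing to the .NET side.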
## Impact
- Severity: HIGH (for CSV users)
- Frequency: Medium (Parquet users unaffected)
- User Impact: Forces Parquet or string-based date handling
## Labels

`bug`, `spark`, `materialization`, `csv`, `datetime`