I started by parsing the HTML content and extracting the table data. Then, I used Export-Csv for the CSV part and the EPPlus library for creating an Excel file with formatting. Here's a snippet of how I did it:
# Sample HTML content
$htmlContent = @"
<table>
<tr>
<td style="background-color:#ff5733;">A1</td>
<td>B1</td>
<td>C1</td>
</tr>
<tr>
<td>A2</td>
<td style="background-color:#33ff57;">B2</td>
<td>C2</td>
</tr>
</table>
"@
# Load HTML and parse it
[xml]$html = $htmlContent
$rows = $html.table.tr
# Extract data and style
$data = @()
foreach ($row in $rows) {
$cols = $row.td
$data += [pscustomobject]@{
Column1 = $cols[0].'#text'
Column2 = $cols[1].'#text'
Column3 = $cols[2].'#text'
Style1 = $cols[0].style
Style2 = $cols[1].style
Style3 = $cols[2].style
}
}
# Export to CSV
$data | Select-Object Column1, Column2, Column3 | Export-Csv -Path "output.csv" -NoTypeInformation
# For Excel with formatting, using EPPlus library
Import-Module -Name ImportExcel
$excelPath = "output.xlsx"
$excel = New-ExcelPackage -Path $excelPath
$worksheet = $excel.Workbook.Worksheets.Add("Sheet1")
# Write data to Excel
$rowIndex = 1
foreach ($row in $data) {
$worksheet.Cells["A$rowIndex"].Value = $row.Column1
$worksheet.Cells["B$rowIndex"].Value = $row.Column2
$worksheet.Cells["C$rowIndex"].Value = $row.Column3
if ($row.Style1 -match "background-color:(#[A-Fa-f0-9]{6})") {
$worksheet.Cells["A$rowIndex"].Style.Fill.PatternType = 'Solid'
$worksheet.Cells["A$rowIndex"].Style.Fill.BackgroundColor.SetColor([System.Drawing.Color]::FromName($matches[1]))
}
if ($row.Style2 -match "background-color:(#[A-Fa-f0-9]{6})") {
$worksheet.Cells["B$rowIndex"].Style.Fill.PatternType = 'Solid'
$worksheet.Cells["B$rowIndex"].Style.Fill.BackgroundColor.SetColor([System.Drawing.Color]::FromName($matches[1]))
}
if ($row.Style3 -match "background-color:(#[A-Fa-f0-9]{6})") {
$worksheet.Cells["C$rowIndex"].Style.Fill.PatternType = 'Solid'
$worksheet.Cells["C$rowIndex"].Style.Fill.BackgroundColor.SetColor([System.Drawing.Color]::FromName($matches[1]))
}
$rowIndex++
}
$excel.Save()
This approach allowed me to keep the cell formatting from the HTML in the Excel output. If you only need a plain CSV without formatting, the Export-Csv cmdlet is straightforward and does the job. contact