FeedBurner changed their blog service so that it now returns blocks of JavaScript similar to:
```javascript
document.write("\x3cdiv class\x3d\x22feedburnerFeedBlock\x22 id\x3d\x22RitterInsuranceMarketingRSSv3iugf6igask14fl8ok645b6l0\x22\x3e");
document.write("\x3cul\x3e");
document.write("\x3cli\x3e\x3cspan class\x3d\x22headline\x22\x3e\x3ca href\x3d\x22
```
I want the raw HTML out of this. Previously I could simply use `.Replace` to strip out the `document.write` syntax, but I can't figure out what kind of encoding this is, or at least how to decode it with C#.
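The `\x3c`, `\x3d`, `\x22` and `\x3e` sequences look like JavaScript hexadecimal escape sequences: a backslash, an `x`, and two hex digits giving the character code. So `\x3c` is `<`, `\x3d` is `=`, `\x22` is `"` and `\x3e` is `>`. A minimal regex-based sketch of stripping the `document.write("...")` wrappers and decoding those escapes might look like the following. It assumes every call uses a double-quoted string literal as in the sample, and that the output never contains a literal escaped backslash followed by `x` (which this sketch would decode incorrectly). The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any FeedBurner API.
public static class FeedburnerSketch
{
    // Matches one document.write("...") call and captures the body of the string literal.
    // Assumption: every call uses a double-quoted literal, as in the sample output.
    private static readonly Regex WriteCall =
        new Regex(@"document\.write\(""((?:[^""\\]|\\.)*)""\);?");

    // Matches a JavaScript \xHH escape and captures its two hex digits.
    private static readonly Regex HexEscape = new Regex(@"\\x([0-9A-Fa-f]{2})");

    // Concatenates the decoded contents of every document.write call in the script.
    public static string ExtractHtml(string script)
    {
        var html = new StringBuilder();
        foreach (Match call in WriteCall.Matches(script))
        {
            html.Append(DecodeHexEscapes(call.Groups[1].Value));
        }
        return html.ToString();
    }

    // Replaces each \xHH escape with the character whose code is HH.
    // Assumption: no escaped backslash ("\\") precedes an "x", or it would be misread.
    public static string DecodeHexEscapes(string text) =>
        HexEscape.Replace(text, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
}
```

For example, passing the script text `document.write("\x3cul\x3e"); document.write("\x3c/ul\x3e");` to `ExtractHtml` returns `<ul></ul>`.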
Edit: Well, this was a semi-nightmare to solve. Here's what I came up with, in case anyone has improvements to offer:
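```csharp
using System;

// Extension methods must live in a non-nested static class (the class name is arbitrary).
public static class HexExtensions
{
    // Converts a two-digit hex string such as "3c" to its character ('<').
    public static char ConvertHexToASCII(this string hex)
    {
        // nameof(hex) passes the parameter name; passing hex itself would pass null.
        if (hex == null) throw new ArgumentNullException(nameof(hex));
        return (char)Convert.ToByte(hex, 16);
    }
}
```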
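```csharp
// Needs: using System.Collections.Generic; using System.Text;
private string DecodeFeedburnerHtml(string html)
{
    var builder = new StringBuilder(html.Length);
    // Holds a partially matched "\xHH" escape; the most recent char is on top.
    var stack = new Stack<char>(4);
    foreach (var chr in html)
    {
        switch (chr)
        {
            case '\\':
                if (stack.Count == 0)
                {
                    stack.Push(chr);
                }
                else
                {
                    // "\\" is an escaped backslash: emit a single one.
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
            case 'x':
                if (stack.Count == 1)
                {
                    stack.Push(chr);
                }
                else
                {
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
            default:
                if (stack.Count >= 2)
                {
                    stack.Push(chr);
                    if (stack.Count == 4)
                    {
                        // Pop returns the second hex digit first, so "{1}{0}" restores the order.
                        string hexString = string.Format("{1}{0}", stack.Pop(), stack.Pop());
                        builder.Append(hexString.ConvertHexToASCII());
                        stack.Clear();
                    }
                }
                else
                {
                    // A backslash followed by any other char (e.g. \" or \/) yields just that char.
                    // Clearing drops the pending backslash so it can't pair with a later 'x'.
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
        }
    }
    return builder.ToString();
}
```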
I'm not sure what else I could do better. For some reason, code like this always feels really dirty to me even though it's a linear-time algorithm; I guess that's related to how long it has to be.
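On the "feels dirty" point, one possible simplification is to look ahead from each backslash instead of buffering partial matches on a stack. It is still a single linear pass. This is a sketch of an alternative, not a drop-in replacement: unlike the stack version, it copies every other backslash sequence (such as `\\` or `\"`) through unchanged, on the assumption that only `\xHH` escapes matter in FeedBurner's output. `FeedburnerScan` and `HexValue` are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper class illustrating a lookahead scan instead of a stack.
public static class FeedburnerScan
{
    // Returns the value 0-15 of a hex digit, or -1 if chr is not one.
    private static int HexValue(char chr)
    {
        if (chr >= '0' && chr <= '9') return chr - '0';
        if (chr >= 'a' && chr <= 'f') return chr - 'a' + 10;
        if (chr >= 'A' && chr <= 'F') return chr - 'A' + 10;
        return -1;
    }

    // Single left-to-right pass: at each backslash, peek at the next three chars.
    // Assumption: only \xHH escapes need decoding; anything else is copied as-is.
    public static string DecodeHexEscapes(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        var builder = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            // i + 3 < Length guarantees text[i + 3] exists.
            if (text[i] == '\\' && i + 3 < text.Length && text[i + 1] == 'x')
            {
                int high = HexValue(text[i + 2]);
                int low = HexValue(text[i + 3]);
                if (high >= 0 && low >= 0)
                {
                    builder.Append((char)(high * 16 + low));
                    i += 4; // skip the whole \xHH escape
                    continue;
                }
            }
            builder.Append(text[i]);
            i++;
        }
        return builder.ToString();
    }
}
```

Because malformed escapes such as `\xZZ` or a trailing `\x4` simply fall through to the copy branch, there is no partial state to reset, which is where most of the stack version's special cases came from.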