Hello readers!

Blogs are dying

Last week I was surprised to get the following message on Microsoft Blogs (example: https://blogs.technet.microsoft.com/crypto):

image

After some investigation, I found more disabled blogs. I tried to find out what is going on, without much luck. All I was able to find is that Microsoft is retiring its TechNet and MSDN blog platforms and moving to... yes, another blogging engine. However, not all blogs are being moved. There are various rumors (nothing official yet) suggesting that only the most popular and trending (Azure!) blogs will be migrated. The remaining blogs will be wiped. Silently. Other rumors suggest that it is the blog owner’s responsibility to move their blog to the new platform. Keep in mind that these are just rumors; the fact is that blogs silently disappear: https://blogs.technet.microsoft.com/brandonlinton/2018/11/05/retirement/. There was no official announcement from Microsoft about the trend or a blog decommission schedule. Further investigation revealed that MSDN blogs are moving to DevBlogs and TechNet blogs are moving to TechCommunity.

A bit of history

Microsoft blogs were launched around the middle of 2003 on a customized version of Telligent Community Server. At that time, Community Server was the only available multi-user blogging engine powered by .NET. As new blogs were added and their popularity grew, it became evident that Community Server could not handle the growing load of Microsoft blogs and that the platform’s functionality and scalability were limited.

In 2015, Microsoft decided to move all its blogs to a proven and scalable solution based on the WordPress blog engine.

At the end of 2018 (at the time of posting, the process is still ongoing), Microsoft launched a new blogging platform. There was no announcement about another blog migration. However, some blogs were migrated. Based on my brief research, the DevBlogs and TechCommunity platforms are powered by the Lithium blogging engine.

Microsoft blogs reached their prime in 2008-2013. After 2013, more and more blogs were abandoned by their owners and overall posting activity slowed significantly. As of March 2019, only a few dozen blogs are actively updated; the rest are no longer maintained (the owner may have changed position, left MSFT or had some other reason).

A bit of reality

It is a fact that Microsoft blogs are extremely valuable and popular within the IT Pro and software developer communities. Blogs contain literally a “shitload” of technical gems about Microsoft products, deep details of their internals and other hidden knowledge you will never find anywhere else. Many interesting support case stories from product teams were posted as well. Blogs were used to announce and explain new products and features before more formal information was delivered to the TechNet and MSDN websites.

Microsoft put a lot of effort into promoting blogs and cultivating the community, and as a result the blogs are loved by the community. And now Microsoft silently, without any announcement, shuts down blogs and removes their content! I have no idea what criteria are used to schedule the shutdown of a particular blog. Some rumors suggest that blogs with low traffic are discontinued first, but all inactive blogs will be discontinued eventually. That’s a pity! Even if a particular blog is no longer updated, its information is still relevant in most cases. IT-related websites are full of links to Microsoft blogs and their posts. Now these links are slowly dying and soon most of them will show you a 404. Without any explanation.

  • Microsoft did not announce the shutdown process.
  • Microsoft did not provide any way to get an offline copy of a dead blog.
  • The information is lost forever!

There is a chance to recover some links through the web archive, though not all blogs or posts were indexed by it. For example, the Windows 7 and Windows 8 development blogs in Russian weren’t backed up by the web archive. And if you don’t have the exact link, the web archive doesn’t help much. When Windows Server 2003 TechNet content was retired, Microsoft released a compiled PDF version of it: Windows Server 2003/2003 R2 Retired Content. There is no such solution for blogs. No PDF, no other offline copy, nothing. Microsoft literally spat in the face of the community they cultivated with blogs. I can’t find other words to express my feelings when I hit a 404 on one blog or another that I’m trying to read.
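For a dead link whose exact URL is still known, the Internet Archive has a public availability endpoint that reports the closest archived snapshot. Below is a minimal sketch, assuming that endpoint (https://archive.org/wayback/available) and its JSON response shape. The function names Get-WaybackQueryUri and Get-WaybackSnapshot are hypothetical and are not part of the backup script:

```powershell
# Hypothetical helpers, not part of Backup-MsftBlog.ps1.

function Get-WaybackQueryUri {
    param([Parameter(Mandatory)][string]$Url)
    # The original URL travels in the query string, so it must be escaped.
    'https://archive.org/wayback/available?url=' + [uri]::EscapeDataString($Url)
}

function Get-WaybackSnapshot {
    param([Parameter(Mandatory)][string]$Url)
    $response = Invoke-RestMethod -Uri (Get-WaybackQueryUri -Url $Url)
    # 'closest' is absent when the archive holds no copy of the URL.
    $closest = $response.archived_snapshots.closest
    if ($closest -and $closest.available) { $closest.url }
}

# Usage (needs network access):
# Get-WaybackSnapshot -Url 'https://blogs.technet.microsoft.com/pki/'
```

The lookup works only for a known address, which matches the limitation described above: without the exact link there is nothing to query.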

PowerShell

I’m a huge fan of old stuff and try to collect everything that looks interesting to me. If I had known about the Microsoft blogs shutdown in advance, I would have reacted accordingly and backed up the blogs while they were still alive. Fortunately, not all blogs have been wiped yet, so I quickly wrote a PowerShell solution that downloads an entire chosen blog with the content of every post. Later, I added image download (for images that are still available) and rewriting of the URLs inside posts (those that appear in <a> and <img> HTML tags).

The logic is quite simple:

  • visit the blog’s main page and get the number of the last page in the blog pagination;
  • loop over every pagination page and get the list of posts on it;
  • grab the blog post content as is (no theming, branding or anything else, only the post) and get some metadata: post author, post date, original URL;
  • check whether the post contains images and attempt to download every one of them. If an image is clickable (wrapped in an <a> tag), download the link target as well;
  • save the blog post as an HTML file and the images in a folder.
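Two of the steps above can be sketched without touching the network. This is only an illustration under stated assumptions: the WordPress-style /page/N/ address layout is assumed rather than confirmed, and the helper names Get-BlogPageUri and Get-LocalImageName are hypothetical and do not come from the actual script:

```powershell
# Hypothetical helpers illustrating the crawl logic.

function Get-BlogPageUri {
    param(
        [Parameter(Mandatory)][string]$BlogUri,
        [Parameter(Mandatory)][ValidateRange(1, [int]::MaxValue)][int]$LastPage
    )
    $base = $BlogUri.TrimEnd('/')
    foreach ($page in 1..$LastPage) {
        # Assumption: page 1 is the main page, later pages live under /page/N/.
        if ($page -eq 1) { "$base/" } else { "$base/page/$page/" }
    }
}

function Get-LocalImageName {
    param([Parameter(Mandatory)][string]$ImageUrl)
    # Drop the host and the query string and keep only the file name, so the
    # rewritten <img> or <a> attribute can point at a file next to the post.
    [IO.Path]::GetFileName(([uri]$ImageUrl).AbsolutePath)
}

Get-BlogPageUri -BlogUri 'https://blogs.technet.microsoft.com/pki/' -LastPage 3
Get-LocalImageName -ImageUrl 'https://example.com/media/diagram.png?w=300'
```

The last page number would come from the pagination block of the main page, which this sketch takes as an input rather than parsing it.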

The downloaded data structure is as follows:

  • a folder is created for every post. The post creation timestamp is used as the folder name:

image

  • every folder contains a single post: an HTML file named after the post, plus the embedded images:

image
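The layout described above could be computed roughly as follows. This is a sketch only: the timestamp format and the replacement of characters that are invalid in file names are assumptions, and Get-PostTargetPath is a hypothetical name:

```powershell
# Hypothetical helper: builds '<output>\<timestamp>\<post title>.html'.
function Get-PostTargetPath {
    param(
        [Parameter(Mandatory)][string]$OutputDirectory,
        [Parameter(Mandatory)][datetime]$Published,
        [Parameter(Mandatory)][string]$Title
    )
    # Assumed folder name format: sortable and free of ':' characters.
    $folder = $Published.ToString('yyyy-MM-dd HH-mm-ss', [cultureinfo]::InvariantCulture)

    # Replace every character the file system does not allow in a file name.
    $invalid = [IO.Path]::GetInvalidFileNameChars()
    $safeTitle = -join ($Title.ToCharArray() | ForEach-Object {
        if ($invalid -contains $_) { '_' } else { $_ }
    })

    Join-Path (Join-Path $OutputDirectory $folder) "$safeTitle.html"
}

Get-PostTargetPath -OutputDirectory 'blogs' -Published ([datetime]'2018-11-05T10:30:00') -Title 'PKI/ADCS basics'
```

Using the timestamp as the folder name keeps posts in chronological order when the output directory is sorted by name.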

Example HTML

image

Unfortunately, the PowerShell [xml] type accelerator doesn’t work with the blogs’ HTML (because of unescaped JS scripts). As a result, I was forced to use a third-party library, which made the HTML parsing job very easy.

The script depends on a third-party library called HtmlAgilityPack. You must download the DLL file and put it in the same directory as the PowerShell script.
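The difference between the two parsers can be seen on a small fragment. This is a minimal sketch, not the script’s actual code: the sample markup and the example.com image address are made up, and the HtmlAgilityPack part runs only if the DLL is found next to the script (or in the current folder when run interactively):

```powershell
# Markup of the kind blog pages contain: an unescaped '<' and '&&' inside a
# script block and an unclosed <img> tag are fine in HTML but not in XML.
$html = '<html><body><script>if (a < b && c) { go(); }</script>' +
        '<p>Post</p><img src="https://example.com/pic.png"></body></html>'

# The [xml] accelerator rejects such input with a conversion error.
$xmlParsed = $true
try { $null = [xml]$html } catch { $xmlParsed = $false }
"Parsed as XML: $xmlParsed"

# HtmlAgilityPack tolerates it. Outside a script file $PSScriptRoot is empty,
# so fall back to the current folder.
$baseDir = if ($PSScriptRoot) { $PSScriptRoot } else { (Get-Location).Path }
$dllPath = Join-Path $baseDir 'HtmlAgilityPack.dll'
if (Test-Path $dllPath) {
    Add-Type -Path $dllPath
    $doc = New-Object HtmlAgilityPack.HtmlDocument
    $doc.LoadHtml($html)
    # SelectNodes returns $null when nothing matches, hence the guard.
    $images = $doc.DocumentNode.SelectNodes('//img[@src]')
    if ($images) {
        $images | ForEach-Object { $_.GetAttributeValue('src', '') }
    }
}
```

Building the DLL path explicitly avoids depending on the current working directory when the assembly is loaded.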

Vadim Sterkin published sample blogs in styled HTML and PDF formats. An online repository of already downloaded blogs, along with instructions, is stored on Google Drive.

Blogs backup script

Script download button:

The package includes the PowerShell script, a downloaded copy of HtmlAgilityPack and a stylesheet file with a custom theme.

Example usage of the script:

.\Backup-MsftBlog.ps1 -BlogUri https://blogs.technet.microsoft.com/pki/ -OutputDirectory .\blogs\pki

In the -BlogUri parameter you specify the full URL of the blog’s main page, and in the -OutputDirectory parameter you specify the folder where the blog will be downloaded. The script implements a -Verbose switch that outputs a log of the crawling process. Use this script if you wish to get your own copy of a Microsoft blog you like, since it can be wiped at any moment.
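To back up several blogs in one go, the same parameters can be supplied in a loop. The sketch below assumes the script sits in the current folder and is skipped if it is not found; the way the output folder is derived from the blog address is my own convention for this example, not something the script requires:

```powershell
# Blog addresses mentioned in this post; replace them with the blogs you need.
$blogs = @(
    'https://blogs.technet.microsoft.com/pki/',
    'https://blogs.technet.microsoft.com/crypto'
)

# Derive an output folder from the last URL segment, for example 'pki'.
$jobs = foreach ($blog in $blogs) {
    $name = ([uri]$blog).Segments[-1].Trim('/')
    @{ BlogUri = $blog; OutputDirectory = Join-Path '.\blogs' $name }
}

$script = Join-Path (Get-Location).Path 'Backup-MsftBlog.ps1'
if (Test-Path $script) {
    foreach ($job in $jobs) {
        # Splatting passes the hashtable keys as parameter names.
        & $script @job -Verbose
    }
}
```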


Comments:

Joseph 26.03.2019 16:15 (GMT+2) How to save disappearing MSDN and TechNet blogs with PowerShell

Hi Vadims,

Thanks for sharing Backup-MsftBlog.ps1. It is really helpful for backing up MSDN blogs.

I tried to back up the https://blogs.msdn.microsoft.com/kaushal blog, but it threw an error stating that "Invoke-WebRequest : This site uses cookies for analytics, personalized content and ads. By continuing to browse this site, you agree to this use. Learn more"

Please help.

Siggy 28.03.2019 15:39 (GMT+2)

Great idea archiving the PKI site. Hope Microsoft realizes the magnitude of this shutdown.

Randolph 02.04.2019 00:45 (GMT+2)

I am getting the following error, even after running as admin and re-setting the execution policy:
"Add-Type : Could not load file or assembly 'file:///C:\users\v-something\desktop\Backup-MsftBlog\HtmlAgilityPack.dll' or
one of its dependencies. Operation is not supported. (Exception from HRESULT: 0x80131515)
At C:\users\v-something\desktop\Backup-MsftBlog\Backup-MsftBlog.ps1:25 char:1
+ Add-Type -Path HtmlAgilityPack.dll "

 

Any idea how to circumvent this?

Vadims Podāns 02.04.2019 15:51 (GMT+2)

@Randolph, make sure you have unblocked the .dll. When you download a file from the internet, an alternate data stream is added to mark the file’s origin. Right-click the DLL and the script file, open Properties and click "Unblock" if that option is visible.

Randolph 02.04.2019 19:44 (GMT+2)

Thank you! I had changed the Execution Policy for the script but had not unblocked the .dll.

Иван 05.04.2019 14:47 (GMT+2)

That is exactly why it is better to keep your work on your own domain and hosting, so that no platform changes can hurt you. The main thing is to make backups.

