<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:series="http://unfoldingneurons.com/"
	>

<channel>
	<title>Stephen Foskett, Pack Rat &#187; XML Archives  &#8211; Stephen Foskett, Pack Rat</title>
	<atom:link href="http://blog.fosketts.net/tag/xml/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.fosketts.net</link>
	<description>Understanding the accumulation of data</description>
	<lastBuildDate>Fri, 10 Feb 2012 17:40:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
			<item>
		<title>Cool Google Spreadsheet XML/XPath Mojo</title>
		<link>http://blog.fosketts.net/2010/07/02/cool-google-spreadsheet-importxml-xpath/</link>
		<comments>http://blog.fosketts.net/2010/07/02/cool-google-spreadsheet-importxml-xpath/#comments</comments>
		<pubDate>Fri, 02 Jul 2010 20:41:12 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Alexa]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Apps]]></category>
		<category><![CDATA[Google Spreadsheet]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[importXML]]></category>
		<category><![CDATA[Klout]]></category>
		<category><![CDATA[LinkedIn]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XPath]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=3320</guid>
		<description><![CDATA[Google Spreadsheet sure isn't as responsive for a power user like me, but I love the ability to share information with others and cooperatively edit a workbook. It's become our main tool for planning the Gestalt IT Tech Field Day events. I was thrilled to discover that Google's spreadsheet supports the importXML function, which allows it to automatically gather information from other web sites. Let's take a look at how it works!]]></description>
			<content:encoded><![CDATA[<p>Google Spreadsheet sure isn&#8217;t as responsive for a power user like me, but I love the ability to share information with others and cooperatively edit a workbook. It&#8217;s become our main tool for planning the <a href="http://gestaltit.com/field-day/"  target="_blank">Gestalt IT Tech Field Day</a> events. I was thrilled to discover that Google&#8217;s spreadsheet supports the <a rel="nofollow" href="http://docs.google.com/support/bin/answer.py?hl=en&amp;answer=75507"  target="_blank">importXML</a> function, which allows it to automatically gather information from other web sites. Let&#8217;s take a look at how it works!</p>
<h3>HTML == XML?</h3>
<p>Most web users have heard of HTML, but XML is a much geekier thing. Like HTML, XML is a way of &#8220;marking up&#8221; a text document to provide hidden clues about the content and how to display it. In fact, the two are close cousins: XHTML is HTML written as strict XML, and even ordinary HTML documents (the kind you access with your web browser) can be read by many XML tools.</p>
<p>Both HTML and XML enclose text in tags. For example, a paragraph would be surrounded by &lt;p&gt; and &lt;/p&gt; tags, while a table starts and ends with &lt;table&gt; and &lt;/table&gt;. These can be nested within each other, and are commonly deeply nested indeed. Web pages typically have many layers of &lt;div&gt; and &lt;span&gt; tags, for example, and include lists within lists as well.</p>
<p>This somewhat-organized mess creates a &#8220;path&#8221; through the document leading to each piece of information. The language for describing these paths is called <a href="http://www.w3.org/TR/xpath/"  target="_blank">XPath</a>. For example, a Twitter profile page embeds your name deep in the &lt;html&gt;&lt;body&gt; path under &lt;div id=&quot;side&quot;&gt;, &lt;div id=&quot;profile&quot; class=&quot;section profile-side&quot;&gt;, &lt;address&gt;, &lt;ul class=&quot;about vcard entry-author&quot;&gt;, &lt;li&gt;, and &lt;span class=&quot;fn&quot;&gt;.</p>
<p>Raw HTML is often organized like this, and this can work to our advantage. If a computer program wanted to pull the full name of a Twitter user out of their profile page, it could look for a &lt;span&gt; element with the class attribute set to &#8220;fn&#8221;. It could also look for the &lt;address&gt; tag and pull out some information about the user in the <a href="http://microformats.org/wiki/rfc-2426"  target="_blank">vcard microformat</a>.</p>
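<p>To see the idea outside a spreadsheet, here&#8217;s a minimal Python sketch using the standard library&#8217;s ElementTree module. It builds a tiny stand-in for the profile markup described above by hand (the name &#8220;Example User&#8221; is made up, and this is not a copy of Twitter&#8217;s real page), then finds the name twice: once by walking the full path, and once with a shortcut that matches on the class attribute alone.</p>
<pre><code>import xml.etree.ElementTree as ET

# Hand-built stand-in for the profile markup; all values are illustrative.
html = ET.Element("html")
body = ET.SubElement(html, "body")
side = ET.SubElement(body, "div", {"id": "side"})
profile = ET.SubElement(side, "div", {"id": "profile", "class": "section profile-side"})
address = ET.SubElement(profile, "address")
about = ET.SubElement(address, "ul", {"class": "about vcard entry-author"})
item = ET.SubElement(about, "li")
name = ET.SubElement(item, "span", {"class": "fn"})
name.text = "Example User"

# Full path from the root element, one step per nested tag.
full_path = "body/div/div/address/ul/li/span"
# Shortcut: any span, at any depth, whose class attribute is exactly "fn".
shortcut = ".//span[@class='fn']"

print(html.find(full_path).text)
print(html.find(shortcut).text)
</code></pre>
<p>Both lookups land on the same element. The shortcut is usually the better bet, since it keeps working even if the layers of &lt;div&gt; tags around the name get rearranged.</p>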
<h3>importXML</h3>
<p>All this becomes very interesting when considering the importXML function in Google Spreadsheet. It fetches a URL, parses the page as XML automatically, and returns whatever data it finds at the XPath you give it. This is very powerful indeed!</p>
<h4>Twitter Followers</h4>
<p>I think an example will make it clearer. Let&#8217;s look up the Twitter follower count of a user. We create a spreadsheet and enter a Twitter username in cell A1. Then we put the following formulas in cells B1 and C1 and the count magically appears in B1!</p>
<p>B1:<br />
<code><br />
=if(C1&lt;&gt;"",right(C1,len(C1)-10),"")<br />
</code></p>
<p>C1:<br />
<code><br />
=if(A1&lt;&gt;"",importXML("http://mobile.twitter.com/"&amp;A1,"//a[@ href='http://mobile.twitter.com/"&amp;lower(A1)&amp;"/followers']"),"")<br />
</code></p>
<p>The formula in C1 looks for an &lt;a&gt; element whose href is the user&#8217;s followers URL, which is unique in every mobile Twitter profile. It returns a string like &#8220;Followers:1234&#8221;, so the formula in B1 strips off the 10-character &#8220;Followers:&#8221; label and leaves just the number.</p>
<p><em>Updated 1/22/12 after Twitter screwed up the main page. Good thing the mobile site still works!</em></p>
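<p>If the spreadsheet syntax is hard to read, here&#8217;s a rough Python sketch of the two string steps in the Twitter formulas above: gluing the username into the XPath, and trimming the label off the result. The function names and the &#8220;ExampleUser&#8221; account are my own placeholders, and nothing here actually fetches a page.</p>
<pre><code>def followers_xpath(username):
    """Assemble the XPath the way the C1 formula does, lowercasing the name."""
    return "//a[@href='http://mobile.twitter.com/" + username.lower() + "/followers']"


def strip_label(link_text):
    """Mirror the B1 formula: drop the 10-character "Followers:" label."""
    if link_text == "":
        return ""
    return link_text[len("Followers:"):]


print(followers_xpath("ExampleUser"))
print(strip_label("Followers:1234"))
</code></pre>
<p>Note that the result is still text at this point, just as it is in cell B1. Wrap it in a number conversion if you want to do math with it.</p>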
<h4>LinkedIn Connections</h4>
<p>Let&#8217;s try something else. How would we pull the number of connections a person has on LinkedIn using importXML? Here&#8217;s a formula!</p>
<p><code>=value(substitute(importXML(A3,"//dd[@class='overview-connections']/p/strong"),"500+","500"))</code></p>
<p>This is a little more complicated. We&#8217;re doing the same thing, taking a URL from cell A3 that corresponds to the person&#8217;s public LinkedIn page and outputting the content of the &lt;strong&gt; element inside &lt;dd class=&quot;overview-connections&quot;&gt;. But we&#8217;re also using the SUBSTITUTE() function to take the plus sign off a &#8220;500+&#8221; response, and VALUE() to convert the result into a number for calculation.</p>
<p><em>Updated 1/22/12 for new LinkedIn Format</em></p>
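<p>The clean-up step in the LinkedIn formula above is easy to picture in a few lines of Python. This is only a sketch of the SUBSTITUTE() and VALUE() pair, with a function name I made up, and it assumes the text coming back is either a plain number or &#8220;500+&#8221;.</p>
<pre><code>def connections_as_number(cell_text):
    """Mirror value(substitute(text, "500+", "500")) from the formula."""
    return int(cell_text.replace("500+", "500"))


print(connections_as_number("500+"))
print(connections_as_number("123"))
</code></pre>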
<h4>Alexa and Klout</h4>
<p>Here are a few more examples. I bet you can follow along now.</p>
<p>Alexa traffic rank:</p>
<p><code>=value(importXML("http://www.alexa.com/search?q="&amp;E3,"//div[@class='row']/span/a[@href][1]"))</code></p>
<p>Klout score:</p>
<p><code>=value(substitute(importXML("http://klout.com/"&amp;C3,"//span[@class='value']"),"klout score",""))</code></p>
<h3>Limitations in Google Spreadsheets</h3>
<p>Before you go thinking you can run off and create awesome web applications like this, know that there are some serious limitations. First, Google limits the use of importXML to 50 calls per workbook. This means you can&#8217;t import from hundreds of sources in the same sheet, or even across multiple sheets in the same workbook. Next, importXML is pretty opaque in everyday use. You have to do a lot of trial and error to get the XPath right, and it often fails with #N/A, which breaks any calculations that depend on it.</p>
<p>But it&#8217;s still pretty useful when creating a spreadsheet that needs to pull in information from outside sources. You can grab all sorts of data this way, from current stock quotes to weather or sports metrics. You have the whole Internet at your disposal &#8211; let&#8217;s get creative!</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2010. |
<a href="http://blog.fosketts.net/2010/07/02/cool-google-spreadsheet-importxml-xpath/">Cool Google Spreadsheet XML/XPath Mojo</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/" title="View all posts in Everything" rel="category tag">Everything</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2010/07/02/cool-google-spreadsheet-importxml-xpath/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Long-Term Versus Longer-Term Archiving</title>
		<link>http://blog.fosketts.net/2008/12/02/long-term-archiving/</link>
		<comments>http://blog.fosketts.net/2008/12/02/long-term-archiving/#comments</comments>
		<pubDate>Tue, 02 Dec 2008 14:46:03 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[AIIM]]></category>
		<category><![CDATA[archiving]]></category>
		<category><![CDATA[ASCII]]></category>
		<category><![CDATA[data archive]]></category>
		<category><![CDATA[disk]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[paper]]></category>
		<category><![CDATA[papyrus]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[record retention]]></category>
		<category><![CDATA[records]]></category>
		<category><![CDATA[tablet]]></category>
		<category><![CDATA[tape]]></category>
		<category><![CDATA[toot toot]]></category>
		<category><![CDATA[webinar]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=1163</guid>
		<description><![CDATA[How will you retain records for the long haul? It depends on how you define &#8220;long&#8221;. Nearly everyone (individual and business alike) has certain records to retain for years, and some may need retention for decades or centuries. How can you accomplish this? First, consider whether to store records as atoms or bits. You can [...]]]></description>
			<content:encoded><![CDATA[<p>How will you retain records for the long haul? It depends on how you define &#8220;long&#8221;. Nearly everyone (individual and business alike) has certain records to retain for years, and some may need retention for decades or centuries. How can you accomplish this?</p>
<p>First, consider whether to store records as atoms or bits. You can convert paper to data or vice versa, and there are pros and cons to both:</p>
<ul>
<li>Properly handled physical (paper or film) records should last for hundreds of years and can remain readable without software or devices. But they&#8217;re hard to search (you need an index), and paper is bulky, heavy, and difficult to work with.</li>
<li>Digital records can either be stored offline or kept &#8220;alive,&#8221; but questions remain about their long-term reliability and readability. Living records can be easy to search and use, and digital storage can be very space-efficient, but data tends to pile up &#8220;out of sight.&#8221;</li>
</ul>
<p>Long-term storage of records on physical media is proven &#8211; think about papyrus, tablets, gold or nickel discs, film, and paper. But will digital media fare as well? Data tapes and disks can degrade over time, and manufacturer reliability specs are based on accelerated testing, not actual experience. Regardless of media type, careful handling can extend media life.</p>
<p>But will you still be able to read it? Tapes and optical disks require additional hardware to read, while disk drives are paired with their read heads. Software applications are needed to read and interpret data (backup, archiving, compression, encryption, deduplication, database) as well. What about content format? Should you use ASCII, XML, or PDF/A?</p>
<p>I&#8217;ll be presenting a webinar on this topic tomorrow, Wednesday, December 3, at 2:00 PM Eastern time. <a href="http://www.aiim.org/Events/register.aspx?id=288"  target="_blank">Register on-line</a> at the AIIM web site and join me for the discussion!</p>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2008. |
<a href="http://blog.fosketts.net/2008/12/02/long-term-archiving/">Long-Term Versus Longer-Term Archiving</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2008/12/02/long-term-archiving/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

