Software Identification Problems Continue – 20 Years of Challenges Will Persist into the ISO 19770-2 Tagging Era

By Kris Barker, Express Metrix

Since the dawn of the desktop era, IT departments have struggled with keeping track of software installed across corporate networks. An accurate software inventory is crucial to properly licensing installed applications, understanding whether they’re being used, and budgeting for future purchases. But there is no standard methodology across applications and manufacturers for correlating installed program executables with actual application titles. This situation leaves asset managers and the software discovery tools they utilize with any number of half-complete approaches to software identification.

Driven by licensing challenges stemming from inaccurate and incomplete software identification, the new ISO/IEC 19770-2 software tagging standard has been developed, equipping publishers with guidelines for “tagging” their applications in a standard way that makes identification straightforward, automated, and nearly foolproof for discovery tools. Despite the technical ease with which software tags can be implemented, large publishers have been painfully slow to embrace the standard, and end users have not pressed vendors hard enough to spur them to action.
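To make the mechanics concrete, the sketch below shows how a discovery tool might locate and read 19770-2 tag files. It assumes tags are XML files carrying a “.swidtag” extension in a well-known directory, and it uses element names (product_title, product_version) modeled on the 2009 schema; treat the paths and element names as illustrative rather than normative.

```python
# A minimal sketch of tag-based discovery, assuming ".swidtag" XML files
# and 2009-schema element names. Paths and names are illustrative.
import os
import xml.etree.ElementTree as ET

def localname(tag):
    # Strip any XML namespace: "{uri}product_title" -> "product_title"
    return tag.rsplit("}", 1)[-1]

def parse_swid_tag(path):
    """Return (title, version) from one tag file, or None if unreadable."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return None
    title = version = None
    for elem in root.iter():
        name = localname(elem.tag)
        if name == "product_title":
            title = (elem.text or "").strip()
        elif name == "product_version":
            # The 2009 schema nests the display version in a <name> child.
            for child in elem:
                if localname(child.tag) == "name":
                    version = (child.text or "").strip()
    return title, version

def discover_tags(root_dir):
    for dirpath, _subdirs, files in os.walk(root_dir):
        for fname in files:
            if fname.lower().endswith(".swidtag"):
                info = parse_swid_tag(os.path.join(dirpath, fname))
                if info and info[0]:
                    yield info

# Tags are commonly dropped under %ProgramData% on Windows; the
# location is an assumption here, not something the standard mandates.
for title, version in discover_tags(r"C:\ProgramData"):
    print(f"{title} {version or ''}".strip())
```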

This glacial pace of adoption, coupled with the reality that virtually none of the applications in use on desktops today carry software tags, means that the practical benefits of tagging will be years in the offing.

A few large publishers like Adobe and Symantec are beginning to tag new releases according to the ISO standard, and some branches of the U.S. federal government, such as the DoD and GSA, are moving toward including tagging in their software procurement requirements. But the reality for asset managers is that a survey of software installations based on tag identification alone will remain essentially useless until most, if not all, newly released software is labeled according to the new standard, and until every “untagged” copy of software is retired from the desktop.

Until that time, many software inventory tools will continue to rely on less accurate methods such as analyzing file headers, registry entries, or installer databases. Unfortunately, such approaches nearly always come up short by over-counting or under-counting installed applications (and in some cases, missing them altogether); presenting data that is not consistent across applications, versions, editions, and publishers; and poorly correlating discovered file data to licensed application titles.

The difficulty for most tools in properly identifying application titles means that end users are often saddled with the task of manually interpreting, validating, and normalizing a significant amount of the discovered raw data—a time-consuming and error-prone process that, if neglected or performed improperly, can come back to bite later in a number of ways—among them, the dreaded software audit.

While we wait for tagged software to become ubiquitous on the desktop, we must be mindful of the shortcomings of the existing identification methods. At the very least, knowing which method(s) one’s own discovery tool uses is key to deciding where to focus efforts when interpreting the data it presents.

File header analysis, a methodology used by many software inventory products, is directly tied to the application executable; however, this information is often inconsistent, incomplete, or outdated because publishers are under no obligation to ensure the data is correct. In addition, multiple applications may share the same executable file(s), leading to confusion about which application a given file indicates. Perhaps worst of all, many applications consist of multiple executable files (sometimes hundreds or even thousands, in the case of the Windows OS), so looking at individual file headers won’t necessarily reveal the relationship between the executables and the licensed product to which they correspond.
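For illustration, here is a minimal sketch of file header analysis on Windows, reading the publisher-supplied version resource of an executable with the pywin32 extension. Every field it returns is populated voluntarily by the publisher, which is precisely why the data is so often missing or stale.

```python
# Reading the version resource ("file header" metadata) of a Windows
# executable with pywin32. A sketch only: real inventory agents read
# many more fields and handle multiple language blocks.
import win32api  # pywin32; Windows-only

def file_version_strings(path):
    """Return the publisher-supplied identification strings for one file."""
    # An executable can carry several language/codepage blocks; take the first.
    lang, codepage = win32api.GetFileVersionInfo(path, r"\VarFileInfo\Translation")[0]
    strings = {}
    for key in ("CompanyName", "ProductName", "ProductVersion", "FileDescription"):
        block = "\\StringFileInfo\\%04x%04x\\%s" % (lang, codepage, key)
        try:
            strings[key] = win32api.GetFileVersionInfo(path, block)
        except win32api.error:
            strings[key] = None  # the publisher simply never filled it in
    return strings

print(file_version_strings(r"C:\Windows\notepad.exe"))
```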

The Windows Installer database (MSI) or, more likely, the subset of MSI data stored in the Windows registry (visible from within Add/Remove Programs) is another common source of information about what’s installed on a desktop. But programs installed by means other than Windows Installer often go undetected. Further, the data available from these sources often lacks sufficient version granularity or can’t be correlated one-to-one with licensable application titles.
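The sketch below enumerates that same Add/Remove Programs view using Python’s standard winreg module. Note that on 64-bit systems a second Uninstall key exists under WOW6432Node, and anything installed without writing an Uninstall entry simply won’t appear, which is exactly the blind spot described above.

```python
# Enumerating the Add/Remove Programs view of installed software from
# the registry. Windows-only; 32-bit applications on 64-bit Windows
# register under SOFTWARE\WOW6432Node\... and would need a second pass.
import winreg

UNINSTALL = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall"

def installed_programs():
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, UNINSTALL) as root:
        subkey_count = winreg.QueryInfoKey(root)[0]
        for i in range(subkey_count):
            with winreg.OpenKey(root, winreg.EnumKey(root, i)) as sub:
                entry = {}
                for value in ("DisplayName", "DisplayVersion", "Publisher"):
                    try:
                        entry[value] = winreg.QueryValueEx(sub, value)[0]
                    except FileNotFoundError:
                        entry[value] = None  # value not written by installer
                if entry["DisplayName"]:
                    yield entry

for program in installed_programs():
    print(program)
```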

Some asset management technology vendors, like my own company, have collected software inventory data from corporate networks over many years and built proprietary catalogs that enable correlation of discovered executables and other application data with their licensable application titles. Because of the lack of a standard method for collecting and interpreting this information, it is incumbent on the curators of these catalogs to continually add new content, manually validate the accuracy of new entries, and normalize the data for practical use. It therefore stands to reason that the utility of the software catalog is only as good as the curator’s commitment to update and maintain it.
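Conceptually, catalog-based recognition boils down to matching discovered evidence against curated rules, as in the toy sketch below. The catalog entries shown are invented for illustration; a real catalog contains many thousands of continually maintained entries and far richer matching logic.

```python
# A conceptual sketch of catalog-based recognition: discovered file
# evidence is matched against curated rules mapping it to a licensable
# title. These two rules are hypothetical examples, not real catalog data.
CATALOG = [
    # ((executable name, version prefix), normalized licensable title)
    (("winword.exe", "16."), "Microsoft Word 2016"),
    (("acrord32.exe", "11."), "Adobe Reader XI"),
]

def recognize(file_name, product_version):
    """Map one piece of discovered evidence to a licensable title."""
    evidence = (file_name or "").lower()
    version = product_version or ""
    for (name, version_prefix), title in CATALOG:
        if evidence == name and version.startswith(version_prefix):
            return title
    return None  # unrecognized: flag for manual review and a catalog update

print(recognize("WINWORD.EXE", "16.0.4266.1001"))  # -> "Microsoft Word 2016"
```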

Regardless of the process used to identify software—and often it is some combination of the above—the real value in any method is in presenting normalized, accurate data to the end user in a way that allows them to effectively monitor their license positions.

Unfortunately, we are still a long way from a standard approach to software recognition. Ironically, many of the same software publishers who take a hard line on license compliance have neither tagged, nor announced plans to tag, their software according to the ISO standard. Other publishers have tagged their software but use their own proprietary syntax, a vendor-centric approach that further frustrates attempts to establish a universal standard friendly both to end users and to the technologies meant to support them.

Even if publishers were to adopt the ISO standard in earnest—and there is little sign of that—it will take years for the untagged applications installed throughout the typical enterprise to be retired and/or replaced with updated, and presumably tagged, versions.

Until software vendors demonstrate that they are making legitimate progress on tagging, they should lighten up on customers who can prove they are making a good-faith effort at license compliance.

Meanwhile, those responsible for software compliance and licensing decisions must educate themselves about the various software identification methodologies, as well as their related limitations, to minimize their risks while waiting for the promise of software tagging to be fulfilled.