File Path Parsing

Posted under tips on April 24 , 2019 by Swanand Nirgudkar


Extracting relevant information from a file path may seem an obvious task on the surface but achieving it in a single step can easily get tedious.

Though there are multiple ways of parsing a path in SheetKraft, a fairly easy solution using Regex is bound to achieve the same in most cases. For example let's consider a file path C:\Users\New User\Desktop\Example.txt. Using just Excel one often has to use a formula like

MID(A1,FIND("*",SUBSTITUTE(A1,"\","*",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))+1,LEN(A1))

Where A1 is the address of the reference cell. Moreover one further has to use functions like LEFT and LEN to get the parent directory. On the contrary, if we use SheetKraft the same can be reduced to

RegexSearch.SK(A1,"(.*)\\(.*)","$2")

In this formula, the second argument is a Regex containing two capturing groups "(.*)" set apart by a separator "\", where the additional "\" is used for escaping. The third argument specifies that we are using the second capture group. Hence, simply replacing the $2 with $1 will fetch us the parent diectory. Also in case the separator is a forward slash "/" as in case of urls, the desired result can be achieved by substituting "\\" in the formula with "/".

Notes:

  • It would be better to use a Regex like (.*)\\([^\\]+)$. Here we are saying that the last group should not contain a backslash, must have at least one character and the string should end after it. The former Regex does work but relies on the Regex engine being more greedy on the first group than the second which is a dubious feature.
  • Multiple capture groups can be obtained out of RegexSearch by passing an array in the third argument. For instance, {"$1","$2"} will produce an array output containing the parent directory path and filename.
  • Any Regex with more than one .* can lead to an ambiguity about which one is more greedy. For example a\b\c can be considered as a\b followed by c or a followed by b\c. Even if there are definite rules regarding which one should the Regex engine prefer, performance with multiple greedy parts in a Regex will be poorer.


TAGGED:SheetKraftRegexFileCapture Group


Let's talk

Address

7th floor, Unit No. 715, C Wing,
Kailas Business Park, S. Veer Savarkar Marg,
Park Site, Vikhroli (W), Mumbai-400079.

 

For General Queries & Technical Support

contactus@quantumphinance.com

 

For Sale and Demos

sales@quantumphinance.com